Assessment war: Windows services

2008-02-01

Aleksander Czarnowski

AVET, Poland

Editor: Helen Martin

Abstract

In the world of Web 2.0, Java, .NET and other hot technologies we are often guilty of forgetting about the core components that make it all possible. Aleksander Czarnowski describes a simple attack scenario based on a high-privilege Windows service vulnerability.

Table of contents

Introduction
Targeted attacks
The plot
The reconnaissance
Anti-debugging
Auditing the service binary
What to look for
The debuggers
TOPSTACK method
The final scene

Introduction

In the world of Web 2.0, Java, .NET and other hot technologies we tend to forget about the core components that make it all possible. In the case of the Windows platform, the base components are the kernel and the Windows services. In fact, Service Control Manager (SCM) can be used to load kernel modules and use all ring 0 privileges – not to mention virtualization. Indeed, not much has changed since Windows NT 4.0: add RPC and DCOM and we have the foundation of the Windows operating system.

Targeted attacks

In an enterprise environment it is common to find custom-made business applications or plug-ins to well known solutions. This opens an interesting window of opportunity for potential attackers. After years of discussing secure programming, programmers still produce bad (insecure) code – which is later tested and deployed with the highest possible privileges. Because architects have provided programmers with bad architecture, programmers use high-level privileges and testing is based on the same set of access rights as on the developers’ machines. In the case of Windows services this means running as LocalSystem, even in XP, 2003 and Vista, which provide two additional built-in accounts for the job: NetworkService and LocalService.

In this article I will describe a simple attack scenario based on a vulnerability in a high-privilege service. It’s not a true story, but the experiences and techniques have been gathered and developed over the course of real-life assessments.

The plot

Imagine the following scenario: in our corporate network we have deployed some kind of custom business application. Internally, inter-process communication is provided with the help of Windows-based services. Those services have network access and provide some kind of parser to gather data. Also in the environment is an internal attacker – the bad guy. He knows that intrusion prevention systems (IPS) have been deployed in the network, so trying to exploit the good old RPC-DCOM vulnerability or scanning for an ‘sa’ account with an empty password in MS SQL Server will be noticed pretty quickly and probably stopped by the IPS. He needs something ‘unusual’ to bypass all the protection and yet gain high privileges. The custom business application seems like an ideal potential target. One could ask why he would attack a Windows service – looking for SQL injection in an application web front-end would be easier and, if done wisely, would probably go undetected by the IPS (you should now be thinking of how to deal with SSL/TLS connections on your IPS). Let us assume, however, that our attacker is not only after the data provided by the application, but also wants to gain high privileges and be able to penetrate the rest of our Active Directory infrastructure. SQL injection might not be the best way in such a case, but it is still worth a try.

To complete the crime scene we also need a service programmer. For the reasons mentioned earlier the programmer decided to run his service with LocalSystem privileges. This has been recorded only in internal documentation, which is not available to the company’s customers. Also, source code is not available to any of the company’s employees. So our attacker is left with a binary file running with high privileges on Windows Server – or is he?

The reconnaissance

This is the part of the attack that is usually detected by network IPS systems. However, if done slowly and carefully it could be missed by the IPS or ignored by a security officer. Our attacker needs to learn as much as he can about the server running the targeted service. The simplest method would be to use nmap to detect all of the services:

nmap -sS -A server_ip

Another great tool for the reconnaissance phase in a Windows-based network is Winfingerprint. It can detect shares, services etc. as long as the RestrictAnonymous key in the registry is set to 0 or we have sufficiently high privileges within the AD infrastructure. Fortunately, enumerating server resources from an AD user account usually provides good results.

The next step is to learn more about RPC interfaces – rpcdump from the Windows Resource Kit is a great tool for the job:

rpcdump.exe /s server_ip /v /i

If our attacker were able to log on to the target server he would also be able to gather some more information about the execution environment. The tasklist not only provides a list of all processes running, but can also provide detailed information about services:

tasklist.exe /svc

The ‘/SVC’ switch shows the list of active services in each process. In the case of Windows 2000 the attacker would need to use the ‘tlist -s’ command. It is important to note that some configurations allow remote access to the SCM database which provides similar information over the network.
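The process-to-service mapping can be post-processed with a few lines of script. The following is a minimal Python sketch, not part of the original toolset: the function name parse_tasklist_svc and the sample listing (including the custsvc.exe entry) are hypothetical, and the sketch assumes the usual three-column layout with wrapped service lists appearing on indented continuation lines.

```python
import re

# A process row: image name, PID, then the service list (or N/A).
ROW = re.compile(r"^(\S.*?)\s+(\d+)\s+(.*)$")


def parse_tasklist_svc(output):
    """Map (image name, PID) to the list of services hosted by that process."""
    result = {}
    current = None
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            name, pid, services = match.groups()
            current = (name, int(pid))
            result[current] = []
        elif line[:1].isspace() and current is not None:
            services = line  # wrapped service list of the previous row
        else:
            continue  # header, ruler or blank line
        if services.strip() != "N/A":
            result[current].extend(
                s.strip() for s in services.split(",") if s.strip()
            )
    return result


# Illustrative sample only; real output differs from system to system.
SAMPLE = """\
Image Name                     PID Services
========================= ======== ============================================
System Idle Process              0 N/A
services.exe                   652 Eventlog, PlugPlay
svchost.exe                    848 DcomLaunch,
                                   TermService
custsvc.exe                   1420 CustomParserSvc
"""

hosts = parse_tasklist_svc(SAMPLE)
for (image, pid), services in hosts.items():
    print(image, pid, services)
```

A process that hosts a single custom service in its own image, like the hypothetical custsvc.exe above, is usually the first candidate for closer inspection.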

Anti-debugging

In most cases you will not find any anti-debugging techniques in custom services. The probability of dealing with compressed PE files is also low. However, if dynamic analysis goes wrong, it is worth checking whether the binary is protected in some way. A Windows service is a typical PE file, so if there is no import table, or it contains only a few functions, then you know that the imports have been protected. This is an important observation, as most of the clues used to look for vulnerabilities depend on an intact import table. Also keep in mind that even if you only use function names to identify C/C++ functions, you may not find any calls. The reason is simple: compilers inline some functions, so instead of call instructions you will find only the ‘unwound’ function code. This applies to some string functions, for example.

Returning to anti-debugging, this is a topic that could fill a book (or more), so I’ll just describe the most basic technique briefly. Remember that the service programmer probably wasn’t getting paid for anti-debugging code, so if you do find any in a custom service then it will probably be based on a simple technique like the IsDebuggerPresent function.

In fact, the IsDebuggerPresent method can be implemented in a number of different ways. The simplest method is based on calling the IsDebuggerPresent function from kernel32.dll. If the function returns 0 then the process is not being debugged. If you peek inside the IsDebuggerPresent function you will find some very simple code:

lkd> u kernel32!isdebuggerpresent
kernel32!IsDebuggerPresent:
7c813093 64a118000000  mov   eax,dword ptr fs:[00000018h]
7c813099 8b4030        mov   eax,dword ptr [eax+30h]
7c81309c 0fb64002      movzx eax,byte ptr [eax+2]
7c8130a0 c3            ret

A quick inspection of the PEB structure tells us that offset 2 is the BeingDebugged field. What is interesting is the fact that you can set this field to 0 after attaching a debugger. This is an even better method than intercepting calls to the IsDebuggerPresent function and always setting the EAX register to 0, because either a direct call to the function or invoking its code directly from the service will always provide the same result.

The IsDebuggerPresent code can be even simpler: the first instruction only fetches the Self pointer from _NT_TIB (you can look it up using the ‘dt’ command in WinDbg), so it can be dropped and the PEB pointer read directly from fs:[30h]. The new code may look like this:

mov   eax, fs:[30h]
movzx eax, byte ptr [eax+2]

Speaking of PEB, it is worth mentioning that the NtGlobalFlag field at offset 68h is also modified if the process is being debugged. For example, FLG_HEAP_VALIDATE_PARAMETERS will be set. This can also be used for debugger detection. For a good review of different anti-debugging techniques in Windows see [1].
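To make the two PEB checks concrete, here is a small Python sketch that works on a raw copy of a 32-bit PEB (for example, bytes read from a memory dump). It is an illustration under stated assumptions: the function name debugger_indicators is hypothetical, the buffer below is synthetic, and the two extra heap flags alongside FLG_HEAP_VALIDATE_PARAMETERS are included on the assumption that a process created under a debugger has all three set.

```python
import struct

# Offsets within the 32-bit PEB, as given in the text.
PEB_BEING_DEBUGGED = 0x02   # BYTE
PEB_NT_GLOBAL_FLAG = 0x68   # DWORD

# Heap-checking bits of NtGlobalFlag. The text names only
# FLG_HEAP_VALIDATE_PARAMETERS; the other two are an assumption of this sketch.
FLG_HEAP_ENABLE_TAIL_CHECK = 0x10
FLG_HEAP_ENABLE_FREE_CHECK = 0x20
FLG_HEAP_VALIDATE_PARAMETERS = 0x40
HEAP_DEBUG_FLAGS = (FLG_HEAP_ENABLE_TAIL_CHECK
                    | FLG_HEAP_ENABLE_FREE_CHECK
                    | FLG_HEAP_VALIDATE_PARAMETERS)


def debugger_indicators(peb):
    """Report both debugger hints found in a raw 32-bit PEB image."""
    (nt_global_flag,) = struct.unpack_from("<I", peb, PEB_NT_GLOBAL_FLAG)
    return {
        "being_debugged": peb[PEB_BEING_DEBUGGED] != 0,
        "heap_debug_flags": bool(nt_global_flag & HEAP_DEBUG_FLAGS),
    }


# Synthetic PEB of a process started under a debugger.
peb = bytearray(0x70)
peb[PEB_BEING_DEBUGGED] = 1
struct.pack_into("<I", peb, PEB_NT_GLOBAL_FLAG, HEAP_DEBUG_FLAGS)
before = debugger_indicators(peb)

# Clearing BeingDebugged alone leaves the NtGlobalFlag hint in place.
peb[PEB_BEING_DEBUGGED] = 0
after = debugger_indicators(peb)
print(before, after)
```

The second result shows why patching the BeingDebugged byte is not always enough: a service that also inspects NtGlobalFlag would still notice the debugger.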

Auditing the service binary

Auditing Windows services is, at first, a bit different from auditing normal native applications. First of all, services are not run directly but with the help of SCM. Secondly, every service has at least two entry points. Internally, the service binary is just a plain PE console application. What makes it different is a call to the StartServiceCtrlDispatcher() function. This function takes only one parameter: lpServiceTable.

lpServiceTable is a pointer to an array of SERVICE_TABLE_ENTRY [2] structures containing one entry for each service that can execute in the calling process. The members of the last entry in the table must have NULL values to designate the end of the table.

SERVICE_TABLE_ENTRY has the following structure:

typedef struct _SERVICE_TABLE_ENTRY {
  LPTSTR                  lpServiceName;
  LPSERVICE_MAIN_FUNCTION lpServiceProc;
} SERVICE_TABLE_ENTRY, *LPSERVICE_TABLE_ENTRY;

The most important member is lpServiceProc, which points to the ServiceMain function: the real entry point for the particular service. So, to find all entry points in the service we first need to locate the SERVICE_TABLE_ENTRY array. This is trivial if you use IDA Pro – just find all references to StartServiceCtrlDispatcher() and you will have the lpServiceTable pointer. You don’t even need to do it manually, as the following IDC script will do it for you:

// List every code reference to StartServiceCtrlDispatcher.
// Depending on the build, the import may carry an A or W suffix.
auto ea, ref;
ea = LocByName("StartServiceCtrlDispatcher");
if(ea != BADADDR)
{
  if(GetFunctionFlags(ea) != -1)
  {
    Message("\nfound function at %8X:\n", ea);
    // Walk the code cross-references that point at the function
    for(ref=RfirstB(ea); ref != BADADDR; ref=RnextB(ea, ref))
    {
      Message(" + called from %s (0x%8X)\n", GetFunctionName(ref), ref);
    }
  }
  else
    Message("StartServiceCtrlDispatcher is not defined as a function.\n");
}
else
  Message("No StartServiceCtrlDispatcher function found in imports.\n");

We also need to take a look at how the service starts. To do this we need to locate the CreateService() function within the audited binary. Here is the function prototype:

SC_HANDLE WINAPI CreateService(
 __in      SC_HANDLE hSCManager,
 __in      LPCTSTR lpServiceName,
 __in_opt  LPCTSTR lpDisplayName,
 __in      DWORD dwDesiredAccess,
 __in      DWORD dwServiceType,
 __in      DWORD dwStartType,
 __in      DWORD dwErrorControl,
 __in_opt  LPCTSTR lpBinaryPathName,
 __in_opt  LPCTSTR lpLoadOrderGroup,
 __out_opt LPDWORD lpdwTagId,
 __in_opt  LPCTSTR lpDependencies,
 __in_opt  LPCTSTR lpServiceStartName,
 __in_opt  LPCTSTR lpPassword
 );

We are mainly interested in three arguments: dwServiceType, lpServiceStartName and lpPassword. Sometimes – but not very often – you can find a clear text password using the lpPassword pointer. Usually, however, it is an empty string as one of the system accounts is being used. The dwServiceType argument will tell us whether it is a kernel or a user-mode service. In addition, we need to check how the service is being run inside the system – whether as a separate process or not:

SERVICE_WIN32_OWN_PROCESS will be specified if the service is running within its own process.
SERVICE_WIN32_SHARE_PROCESS will be specified if the service is sharing a process with other services.

Also, if one of the above options is used we need to check for SERVICE_INTERACTIVE_PROCESS. If it is set then we are dealing with a service that is using the LocalSystem account [3] – a perfect target for exploiting. Also, if lpServiceStartName is NULL, CreateService will use the LocalSystem account; if it is NT AUTHORITY\LocalService, CreateService will use the LocalService account.
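Once the dwServiceType and lpServiceStartName values have been recovered from a CreateService() call site, decoding them is mechanical. The Python sketch below illustrates the rules; the function name describe_service is hypothetical, while the numeric constants are the values defined in the Windows SDK headers. According to the CreateService documentation [3], a NULL lpServiceStartName selects LocalSystem, and an interactive service must run as LocalSystem.

```python
# Service type bits as defined in the Windows SDK (winnt.h).
SERVICE_KERNEL_DRIVER = 0x00000001
SERVICE_FILE_SYSTEM_DRIVER = 0x00000002
SERVICE_WIN32_OWN_PROCESS = 0x00000010
SERVICE_WIN32_SHARE_PROCESS = 0x00000020
SERVICE_INTERACTIVE_PROCESS = 0x00000100


def describe_service(service_type, start_name=None):
    """Return (kind, interactive, account) for a CreateService call.

    start_name is the lpServiceStartName string, or None when the
    argument is NULL. For drivers no account is reported.
    """
    interactive = bool(service_type & SERVICE_INTERACTIVE_PROCESS)
    if service_type & (SERVICE_KERNEL_DRIVER | SERVICE_FILE_SYSTEM_DRIVER):
        return "kernel-mode driver", interactive, None
    if service_type & SERVICE_WIN32_OWN_PROCESS:
        kind = "user-mode service, own process"
    elif service_type & SERVICE_WIN32_SHARE_PROCESS:
        kind = "user-mode service, shared process"
    else:
        kind = "unknown"
    # NULL start name or the interactive flag both imply LocalSystem.
    if start_name is None or interactive:
        account = "LocalSystem"
    else:
        account = start_name
    return kind, interactive, account


print(describe_service(0x110))
print(describe_service(0x20, "NT AUTHORITY\\LocalService"))
```

Any call site that decodes to LocalSystem deserves to be audited first.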

Now, once we have identified all entry points and possibly the privileges used by the service we can look further for vulnerabilities.

What to look for

The methods used by our attacker in the reconnaissance phase should be sufficient to identify possible remote attack vectors. However, sometimes the service does not run on the server – instead it is installed on the workstations that use the custom application. In such cases an attacker will have to analyse the service execution environment and enumerate its DACLs. Process Explorer is the tool for this task – it allows the attacker to check whether a low-privilege principal like ‘Everyone’ has the relevant permissions to access service objects. This could be another possible local attack vector.

Now, if the service is using objects it is probably also using the SetSecurityDescriptorDacl() function. A quick check of the import table will give us all the information we need (if the binary is not compressed and the import table is not obfuscated). Assuming we have found SetSecurityDescriptorDacl(), let’s take a look at the arguments passed to it. If the pDacl argument is NULL then we have probably found an exploitable vulnerability. The same method has been used successfully against Oracle Database Server 10gR2 for Windows [4]. You can also look for other security-related functions that take NULL parameters – every one of them will increase the service attack surface, which is a good thing from the attacker’s perspective.
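The same condition can be checked on an object’s security descriptor once it has been retrieved in binary form. The following Python sketch assumes the descriptor is in self-relative format (a 20-byte header holding the control flags and four offsets); the function name has_null_dacl is hypothetical and the sample descriptors are synthetic. A NULL DACL shows up as the SE_DACL_PRESENT flag being set while the DACL offset is zero.

```python
import struct

# Control flags from the Windows SDK.
SE_DACL_PRESENT = 0x0004
SE_SELF_RELATIVE = 0x8000

# Self-relative header: Revision, Sbz1, Control, then the Owner, Group,
# Sacl and Dacl offsets.
HEADER = struct.Struct("<BBHIIII")


def has_null_dacl(sd):
    """True if a self-relative security descriptor carries a NULL DACL."""
    _rev, _sbz1, control, _owner, _group, _sacl, dacl_offset = \
        HEADER.unpack_from(sd, 0)
    if not control & SE_SELF_RELATIVE:
        raise ValueError("expected a self-relative security descriptor")
    # DACL marked as present, but no ACL behind it: everyone gets full access.
    return bool(control & SE_DACL_PRESENT) and dacl_offset == 0


# Synthetic examples: a NULL DACL, and a DACL stored right after the header.
null_dacl_sd = HEADER.pack(1, 0, SE_SELF_RELATIVE | SE_DACL_PRESENT, 0, 0, 0, 0)
normal_sd = HEADER.pack(1, 0, SE_SELF_RELATIVE | SE_DACL_PRESENT, 0, 0, 0, 20)

print(has_null_dacl(null_dacl_sd), has_null_dacl(normal_sd))
```

A NULL DACL should not be confused with an empty one: an ACL that is present but contains no entries denies all access, whereas a NULL DACL grants it to everyone.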

Next it’s time for some fuzzing. We should fuzz all interfaces. Before fuzzing it is good practice to attach a debugger to the target if possible. In the case of an attack this will not always be possible; during a legitimate security assessment, however, it should not be a problem. There is one catch in the case of services that start during system boot: under Windows 2000 you cannot attach a debugger to a process and detach it later without terminating the target. If the attacker is not able to attach a debugger to the service, how can he find vulnerabilities remotely? There are several possibilities. The simplest and most effective is to measure response times – if after a certain request the delay in receiving a reply is longer than usual, this could be something interesting. Sometimes the attacker will be able to crash the service and it will not be restarted automatically.
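The response-time heuristic is easy to automate. Below is a minimal Python sketch that flags requests whose reply took much longer than usual; the function name slow_responses, the threshold factor of 3 and the sample measurements are all assumptions made for illustration, and the timings are expected to have been collected by whatever client sends the test cases.

```python
from statistics import median


def slow_responses(timings, factor=3.0):
    """Return the requests whose response time exceeds factor x the median.

    timings is a list of (request_id, elapsed_seconds) pairs.
    """
    baseline = median(elapsed for _, elapsed in timings)
    return [req for req, elapsed in timings if elapsed > factor * baseline]


# Hypothetical measurements: request 4 took far longer than the rest.
measurements = [(1, 0.020), (2, 0.022), (3, 0.019), (4, 0.310), (5, 0.021)]
print(slow_responses(measurements))
```

The median is used instead of the mean so that a single very slow reply does not shift the baseline it is being compared against.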

The process of fuzzing is directly connected with the protocols used by our target. A lot of services use well known protocols like HTTP or RPC for communication, so writing a fuzzer is not a hard task. Some protocols – even internal ones – use some form of authentication. In many cases authentication is based on a static password which is hard-coded somewhere in the service or other parts of the application. If the attacker is lucky the password will be transmitted in clear text over the network. In such a case any sniffer will do the job.
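For a known protocol, a first fuzzer can be as simple as mutating a valid request. The following Python sketch shows one possible approach (random bit flips over a captured seed message); the function names, the seed request and the number of flips are hypothetical, and the sketch deliberately leaves out the network code that would deliver each case to the service.

```python
import random


def mutate(seed, rng, max_flips=4):
    """Return a copy of seed (non-empty bytes) with a few random bits flipped."""
    data = bytearray(seed)
    for _ in range(rng.randint(1, max_flips)):
        pos = rng.randrange(len(data))
        data[pos] ^= 1 << rng.randrange(8)
    return bytes(data)


def make_test_cases(seed, count, rng_seed=0):
    """Generate a reproducible series of mutated requests."""
    rng = random.Random(rng_seed)
    return [mutate(seed, rng) for _ in range(count)]


# Hypothetical valid request captured from the custom application.
seed_request = b"GET /status?id=42 HTTP/1.0\r\n\r\n"
cases = make_test_cases(seed_request, 5)
for case in cases:
    print(case)
```

Seeding the generator makes every run reproducible, which matters when a particular test case has to be replayed after the service crashes or slows down.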

The debuggers

In the good old days everyone used the SoftICE debugger from NuMega (later from Compuware Corporation), but some time ago SoftICE became defunct and now almost everyone uses WinDbg from Microsoft. While WinDbg is one of very few tools that allows kernel-level debugging and works on x64 systems too, in the case of ring 3 applications there are more options available.

There are at least two user-mode debuggers worth mentioning: OllyDBG and Immunity Debugger. In fact, the latter is based on OllyDBG code. Immunity Debugger is interesting because it is one of the very first debuggers to target not bugs but vulnerabilities. Some of its extensions take it one step further: their aim is to speed up exploit development. So if you need to write an exploit for a custom user-mode service, then Immunity Debugger is worth checking out. It also supports command line and is integrated with Python so you can use your own or third-party Python modules.

When talking about security one must not forget IDA Pro – this great disassembler also offers local and remote debugging. Using IDA databases can be convenient if more than one person is working on a project. In reality, though, the whole binary audit is usually performed by just one reverse engineer – it’s hard to organize the work within teams because you simply cannot divide tasks per address range within an application. So a simple rule: one binary object, one person, makes a lot of sense here. There is, of course, the IDA Sync plug-in that allows the work of multiple analysts to be synchronized, but in real life when you are working on a project it is not that easy. No plug-in will quickly synchronize the knowledge about objects across a team.

We have had a few experiences in which, for various reasons, no third-party product could help us out. The reasons included bugs inside software, the length of time needed to implement extensions, etc. This takes us to debugging frameworks like PaiMei [5], but the same problems can apply. So sometimes the best option is to write a small debugger yourself. Windows has a very nice set of APIs for debugging purposes. Its documentation is far from perfect as it is missing a lot of detail, which means a lot of time must be spent reading header files from the SDK and browsing the web. One of the most important things to remember is that the initial breakpoint set by CreateProcess with the DEBUG_* flag enabled is not the first instruction of the application. One of the best strategies is to handle the initial breakpoint event and set up another breakpoint (the most obvious, trivial and simple method is to insert the INT 3 opcode at the entry point). When the initial breakpoint is hit, your process sections are already in memory so it is possible to write to and read the code section. Keep in mind that Windows enforces memory protection, so before any write operation use VirtualQueryEx and VirtualProtectEx to disable and later re-enable page write protection. The following is an example (in assembly language):

invoke VirtualQueryEx, stDE.u.CreateProcessInfo.hProcess, \
       stDE.u.CreateProcessInfo.lpStartAddress, addr mbi, \
       SIZEOF MEMORY_BASIC_INFORMATION

invoke ReadProcessMemory, stDE.u.CreateProcessInfo.hProcess, \
       stDE.u.CreateProcessInfo.lpStartAddress, addr initalbpbuf, 1, NULL

invoke VirtualProtectEx, stDE.u.CreateProcessInfo.hProcess, \
       stDE.u.CreateProcessInfo.lpStartAddress, mbi.RegionSize, \
       PAGE_EXECUTE_READWRITE, addr mbi.Protect

[...]

invoke VirtualProtectEx, stDE.u.CreateProcessInfo.hProcess, \
       mbi.BaseAddress, mbi.RegionSize, mbi.Protect, addr dwOldProtect

Another strategy for stopping at the application entry point is to handle the CREATE_PROCESS_DEBUG_EVENT event and set up a breakpoint at this point.

When modifying a code section remember to flush the instruction cache:

invoke FlushInstructionCache, stDE.u.CreateProcessInfo.hProcess, \
       stDE.u.CreateProcessInfo.lpStartAddress, 1
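The bookkeeping behind a software breakpoint is independent of the language the debugger is written in. The Python sketch below models it on a plain bytearray that stands in for the debuggee’s code section; the class name Breakpoints and the sample bytes are hypothetical. In a real debugger the reads and writes would go through ReadProcessMemory and WriteProcessMemory, bracketed by the protection changes and the cache flush described above.

```python
INT3 = 0xCC  # opcode of the one-byte INT 3 instruction


class Breakpoints:
    """Track software breakpoints placed in a block of code bytes."""

    def __init__(self, memory):
        self.memory = memory   # bytearray standing in for debuggee memory
        self.saved = {}        # address -> original byte

    def set(self, address):
        """Save the original byte and overwrite it with INT 3."""
        if address not in self.saved:
            self.saved[address] = self.memory[address]
            self.memory[address] = INT3

    def clear(self, address):
        """Put the original byte back so the instruction can execute."""
        self.memory[address] = self.saved.pop(address)


# Hypothetical entry-point bytes: push ebp / mov ebp, esp / ret
code = bytearray(b"\x55\x8b\xec\xc3")
bps = Breakpoints(code)
bps.set(0)
patched = bytes(code)
bps.clear(0)
restored = bytes(code)
print(patched.hex(), restored.hex())
```

When the breakpoint fires on x86, the reported instruction pointer is one byte past the INT 3, so the debugger also has to restore the original byte and move the instruction pointer back by one before resuming.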

You might be wondering why the above examples are written in assembly language. Actually, if you really need a lightweight tool, assembly is the way to do it. You can have quite a useful debugging tool in less than 10 kilobytes, which is really lightweight and it leaves almost no footprint in the system. One final tip: if you have a lot of time you can write your tools using FASM assembler. FASM is a great tool, but unfortunately it is missing some headers and definitions from Windows SDK so you have to write them yourself. While personally I prefer FASM, I must admit that MASM32 is better suited for this task if you need to dive in quickly. MASM32 has all the headers you will need.

TOPSTACK method

Since we are talking about vulnerabilities it is reasonable to take a look at shellcode. Due to the Windows architecture, when executing ring 3 shellcode the attacker needs to know the address of at least two functions inside kernel32.dll: LoadLibrary and GetProcAddress. With those two addresses he is able to locate any other function address he needs inside the shellcode. As we are talking about a targeted attack one could argue that, thanks to the ‘nmap -A’ switch, the attacker will know exactly what system version he is attacking. Thanks to this information he will be able to hard code all the addresses for the functions he needs to call from his shellcode. However, even in the case of targeted attacks, attackers still look for reliability (reliability is more important in targeted attacks than in the old script-kiddie-style attack when trying to exploit a few thousand hosts). One of the most reliable methods of finding the LoadLibrary/GetProcAddress function addresses is a method called topstack. This is a relatively new method, so I believe it is worth describing.

TOPSTACK is a method of finding kernel32.dll in memory. We need it to get the addresses of LoadLibrary and GetProcAddress so that we can use those functions later to get the addresses of other Windows API functions required by our shellcode:

  xor eax, eax
  mov eax, fs:[eax + 18h] ;get TEB address
  mov esi, eax            ;store it at ESI register
  lodsd                   ;add 4 to ESI
  lodsd                   ;grab the top of stack
  mov eax, [eax - 1Ch]    ;this pointer is address inside kernel32.dll
find_mz:
  dec eax                 ;scan memory at 64kb boundary
  xor ax, ax
  cmp word ptr [eax], 5A4Dh   ;check for MZ signature (start of PE file)
  jnz find_mz             ;nope - search further

The TOPSTACK method has several advantages:

It can occupy around 25 bytes of memory.
It works on NT, 2000, XP and 2003.
It works reliably, thanks to its simplicity.
The example shown above is free from bad bytes, so it can be used right away.

Actually, the example above can be optimized further – but I will leave that as an exercise for the reader (as a tip, take a look at how the ESI register is being used). To understand fully how it works we need to take a look at two Windows structures: TEB (Thread Environment Block) and TIB (nt!_NT_TIB for those using WinDbg). TEB is always located at address fs:0 and its layout is as follows:

lkd> dt nt!_TEB
   +0x000 NtTib                     : _NT_TIB
   +0x01c EnvironmentPointer        : Ptr32 Void
   +0x020 ClientId                  : _CLIENT_ID
   +0x028 ActiveRpcHandle           : Ptr32 Void
   +0x02c ThreadLocalStoragePointer : Ptr32 Void
   +0x030 ProcessEnvironmentBlock   : Ptr32 _PEB

Please note that we are talking about 32-bit systems – on x64 the _NT_TIB structure ends at offset 38h and the PEB pointer is located at offset 60h. Now let’s take a look at _NT_TIB:

lkd> dt nt!_NT_TIB
  +0x000 ExceptionList  : Ptr32 _EXCEPTION_REGISTRATION_RECORD
  +0x004 StackBase      : Ptr32 Void
  +0x008 StackLimit     : Ptr32 Void
  +0x00c SubSystemTib   : Ptr32 Void
  +0x010 FiberData      : Ptr32 Void
  +0x010 Version        : Uint4B
  +0x014 ArbitraryUserPointer : Ptr32 Void
  +0x018 Self           : Ptr32 _NT_TIB

As you can see it starts with an exception record – this is why the SEH handler is installed using the mov fs:[0] instruction. At offset +4 we have a pointer to the stack base which we will use in our method. Using the top of the stack and going down 1Ch bytes we find an address that lies somewhere inside kernel32.dll.
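The scanning part of the method can be modelled outside the target to see exactly which addresses are probed. The Python sketch below mirrors the dec eax / xor ax, ax / cmp sequence; the function name find_module_base, the step limit and the addresses used in the example are assumptions made for illustration, and memory is simulated with a dictionary instead of a real process.

```python
MZ = b"MZ"  # the bytes behind the 5A4Dh signature at the start of a PE image


def find_module_base(read_word, pointer, max_steps=0x10000):
    """Scan downwards in 64 KB steps from pointer until MZ is found.

    read_word(address) must return the two bytes stored at address.
    The step limit is an addition of this sketch; the shellcode has none
    and relies on the pointer really lying inside a loaded module.
    """
    addr = pointer
    for _ in range(max_steps):
        addr = (addr - 1) & 0xFFFF0000   # dec eax / xor ax, ax
        if read_word(addr) == MZ:        # cmp word ptr [eax], 5A4Dh
            return addr
    raise LookupError("no MZ signature found")


# Simulated memory: a module mapped at a hypothetical base address.
memory = {0x7C800000: b"MZ"}


def read_word(address):
    return memory.get(address, b"\x00\x00")


base = find_module_base(read_word, 0x7C816FD7)
print(hex(base))
```

Because the pointer is decremented before it is masked, a pointer that already sits exactly on a 64 KB boundary is skipped; this is harmless here, as the value taken from the stack points into the module’s code, not at its header.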

After finding the start of kernel32.dll we just need to extract data from the export table, and voilà! We can start calling all Windows API functions.

The final scene

With all the tools and methods presented here, an attacker would be able to perform a successful targeted attack against most custom business applications. Of course, the aim of this article was not to educate the attacker but to provide readers with tools for auditing closed-source Windows services. We cannot afford to forget about the building blocks of our infrastructure, because doing so leads to exploitable vulnerabilities. It also leads to a loss of compliance, and in the world of Sarbanes-Oxley, PCI and Basel II this could mean financial losses that are more significant than the consequences of an attack itself.

To be prepared for an attack you need to think like the attacker. Penetration testing strengthened by an application audit is a wise investment.

Bibliography

[1] Falliere, N. Windows Anti-Debug Reference, SecurityFocus. http://www.securityfocus.com/infocus/1893.

[2] http://msdn2.microsoft.com/en-us/library/ms686001(VS.85).aspx.

[3] http://msdn2.microsoft.com/en-us/library/ms682450(VS.85).aspx.

[4] Cerrudo, C. Practical 10 Minutes Security Audit Oracle Case. http://www.blackhat.com/presentations/bh-dc-07/Cerrudo/Presentation/bh-dc-07-Cerrudo-ppt.pdf.

[5] http://paimei.googlecode.com.
