Unpacking x64 PE+ binaries part 2: using WinDbg

2012-09-01

Aleksander P. Czarnowski

AVET, Poland

Editor: Helen Martin

Abstract

In the first part of this tutorial series Aleksander Czarnowski described some fundamental differences between the 32-bit PE and 64-bit PE+ Windows file formats and looked at using the Bochs IDA plug-in to find the original entry point of a file. In part two, he describes using WinDbg and demonstrates a different approach to the unpacking process.

Table of contents


Setting up WinDbg
WinDbg as an unpacking tool
Variations and other unpacking strategies
Setting an entry point breakpoint – the easy way
A quick peek into 64-bit IsDebuggerPresent
WinDbg: debugging WOW64
Note on DLL injection
Summary

In the first part of this tutorial series (see VB, July 2012, p.11) I described some fundamental differences between the 32-bit PE and 64-bit PE+ Windows file formats. In that article, we looked at using the Bochs IDA plug-in to find the original entry point of a file. In this article I will describe using WinDbg and demonstrate a different approach to the unpacking process.

Setting up WinDbg

Before we start our analysis the first thing we need to do is to install WinDbg in the form of the Microsoft Windows Debugging Tools package [1]. Keep in mind that debugging tools are available for 32- and 64-bit platforms. You can install both on the same host, but for the rest of this tutorial we will be using the x64 version only.

After installing WinDbg, the next thing we need to do is point the symbol path at Microsoft’s symbol server; this will be handy when we step over system DLLs and other Windows components. To do this, enter the following line into the ‘Symbol file path’ window (File->Symbol file path, or the Ctrl+S shortcut):

srv*DownstreamStore*http://msdl.microsoft.com/download/symbols;srv*

Replace DownstreamStore with a local directory (for example C:\symbols) in which downloaded symbols will be cached. If you have problems with symbols you can always reload them using the .reload command (note the dot preceding the command). Since we are using a remote, public symbol store provided by Microsoft, our host needs an Internet connection. In an isolated analysis lab this requirement may be impossible to meet; in that case you need to download the symbols separately and enter the path to the directory into which you downloaded them.

WinDbg as an unpacking tool

WinDbg is definitely not user-friendly, and the more time you spend learning IDA’s quirks, the more frustrating it will be to work around WinDbg-specific behaviour. Unfortunately, IDA’s built-in native debugger can’t handle ring 0 code yet, meaning, for example, that unpacking kernel drivers dynamically with it is not possible. In short: there are times when you might be forced to switch from your favourite tool to WinDbg (unless WinDbg is your favourite tool, which is something to be proud of, I guess). In terms of unpacking 32-bit PE files, WinDbg has one important advantage over other tools like IDA, and especially over OllyDbg/Immunity Debugger: it is targeted a lot less by malware authors. Many anti-Olly debugging tricks do not work under WinDbg, and I’m aware of at least a few cases in which people, fed up with having to bypass numerous protection, anti-debugging and obfuscation layers, switched from Olly to WinDbg and found that reaching the original entry point was very swift.

After this short introduction let’s get back to work. Let’s assume for a moment that our packed test executable is still unknown to us and we don’t know anything about the tools used to compress it. To proceed with our analysis we need to open the executable file (File->Open Executable or Ctrl+E). This will trigger the loading of symbols and of all modules required by the executable, and WinDbg will stop at its initial breakpoint (Figure 1):

ntdll!LdrpDoDebuggerBreak+0x30:
00000000`7746cb60 cc         int   3

WinDbg sets this breakpoint by default. While such behaviour can sometimes be handy, here it is of little use, since we need to break at our process entry point and not somewhere in ntdll.

There are two methods we can use to achieve this: the long and more complicated one, and the short one. I’ll start with the longer one since it teaches us a bit about Windows operating system structures and the way they are used by the operating system when loading executable modules into memory.

The ‘!peb’ command displays the Process Environment Block (PEB) [2], [3] of the debugged process (unless you specify an address as a command argument). Its output is shown in Figure 2. From the PEB information we can learn:

The ImageBaseAddress value – we will need this to find the current entry point. Keep in mind that the current entry point has nothing to do with the original entry point.
That the BeingDebugged flag is set, signalling that the process is being debugged. This is the same flag that the kernel32 IsDebuggerPresent() function checks.

Figure 1. Initial break after opening the executable.

Figure 2. Using the ‘!peb’ command to find the ImageBaseAddress value.

Now that we have the ImageBaseAddress (0000000000400000 in our case), we need to find the second component of the process entry point address. This follows from the PE+ header, in which the entry point address is calculated by adding the values of two fields:

ImageBase (eight bytes in the case of PE+)
AddressOfEntryPoint (four bytes both for PE and PE+). In [4], the AddressOfEntryPoint is described as: ‘The address of the entry point relative to the image base when the executable file is loaded into memory. For program images, this is the starting address. For device drivers, this is the address of the initialization function. An entry point is optional for DLLs. When no entry point is present, this field must be zero.’

Since our file is not a device driver, AddressOfEntryPoint gives the program’s starting address. Keep in mind, however, that because of TLS callbacks the instruction at the entry point is not necessarily the first code from the executable module to run.
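As a sketch of where these two fields live, the following Python function reads them directly from a PE or PE+ header held as bytes. It assumes only the layout given in the specification [4]: e_lfanew at offset 0x3C, the PE\0\0 signature, the 20-byte COFF file header, and an optional header whose Magic is 0x20B for PE+ (0x10B for PE). The fake_pe_plus_header() helper is hypothetical: it builds just enough of a header, using this article’s values, to exercise the parser. With a real sample you would pass the file’s first bytes instead.

```python
import struct

PE32_MAGIC = 0x10B      # optional header Magic of a 32-bit PE image
PE32PLUS_MAGIC = 0x20B  # optional header Magic of a 64-bit PE+ image


def entry_point_va(image: bytes) -> int:
    """Return ImageBase + AddressOfEntryPoint taken from a PE or PE+ header."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    opt = e_lfanew + 4 + 20  # skip the signature and the 20-byte COFF file header
    (magic,) = struct.unpack_from("<H", image, opt)
    # AddressOfEntryPoint is a four-byte RVA at offset 16 in both PE and PE+.
    (entry_rva,) = struct.unpack_from("<I", image, opt + 16)
    if magic == PE32PLUS_MAGIC:
        # PE+ has no BaseOfData field, so the eight-byte ImageBase starts at 24.
        (image_base,) = struct.unpack_from("<Q", image, opt + 24)
    elif magic == PE32_MAGIC:
        # PE keeps BaseOfData at 24, so its four-byte ImageBase sits at 28.
        (image_base,) = struct.unpack_from("<I", image, opt + 28)
    else:
        raise ValueError(f"unknown optional header magic {magic:#x}")
    return image_base + entry_rva


def fake_pe_plus_header(image_base: int, entry_rva: int) -> bytes:
    """Hypothetical helper: build only the header fields the parser reads."""
    e_lfanew = 0x80
    buf = bytearray(0x200)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, e_lfanew)
    buf[e_lfanew:e_lfanew + 4] = b"PE\0\0"
    opt = e_lfanew + 4 + 20
    struct.pack_into("<H", buf, opt, PE32PLUS_MAGIC)
    struct.pack_into("<I", buf, opt + 16, entry_rva)
    struct.pack_into("<Q", buf, opt + 24, image_base)
    return bytes(buf)


# Values from this article: ImageBase 0x400000, AddressOfEntryPoint 0x40C2.
print(hex(entry_point_va(fake_pe_plus_header(0x400000, 0x40C2))))  # 0x4040c2
```

Note that the parser branches on Magic rather than on the Machine field, because the position and width of ImageBase are what differ between PE and PE+.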

Now we need to find the value of the AddressOfEntryPoint field. Unfortunately it is not available in the PEB information. However, we can use the ‘!dh’ command (note that all commands starting with an exclamation mark in WinDbg are in fact extensions) to display it. The ‘!dh’ command requires the base address of the image we want to parse. Fortunately, the PEB has given us this information. So we issue the command:

!dh 0000000000400000

The output not only reveals the address we are looking for (see Figure 3), but also shows us information about the file sections (Figure 4).

Figure 3. Finding the entry point address with the ‘!dh’ command.

Figure 4. Section information gathered from ‘!dh’ output.
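The section table can be read the same way. The sketch below is an illustration, not a replacement for ‘!dh’: it parses the 40-byte section headers that follow the optional header and reports which section contains a given RVA. This is a quick way to check whether the entry point lies in a section that looks like it was added by a packer. The section names and layout built by the hypothetical fake_image() helper are made up and do not reproduce Figure 4.

```python
import struct
from typing import List, NamedTuple, Optional


class Section(NamedTuple):
    name: str
    virtual_address: int  # RVA at which the section is mapped
    virtual_size: int     # size of the section in memory


def parse_sections(image: bytes) -> List[Section]:
    """Read the section headers that follow the optional header."""
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    coff = e_lfanew + 4  # the COFF file header follows the PE\0\0 signature
    (num_sections,) = struct.unpack_from("<H", image, coff + 2)
    (opt_size,) = struct.unpack_from("<H", image, coff + 16)
    table = coff + 20 + opt_size
    sections = []
    for i in range(num_sections):
        # Each header is 40 bytes: Name[8], VirtualSize, VirtualAddress, ...
        raw_name, vsize, va = struct.unpack_from("<8sII", image, table + 40 * i)
        name = raw_name.rstrip(b"\0").decode("ascii", "replace")
        sections.append(Section(name, va, vsize))
    return sections


def section_for_rva(sections: List[Section], rva: int) -> Optional[Section]:
    """Return the section whose in-memory range contains rva, if any."""
    for s in sections:
        # Simplification: the loader actually rounds VirtualSize up to
        # SectionAlignment, which this check ignores.
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            return s
    return None


def fake_image(layout) -> bytes:
    """Hypothetical helper: headers for (name, rva, size) tuples, nothing else."""
    e_lfanew, opt_size = 0x80, 0xF0
    buf = bytearray(0x400)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, e_lfanew)
    buf[e_lfanew:e_lfanew + 4] = b"PE\0\0"
    coff = e_lfanew + 4
    struct.pack_into("<H", buf, coff + 2, len(layout))
    struct.pack_into("<H", buf, coff + 16, opt_size)
    table = coff + 20 + opt_size
    for i, (name, rva, size) in enumerate(layout):
        struct.pack_into("<8sII", buf, table + 40 * i, name.encode(), size, rva)
    return bytes(buf)


sections = parse_sections(fake_image([("sec0", 0x1000, 0x1000),
                                      ("sec1", 0x2000, 0x2000),
                                      ("sec2", 0x4000, 0x1000)]))
print(section_for_rva(sections, 0x40C2).name)  # sec2
```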

Now we can add the ImageBaseAddress and AddressOfEntryPoint values to find out the final address we were looking for:

ImageBaseAddress + AddressOfEntryPoint = Entry Point
0000000000400000 + 40C2 = 00000000004040C2

This allows us to set a breakpoint at 00000000004040C2 using the ‘bp’ command:

bp 00000000004040C2

Now we can finally run our target application using the ‘g’ command – see Figure 5.

Figure 5. Setting a breakpoint at the entry point.

If you disassemble the rest of the entry point code using the ‘u’ command, you will find that it starts by pushing several registers onto the stack. From this we can assume that RDI and the other saved registers will be restored before execution jumps to the original entry point. If this assumption is correct, we can use a completely different strategy from the one described in the previous part of this tutorial. Instead of tracing API calls, we can set a breakpoint on access to the stack, hoping that this location will be read right before execution is transferred to the original entry point. Such a tactic lets us bypass the decompression loop and the import table address fixing loops completely. It also needs none of the information we gathered in the previous tutorial, which is convenient, since WinDbg does not provide anything similar to IDA’s uunp plug-in out of the box. Now we can use the ‘t’ (trace) command to execute the ‘PUSH RDI’ instruction; the ‘p’ (step) command would work equally well here.

Now we need to set a breakpoint on memory access, which is what the ‘ba’ (break on access) command does. As with many other commands, its syntax looks rather cryptic at first sight:

ba r 8 rsp

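The ‘r’ stands for read/write access: the breakpoint is triggered when the CPU either reads or writes memory at the given address (the hardware offers no read-only mode). Other possible access types are:

e – execute (the processor fetches an instruction from the address)
w – write
i – I/O (the I/O port at the given address is accessed)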

The second argument is the size, in bytes, of the memory region whose access will trigger the breakpoint. Since every 64-bit register occupies eight bytes, each PUSH stores eight bytes, so we are interested in eight bytes on the stack.

The third and final argument in our example is the address of the breakpoint. Since we have a stream of PUSH [reg] instructions starting at 0x4040c2, we can put our breakpoint at the current stack pointer, which has just been decremented by the PUSH RDI instruction we executed with the trace command and now points to the saved RDI value. The current stack pointer is, of course, held in the RSP register. If you prefer, you can pass a literal memory address instead of the register.
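The reasoning behind this trick can be checked with a toy model. The Python sketch below is not a debugger: it simulates the stack as a sequence of aligned eight-byte PUSH/POP accesses, watches the slot written by one chosen PUSH, and reports where execution would stop. It assumes, as the text does, that the stub restores registers with POPs in the reverse order of the PUSHes and touches no saved slot in between; the instruction list and starting stack pointer are hypothetical. Like WinDbg’s ‘r’ access type it counts both reads and writes as hits, and like a hardware data breakpoint it reports the instruction after the access.

```python
def ba_r8_stop(program, watch_after):
    """Toy model of 'ba r 8 rsp' issued right after program[watch_after].

    program is a list of (mnemonic, operand) tuples; only "push" and "pop"
    touch the stack in this model. Returns the index of the instruction at
    which the debugger would regain control, or None if the slot is never hit.
    """
    rsp = 0x70000   # arbitrary starting stack pointer (hypothetical)
    watched = None
    for i, (op, _operand) in enumerate(program):
        accessed = None
        if op == "push":
            rsp -= 8
            accessed = rsp   # PUSH writes the new top-of-stack slot
        elif op == "pop":
            accessed = rsp   # POP reads the current top-of-stack slot
            rsp += 8
        # Every access here is an aligned qword, so overlap means equality,
        # and 'r' fires on reads and writes alike.
        if watched is not None and accessed == watched:
            # Data breakpoints trap after the accessing instruction completes,
            # so the debugger stops on the next instruction.
            return i + 1
        if i == watch_after:
            watched = rsp    # the breakpoint is set after this PUSH executed
    return None


SAVED = ["rdi", "rsi", "rbx", "rcx", "rdx", "r8"]  # from the 'u' listing
stub = ([("push", r) for r in SAVED]
        + [("unpack", None)]                       # stands for the whole loop
        + [("pop", r) for r in reversed(SAVED)]    # assumed mirror order
        + [("jmp", "oep")])

print(stub[ba_r8_stop(stub, 0)])  # ('jmp', 'oep')
print(stub[ba_r8_stop(stub, 5)])  # ('pop', 'rdx')
```

Watching the RDI slot (the first PUSH) stops right after the final POP, on the hand-off JMP; watching the slot of a later PUSH stops earlier, inside the POP sequence.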

Now we just need to run our target with the ‘g’ command. Here is a listing of the whole session:

Breakpoint 1 hit
image00000000_00400000+0x40c2:
00000000`004040c2 57   push rdi
0:000> u
image00000000_00400000+0x40c2:
00000000`004040c2 57   push  rdi
00000000`004040c3 56   push  rsi
00000000`004040c4 53   push  rbx
00000000`004040c5 51   push  rcx
00000000`004040c6 52   push  rdx
00000000`004040c7 4150 push  r8
00000000`004040c9 488d054f030000  lea     rax,[image00000000_00400000+0x441f (00000000`0040441f)]
00000000`004040d0 488b30     mov   rsi,qword ptr [rax]
0:000> t
image00000000_00400000+0x40c3:
00000000`004040c3 56   push  rsi
0:000> ba r 8 rsp
0:000> g

Breakpoint 2 hit
image00000000_00400000+0x1180:
00000000`00401180 e97b0e0000 jmp   image00000000_00400000+0x2000 (00000000`00402000)
0:000> t
image00000000_00400000+0x2000:
00000000`00402000 4883ec28   sub   rsp,28h
0:000> u
image00000000_00400000+0x2000:
00000000`00402000 4883ec28         sub   rsp,28h
00000000`00402004 41b900000000     mov   r9d,0
00000000`0040200a 49c7c000104000   mov   r8,offset image00000000_00400000+0x1000 (00000000`00401000)
00000000`00402011 48c7c20e104000   mov   rdx,offset image00000000_00400000+0x100e (00000000`0040100e)
00000000`00402018 4831c9           xor   rcx,rcx
00000000`0040201b ff152b100000     call  qword ptr [image00000000_00400000+0x304c (00000000`0040304c)]
00000000`00402021 89c1             mov   ecx,eax
00000000`00402023 ff1513100000     call  qword ptr [image00000000_00400000+0x303c (00000000`0040303c)]

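As you can see, when breakpoint 2 is hit, RIP points to 0x401180, which contains a suspicious JMP instruction. Because hardware data breakpoints are reported only after the accessing instruction has completed, WinDbg stops on the instruction that follows the one which accessed our stack slot: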

00000000`00401180 e97b0e0000 jmp   image00000000_00400000+0x2000 (00000000`00402000)

This looks like a transfer of execution to the original entry point (OEP). You can now open the disassembly window (Figure 6) with the View->Disassembly option. Right before our jump instruction there is a sequence of POP instructions, a hint that the original register values are being restored. The bytes after the jump look like garbage. Compilers normally lay code out compactly rather than leaving meaningless bytes between instructions (debug builds being a possible exception), so this is another sign that execution is not meant to continue past this JMP.

Figure 6. Disassembly window showing execution transfer to the original entry point (OEP).

Let’s execute this jump with the t(race) command and disassemble the code at the new RIP location using the ‘u’ command, as shown in the session above. The code starts with SUB RSP,28h, a typical x64 function prologue (it reserves the 32-byte register home area for callees plus eight bytes to keep RSP 16-byte aligned), which hints that we are in fact at the original entry point. Further code inspection and execution confirm this.
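The JMP encoding itself is worth decoding once by hand. E9 is a JMP with a signed 32-bit displacement measured from the end of the five-byte instruction, so target = address + 5 + rel32. The short Python sketch below (the helper name is hypothetical) applies this to the bytes from the listing, e9 7b 0e 00 00 at 0x401180, giving 0x401185 + 0xe7b = 0x402000; the same formula covers E8 CALLs.

```python
import struct


def rel32_target(address: int, code: bytes) -> int:
    """Destination of a JMP rel32 (E9) or CALL rel32 (E8) located at address."""
    if len(code) < 5 or code[0] not in (0xE8, 0xE9):
        raise ValueError("expected a five-byte E8/E9 rel32 instruction")
    # The displacement is a signed 32-bit value relative to the next instruction.
    (disp,) = struct.unpack_from("<i", code, 1)
    return address + 5 + disp


# Bytes copied from the WinDbg listings in this article.
print(hex(rel32_target(0x401180, bytes.fromhex("e97b0e0000"))))  # 0x402000
print(hex(rel32_target(0x4010B8, bytes.fromhex("e80f000000"))))  # 0x4010cc
```

A destination that is page-aligned and lies well away from the stub, as here, is often a useful (though not conclusive) hint of an OEP hand-off.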

Variations and other unpacking strategies

Setting a breakpoint on the first stack access seems reasonable, since after unpacking, the stack must be restored to its initial state, just like the registers. However, it might be wiser to set the breakpoint after a later PUSH, and therefore on a lower RSP value, or even to cover the whole stack region that holds the initial register values saved at the beginning of our packed file.
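Covering the whole saved-register area runs into a hardware limit: ‘ba’ breakpoints use the processor’s debug registers, and x86/x64 CPUs provide only four address registers (DR0 to DR3), each monitoring at most eight bytes on x64 at an address aligned to its size. The hypothetical Python helper below splits a stack region into ‘ba’ commands and refuses plans that would need more than four. The six PUSHes at our entry point save 48 bytes, so at most four of the six slots (32 bytes) can be watched at once; the stack address in the example is made up.

```python
MAX_HW_BREAKPOINTS = 4       # debug address registers DR0-DR3
VALID_SIZES = (1, 2, 4, 8)   # 8 is available on x64 only


def plan_ba_commands(start: int, length: int, access: str = "r", size: int = 8):
    """Split [start, start + length) into 'ba' commands of the given size."""
    if access not in ("e", "r", "w", "i"):
        raise ValueError(f"unknown access type {access!r}")
    if size not in VALID_SIZES or (access == "e" and size != 1):
        raise ValueError(f"invalid size {size} for access type {access!r}")
    if start % size or length % size:
        raise ValueError("region must be aligned to the breakpoint size")
    commands = [f"ba {access} {size} {addr:#x}"
                for addr in range(start, start + length, size)]
    if len(commands) > MAX_HW_BREAKPOINTS:
        raise ValueError(f"needs {len(commands)} hardware breakpoints, "
                         f"only {MAX_HW_BREAKPOINTS} exist")
    return commands


saved_area = 0x6FEC0  # hypothetical RSP after the last of the six PUSHes
for cmd in plan_ba_commands(saved_area, 4 * 8):  # four slots fit
    print(cmd)
try:
    plan_ba_commands(saved_area, 6 * 8)          # all six slots do not
except ValueError as err:
    print(err)
```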

In the first part of this tutorial we used a completely different approach with IDA: setting breakpoints on some crucial API functions listed in the Import Table. The same method can be applied when unpacking with WinDbg. Unfortunately, WinDbg does not automatically create convenient table views of the import and export directories of PE+ files as IDA does. You can, of course, inspect those tables with WinDbg, but as always this requires a bit of additional work and poking around in process memory, whereas IDA does it automatically in most cases. Nothing stops us, however, from using the function breakpoint method described in the first part with WinDbg. Note also that if you want to reconstruct the IAT you will need to analyse the unpacking process more carefully; so far we have only found the OEP.

To set breakpoints on APIs, let’s restart our target application and set a breakpoint on the entry point. When that breakpoint is hit, we can add breakpoints on Windows API functions such as GetProcAddress and GetModuleHandleA (both are in the import table). Use the ‘bp’ command to set normal breakpoints on those functions:

Breakpoint 1 hit
image00000000_00400000+0x40c2:
00000000`004040c2 57         push  rdi
0:000> bp kernelbase!getprocaddress
0:000> bp kernelbase!getmodulehandlea
0:000> g

Breakpoint 3 hit
KERNELBASE!GetModuleHandleA:
000007fe`fdb831c0 4883ec48   sub   rsp,48h
0:000> kp
Child-SP   RetAddr           Call Site
00000000`0006fef8 00000000`004010b3 KERNELBASE!GetModuleHandleA
00000000`0006ff00 00000000`00000000 image00000000_00400000+0x10b3
0:000> u image00000000_00400000+0x10b3
image00000000_00400000+0x10b3:
00000000`004010b3 480bc0           or    rax,rax
00000000`004010b6 744b       je    image00000000_00400000+0x1103 (00000000`00401103)
00000000`004010b8 e80f000000 call  image00000000_00400000+0x10cc (00000000`004010cc)
00000000`004010bd 56         push  rsi
00000000`004010be 69727475616c50   imul  esi,dword ptr [rdx+74h],506C6175h
00000000`004010c5 726f       jb    image00000000_00400000+0x1136 (00000000`00401136)
00000000`004010c7 7465       je    image00000000_00400000+0x112e (00000000`0040112e)
00000000`004010c9 6374005a   movsxd      esi,dword ptr [rax+rax+5Ah]
0:000> bp image00000000_00400000+0x10b3
0:000> g

Breakpoint 4 hit
image00000000_00400000+0x10b3:
00000000`004010b3 480bc0           or    rax,rax
0:000> t
image00000000_00400000+0x10b6:
00000000`004010b6 744b       je    image00000000_00400000+0x1103 (00000000`00401103) [br=0]
0:000> t
image00000000_00400000+0x10b8:
00000000`004010b8 e80f000000 call  image00000000_00400000+0x10cc (00000000`004010cc)
0:000> t
image00000000_00400000+0x10cc:
00000000`004010cc 5a         pop   rdx
0:000> t
image00000000_00400000+0x10cd:
00000000`004010cd 50         push  rax
0:000> t
image00000000_00400000+0x10ce:
00000000`004010ce 59         pop   rcx
0:000> t
image00000000_00400000+0x10cf:
00000000`004010cf e8bf000000 call  image00000000_00400000+0x1193 (00000000`00401193)
0:000> t
image00000000_00400000+0x1193:
00000000`00401193 ff25ab2e0000     jmp   qword ptr [image00000000_00400000+0x4044 (00000000`00404044)] 
ds:00000000`00404044={kernel32!GetProcAddressStub (00000000`76d03630)}
0:000> t
kernel32!GetProcAddressStub:
00000000`76d03630 eb06       jmp   kernel32!GetProcAddress (00000000`76d03638)
0:000> t
kernel32!GetProcAddress:
00000000`76d03638 ff250aa10700     jmp   qword ptr [kernel32!_imp_GetProcAddress (00000000`76d7d748)] 
ds:00000000`76d7d748={KERNELBASE!GetProcAddress (000007fe`fdb830e0)}
0:000> t

Breakpoint 2 hit
KERNELBASE!GetProcAddress:
000007fe`fdb830e0 48895c2408 mov   qword ptr [rsp+8],rbx ss:00000000`0006ff00={kernel32!BasepSxsCreateResourceStream <PERF> (kernel32+0x0) (00000000`76ce0000)}

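Please note that we are using kernelbase as the module part of each symbol name: on this system GetProcAddress and GetModuleHandleA are implemented in KERNELBASE.dll, and, as the trace for GetProcAddress shows, a call made through the kernel32 export (kernel32!GetProcAddressStub, then kernel32!GetProcAddress) ends up jumping there. The first hit is GetModuleHandleA. We inspect the call stack using the ‘kp’ command and set a breakpoint on the return address of the API function. We then run the code again and, when our new breakpoint is hit, trace the unpacking process. You can quickly see how GetProcAddress is called. The rest of the process is the same as when using IDA.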
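The bytes that the ‘u’ command rendered as PUSH RSI, IMUL, JB and so on right after the CALL at 0x4010b8 are not code. Decoding them, as in the Python sketch below, reveals a classic call-over-data trick: the CALL pushes its return address 0x4010bd, which points at the ASCII string "VirtualProtect", and jumps to 0x4010cc, where the single byte 5A (POP RDX in 64-bit mode) loads that address into RDX. Since PUSH RAX / POP RCX then moves the GetModuleHandleA result into RCX, the following call appears to be GetProcAddress(hModule, "VirtualProtect") under the x64 calling convention. The byte string is copied from the listing; the helper name is hypothetical.

```python
import struct

# Bytes from the 'u' listing starting at image00000000_00400000+0x10b8.
BASE = 0x4010B8
CODE = bytes.fromhex("e80f000000" "56" "69727475616c50" "726f" "7465" "6374005a")


def call_over_data(base: int, code: bytes):
    """Decode an E8 rel32 CALL that jumps over inline data.

    Returns (pushed return address, call target, skipped bytes as text).
    """
    if code[0] != 0xE8:
        raise ValueError("expected an E8 rel32 CALL")
    (disp,) = struct.unpack_from("<i", code, 1)
    ret_addr = base + 5          # pushed by the CALL, points just past it
    target = ret_addr + disp     # where execution actually continues
    skipped = code[5:target - base]
    text = skipped.split(b"\0", 1)[0].decode("ascii")
    return ret_addr, target, text


ret_addr, target, text = call_over_data(BASE, CODE)
print(hex(ret_addr), hex(target), text)  # 0x4010bd 0x4010cc VirtualProtect
print(hex(CODE[target - BASE]))          # 0x5a, i.e. POP RDX in 64-bit mode
```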

Setting an entry point breakpoint – the easy way

In the previous section we had to do some manual work to make WinDbg stop at the process entry point. However, there is a much easier way to achieve the same thing, although it is buried deep within the user-unfriendly WinDbg documentation.

If we look back at the output WinDbg displayed at the initial breakpoint, we will notice the following line:

ModLoad: 00000000`00400000 00000000`00405000   image00000000`00400000

We can use the image00000000`00400000 symbol as an argument to the $iment operator which is a leftover from MASM syntax (which for some time was the only syntax available in the Debugging Tools package). The $iment operator returns the address of the image entry point in the loaded module list and can be used when setting up breakpoints like this:

bp $iment(image00000000`00400000)

Now you can execute the module with the ‘g’ command and execution will stop at the entry point (00000000004040C2 in our case).

A quick peek into 64-bit IsDebuggerPresent

One of the PEB flags tells the process whether it is being debugged. The same field is checked by the IsDebuggerPresent() Windows API function. Both the flag and the API function have been abused in the past, to detect or to hide the presence of a debugger. Since this is such an important function and it accesses a crucial operating system structure, it is worth a closer look; here is a disassembly generated with WinDbg:

KERNELBASE!IsDebuggerPresent:
000007fe`fdb8aee0 65488b042530000000     mov   rax,qword ptr gs:[30h]
000007fe`fdb8aee9 488b4860    mov  rcx,qword ptr [rax+60h]
000007fe`fdb8aeed 0fb64102    movzx      eax,byte ptr [rcx+2]
000007fe`fdb8aef1 c3          ret
000007fe`fdb8aef2 90          nop
000007fe`fdb8aef3 90          nop
000007fe`fdb8aef4 90          nop
000007fe`fdb8aef5 90          nop

First of all, you may notice the module name KERNELBASE rather than kernel32, a difference from the classic 32-bit (pre-Windows 7) implementation. Secondly, all the registers are 64 bits wide. Furthermore, on x64 the GS register points to the Thread Environment Block (TEB, also called TIB), while on 32-bit Windows this is the role of the FS register, and GS is set to zero. Keep in mind, however, that for applications running under the WOW64 layer, the FS and GS registers behave just as they do on the regular 32-bit platform. The TEB contains pointers to other interesting and crucial Windows structures, such as the PEB. WinDbg can be a great tool for delving into core Windows structures. To get a peek into the TEB and find where the PEB pointer is located within it, we can use the ‘dt’ (display type) command:

0:000> dt _TEB @$teb
ntdll!_TEB
+0x000 NtTib      : _NT_TIB
+0x038 EnvironmentPointer : (null) 
+0x040 ClientId       : _CLIENT_ID
+0x050 ActiveRpcHandle      : (null) 
+0x058 ThreadLocalStoragePointer : 0x000007ff`fffde058 Void
+0x060 ProcessEnvironmentBlock : 0x000007ff`fffdb000 _PEB
+0x068 LastErrorValue       : 0x36b7

When debugging, WinDbg sets up several pseudo-registers, including $teb and $peb, which hold the addresses of the current TEB and PEB respectively. We’ve used $teb with the display type command to inspect the current TEB. The first instruction of IsDebuggerPresent reads gs:[30h], the NtTib.Self field, which contains the linear address of the TEB itself; and, as you can see, the field at offset 0x60 is the pointer to the PEB. This confirms the IsDebuggerPresent disassembly:

000007fe`fdb8aee9 488b4860   mov   rcx,qword ptr [rax+60h]

Now let’s use the $peb register to inspect its content:

0:000> dt _PEB @$peb
ntdll!_PEB
  +0x000 InheritedAddressSpace : 0 ‘’
  +0x001 ReadImageFileExecOptions : 0 ‘’
  +0x002 BeingDebugged          : 0x1 ‘’
  +0x003 BitField                 : 0 ‘’
  +0x003 ImageUsesLargePages  : 0y0
  +0x003 IsProtectedProcess   : 0y0

As you can see, the field at offset 0x02 is BeingDebugged and unsurprisingly it is set to 1. Again, this is in line with our IsDebuggerPresent disassembly:

000007fe`fdb8aeed 0fb64102   movzx eax,byte ptr [rcx+2]

If you take a peek at the 32-bit version of IsDebuggerPresent you will find out that while the code is different, the algorithm is exactly the same.
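The algorithm is short enough to replay in a toy model. The Python sketch below uses a flat byte buffer as a stand-in for process memory (a hypothetical simplification) and follows the same three steps for both architectures: read the TEB’s Self pointer through the segment register (gs:[30h] on x64, fs:[18h] on 32-bit), read the PEB pointer from the TEB (offset 0x60 on x64, 0x30 on 32-bit), then read the BeingDebugged byte at PEB offset 2. The x64 offsets come from the disassembly and ‘dt’ output above; the 32-bit ones are those used by the 32-bit IsDebuggerPresent.

```python
import struct
from typing import NamedTuple


class Layout(NamedTuple):
    self_offset: int         # NT_TIB.Self, read via gs:[30h] (x64) or fs:[18h] (x86)
    peb_offset: int          # TEB.ProcessEnvironmentBlock
    ptr_fmt: str             # struct format of a pointer
    being_debugged: int = 2  # PEB.BeingDebugged offset on both platforms


X64 = Layout(self_offset=0x30, peb_offset=0x60, ptr_fmt="<Q")
X86 = Layout(self_offset=0x18, peb_offset=0x30, ptr_fmt="<I")


def is_debugger_present(mem: bytes, segment_base: int, lay: Layout) -> bool:
    """Replay IsDebuggerPresent on a flat toy memory image."""
    # mov rax, gs:[30h]           / mov eax, fs:[18h]  -> TEB linear address
    (teb,) = struct.unpack_from(lay.ptr_fmt, mem, segment_base + lay.self_offset)
    # mov rcx, [rax+60h]          / mov eax, [eax+30h] -> PEB address
    (peb,) = struct.unpack_from(lay.ptr_fmt, mem, teb + lay.peb_offset)
    # movzx eax, byte ptr [rcx+2] / movzx eax, byte ptr [eax+2]
    return mem[peb + lay.being_debugged] != 0


def toy_memory(lay: Layout, teb: int, peb: int, debugged: bool) -> bytes:
    """Hypothetical helper: lay out a TEB and a PEB in one small buffer."""
    mem = bytearray(0x1000)
    struct.pack_into(lay.ptr_fmt, mem, teb + lay.self_offset, teb)  # Self
    struct.pack_into(lay.ptr_fmt, mem, teb + lay.peb_offset, peb)
    mem[peb + lay.being_debugged] = 1 if debugged else 0
    return bytes(mem)


for name, lay in (("x64", X64), ("x86", X86)):
    mem = toy_memory(lay, teb=0x100, peb=0x800, debugged=True)
    print(name, is_debugger_present(mem, 0x100, lay))  # True for both
```

Only the offsets and the pointer width change between the two layouts; the three-step walk is identical.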

Now you not only know about some important differences between 64- and 32-bit Windows, but you also know how to find other differences yourself.

WinDbg: debugging WOW64

While Microsoft Debugging Tools for Windows comes in 32- and 64-bit flavours it is possible to debug a WOW64 application using the x64 edition of WinDbg. Microsoft provides the Wow64exts debugger extension that can be loaded from 64-bit WinDbg using the ‘!load wow64exts’ command. This extension provides several new commands:

!wow64exts.sw – switches between native and 32-bit (x86 in MS nomenclature) modes
!wow64exts.k [count] – dumps a combined 32-/64-bit stack trace
!wow64exts.info – dumps basic information about the PEB and current thread TEB plus TLS slots used by WOW64
!wow64exts.r [address] – dumps context for the specified address. If no address is specified then the context of the CPU will be dumped.

If you plan to use both the 32- and 64-bit editions of MS Debugging Tools on an x64 system, remember that the 32-bit tools cannot disassemble or set breakpoints within the WOW64 thunk layer, since it is 64-bit code.

Note on DLL injection

DLL injection is used both by malware and, in some cases, by reverse engineers during code analysis; it can be useful in some advanced unpacking techniques. The SetWindowsHookEx() function is available in 64-bit mode; however, you have to remember that the DLL to be injected must be built for the same mode as the process you are injecting it into. In short: you can inject only a 64-bit (native) DLL into a native process, and only a 32-bit DLL into an x86 process. This also means that the 32-bit and 64-bit versions of a DLL must be separate files with different names or locations.
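Before injecting, it is cheap to verify that a DLL’s architecture matches the target process. The Python sketch below reads the Machine field of the COFF file header (0x8664 for x64, 0x14C for x86, as defined in [4]) and compares it with the mode of the target. How you determine that mode is outside the sketch (on Windows, the IsWow64Process() API is one option); can_inject() and fake_header() are hypothetical names, and the check assumes an x64 host.

```python
import struct

IMAGE_FILE_MACHINE_I386 = 0x014C   # 32-bit x86 image
IMAGE_FILE_MACHINE_AMD64 = 0x8664  # 64-bit x64 image


def pe_machine(image: bytes) -> int:
    """Return the Machine field of the COFF file header."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    (machine,) = struct.unpack_from("<H", image, e_lfanew + 4)
    return machine


def can_inject(dll_image: bytes, target_is_wow64: bool) -> bool:
    """On an x64 host, a DLL must match the target process's mode."""
    wanted = IMAGE_FILE_MACHINE_I386 if target_is_wow64 else IMAGE_FILE_MACHINE_AMD64
    return pe_machine(dll_image) == wanted


def fake_header(machine: int) -> bytes:
    """Hypothetical helper: the minimum needed for pe_machine()."""
    buf = bytearray(0x100)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, 0x80)
    buf[0x80:0x84] = b"PE\0\0"
    struct.pack_into("<H", buf, 0x84, machine)
    return bytes(buf)


print(can_inject(fake_header(IMAGE_FILE_MACHINE_AMD64), target_is_wow64=False))  # True
print(can_inject(fake_header(IMAGE_FILE_MACHINE_AMD64), target_is_wow64=True))   # False
```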

Summary

As it turns out, it’s not always about the tool we use but about how well we understand the inner workings and how well we can exploit the capabilities of our toolset. Learning to use another tool for a job we already handle perfectly well with a different one can only be an advantage; sometimes correlating results from different tools provides very useful information. Of course, none of the methods presented here scale well: they are suitable only for manual unpacking.

Bibliography

[1] Microsoft Windows Debugging Tools. http://msdn.microsoft.com/en-us/windows/hardware/gg463009.aspx.

[2] MSDN PEB. http://msdn.microsoft.com/en-us/library/windows/desktop/aa813706(v=vs.85).aspx.

[3] PEB. http://en.wikipedia.org/wiki/Process_Environment_Block.

[4] Microsoft Portable Executable and Common Object File Format Specification. http://msdn.microsoft.com/en-us/library/windows/hardware/gg463119.aspx.
