Anti-unpacker tricks – part two

2009-01-01

Peter Ferrie

Microsoft, USA

Editor: Helen Martin

Abstract

New anti-unpacking tricks continue to be developed because the older ones are constantly being defeated. Peter Ferrie describes some tricks that might become common in the future, along with some countermeasures.

Table of contents

Introduction

Anti-unpacking by anti-debugging

Heap flags
Special APIs
Hardware tricks
Process tricks

In the first part of this series last month (see VB, December 2008, p.4) we looked at a number of anti-unpacking tricks that have come to light recently. New anti-unpacking tricks continue to be developed because the older ones are constantly being defeated. In this article and the ones that follow, we will describe some tricks that might become common in the future, along with some countermeasures.

Introduction

Anti-unpacking tricks come in different forms, depending on what kind of unpacker they are intended to attack. The unpacker can be in the form of a memory-dumper, a debugger, an emulator, a code-buffer, or a W-X interceptor. It can also be a tool in a virtual machine. There are corresponding tricks for each of these.

A memory-dumper dumps the process memory of the running process without regard to the code inside it.
A debugger attaches to the process, allowing single-stepping, or the placing of breakpoints at key locations, in order to stop execution at the right place. The process can then be dumped with more precision than a memory-dumper alone.
An emulator, as referred to within this paper, is a purely software-based environment, most commonly used by anti-malware software. It places the file to execute inside the environment and watches the execution for particular events of interest.
A code-buffer is similar to a debugger. It also attaches to a process, but instead of executing instructions in place, it copies each instruction into a private buffer and executes it from there. It allows fine-grained control over execution as a result. It is also more transparent than a debugger, and faster than an emulator.
A W-X interceptor uses page-level tricks to watch for write-then-execute sequences. Typically, an executable region is marked as read-only and executable, and then everything else is marked as read-only and non-executable (or simply non-present, depending on the hardware capabilities). Then the code is allowed to execute freely. The interceptor intercepts exceptions that are triggered by writes to read-only pages, or execution from non-executable or non-present pages. If the hardware supports it, a read-only page will be replaced by a writable but non-executable page, and then the write will be allowed to continue. Otherwise, the single-step exception will be used to allow the write to complete, after which the page will be restored to its non-present state. In either case, the page address is kept in a list. In the event of exceptions triggered by execution of non-executable or non-present pages, the page address is compared to the entries in that list. A match indicates the execution of newly written code, and is a possible host entry point.

Now we move to potentially new tricks. All of these techniques were discovered and developed by the author of this paper. This article will concentrate on anti-debugging tricks.

Anti-unpacking by anti-debugging

Heap flags

Within the heap are two fields of interest. The PEB->NtGlobalFlag field forms the basis for the values in those fields. It should be noted that the HEAP_VALIDATE_PARAMETERS_ENABLED flag value was changed in Windows XP and later, from 0x200000 to 0x40000000, and that a new NtGlobalFlag flag 0x80 (FLG_HEAP_VALIDATE_ALL) was introduced (which corresponds to the HEAP_VALIDATE_ALL_ENABLED flag). Further, the location of the Flags and ForceFlags fields is different in Windows Vista. No current packer supports the new location, which is the reason why some packers will not run on Windows Vista.

Example code for Windows Vista looks like this:

mov eax, fs:[30h] ;PEB
;get process heap base
mov eax, [eax+18h]
mov eax, [eax+40h] ;Flags
dec eax
dec eax
jne being_debugged

and this:

mov eax, fs:[30h] ;PEB
;get process heap base
mov eax, [eax+18h]
cmp d [eax+44h], 0 ;ForceFlags
jne being_debugged
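
For offline analysis of a dumped heap header, the two checks above can be combined in a few lines. The following is a minimal Python sketch, not something a packer would ship. It assumes a 32-bit Windows Vista heap header supplied as raw bytes, takes the 0x40 and 0x44 offsets from the examples above, and uses a hypothetical function name.

```python
import struct

HEAP_GROWABLE = 0x00000002        # value the first example expects in Flags
VISTA_FLAGS_OFFSET = 0x40         # offset used in the first example above
VISTA_FORCE_FLAGS_OFFSET = 0x44   # offset used in the second example above


def heap_looks_debugged(heap_header: bytes) -> bool:
    """Return True if the header fails either check from the examples above."""
    (flags,) = struct.unpack_from("<I", heap_header, VISTA_FLAGS_OFFSET)
    (force_flags,) = struct.unpack_from("<I", heap_header, VISTA_FORCE_FLAGS_OFFSET)
    # Mirrors "dec eax / dec eax / jne" and "cmp d [eax+44h], 0 / jne".
    return flags != HEAP_GROWABLE or force_flags != 0
```

A header whose Flags field holds only HEAP_GROWABLE and whose ForceFlags field is zero passes both checks. Any other combination is treated as a sign of a debug heap.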

Special APIs

CreateFile

The kernel32 CreateFile() function can be used to open a file for exclusive access. This technique is not new in general, but it is new with respect to debugger detection techniques.

Example code looks like this:

  xor   ebx, ebx
  mov   ebp, offset l1
  push  104h ;MAX_PATH
  push  ebp
  push  ebx ;self filename
  call  GetModuleFileNameA
  push  ebx
  push  ebx
  push  3 ;OPEN_EXISTING
  push  ebx
  push  ebx
  push  80000000h ;GENERIC_READ
  push  ebp
  call  CreateFileA
  inc eax
  je  being_debugged
  ...
l1: db  104h dup (?) ;MAX_PATH

This technique works against the debugger Turbo Debug32, but not debuggers such as OllyDbg and WinDbg. It is related to the debug privilege, which debuggers such as OllyDbg and WinDbg maintain, while Turbo Debug32 does not.

RaiseException

The kernel32 RaiseException() function can be used to force certain exceptions to occur. These include exceptions that a debugger would normally consume.

Turbo Debug32 consumes the following exceptions:

0x40010005 (DBG_CONTROL_C)
0x40010007 (DBG_RIPEVENT)
0x80000002 (DATATYPE_MISALIGNMENT)
0x80000003 (BREAKPOINT)
0x80000004 (SINGLE_STEP)
0x80000029 (UNWIND_CONSOLIDATE)
0xC0000005 (ACCESS_VIOLATION)
0xC000008C (ARRAY_BOUNDS_EXCEEDED)
0xC000008D (FLOAT_DENORMAL_OPERAND)
0xC000008E (FLOAT_DIVIDE_BY_ZERO)
0xC000008F (FLOAT_INEXACT_RESULT)
0xC0000090 (FLOAT_INVALID_OPER)
0xC0000091 (FLOAT_OVERFLOW)
0xC0000092 (FLOAT_STACK_CHECK)
0xC0000093 (FLOAT_UNDERFLOW)
0xC0000094 (INTEGER_DIVIDE_BY_ZERO)
0xC0000095 (INTEGER_OVERFLOW)
0xC0000096 (PRIVILEGED_INSTRUCTION)

When raised in the presence of Turbo Debug32, none of these exceptions will be delivered to the debuggee. The missing exception can be used to infer the presence of Turbo Debug32.

Example code looks like this:

  xor   eax, eax
  push  offset l1
  push  d fs:[eax]
  mov   fs:[eax], esp
  push  eax
  push  eax
  push  eax
  ;DBG_CONTROL_C
  push  40010005h
  call RaiseException
  jmp being_debugged
l1: ...

By default, OllyDbg will consume a similar list of exceptions, but it can be configured to pass them to the debuggee.

The Interactive DisAssembler (IDA) debugger consumes the following exceptions:

0x40010006 (DBG_PRINTEXCEPTION_C)
0x40010007 (DBG_RIPEVENT)
0x80000003 (BREAKPOINT)

It is known that WinDbg consumes the DBG_PRINTEXCEPTION_C (0x40010006) exception, though this fact is used only rarely. However, WinDbg also consumes the following exceptions:

0x40000005 (SEGMENT_NOTIFICATION)
0x40010005 (DBG_CONTROL_C)
0x40010007 (DBG_RIPEVENT)
0x40010008 (DBG_CONTROL_BREAK)
0x40010009 (DBG_COMMAND_EXCEPTION)
0x80000001 (GUARD_PAGE_VIOLATION)
0xC0000420 (ASSERTION_FAILURE)
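
The three lists above overlap, so it can be useful to see which exception codes single out one debugger. The following Python sketch simply encodes the lists from this article as sets. The table and function names are hypothetical, and the data covers only default behaviour as described here.

```python
# Exception codes each debugger consumes, copied from the lists in the text.
CONSUMED = {
    "Turbo Debug32": {
        0x40010005, 0x40010007, 0x80000002, 0x80000003, 0x80000004,
        0x80000029, 0xC0000005, 0xC000008C, 0xC000008D, 0xC000008E,
        0xC000008F, 0xC0000090, 0xC0000091, 0xC0000092, 0xC0000093,
        0xC0000094, 0xC0000095, 0xC0000096,
    },
    "IDA": {0x40010006, 0x40010007, 0x80000003},
    "WinDbg": {
        0x40010006, 0x40000005, 0x40010005, 0x40010007,
        0x40010008, 0x40010009, 0x80000001, 0xC0000420,
    },
}


def debuggers_consuming(code):
    """Names of the listed debuggers that would swallow this exception code."""
    return sorted(name for name, codes in CONSUMED.items() if code in codes)


def codes_unique_to(name):
    """Codes that only the named debugger consumes, among those listed."""
    others = set()
    for other, codes in CONSUMED.items():
        if other != name:
            others |= codes
    return CONSUMED[name] - others
```

For example, `debuggers_consuming(0x40010007)` names all three debuggers, so a missing DBG_RIPEVENT says that some debugger is present but not which one.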

The SEGMENT_NOTIFICATION (0x40000005) exception is of particular interest, since it can be used to demonstrate several behaviours. One of these behaviours is to force a break into the VDM debugger prompt.

Example code looks like this:

  push offset l1
  push 4
  push 0
  ;EXCEPTION_SEGMENT_NOTIFICATION
  push 40000005h
  call RaiseException
  ...
l1: dd  0c0000002h, 0
  dd offset l1, offset l1
  dd 0, 0, offset l1
  db 2b0h dup (0)

Another of the behaviours is to cause the debugger to remove a breakpoint from the specified location in the debuggee’s process memory.

Example code looks like this:

  push offset l4
  push 4
  push 0
  ;EXCEPTION_SEGMENT_NOTIFICATION
  push 40000005h
  call RaiseException
  push offset l5
  push 1
  push 0
  ;EXCEPTION_SEGMENT_NOTIFICATION
  push 40000005h
  ;remove breakpoint
  call RaiseException
l1: mov    al, 0cch
  ...
l2: dd     0
l3: dd     offset l7
l4: dd     2 ;dummy context request
l5: dd     6, offset l2, offset l3
  dd 0, offset l2
l6: db     3, 90h ;replacement value
  db 0ah dup (0)
l7: dw     0
  db offset l1 + 1
  db (offset l1 + 1) shr 8
  db (offset l1 + 1) shr 10h
  dw 0
  db (offset l1 + 1) shr 18h
  dd 0, offset l6
  db 7ch dup (0)
  dw 1
  db 8 dup (0), 1, 209h dup (0)

In this case, the immediate value loaded into AL at l1 is altered from 0xCC to 0x90.

In Windows Vista, there are two new exceptions. They are EXCEPTION_WX86_SINGLE_STEP (0x4000001E) and EXCEPTION_WX86_BREAKPOINT (0x4000001F). As their names imply, they are the x86 equivalents of EXCEPTION_SINGLE_STEP (0x80000004) and EXCEPTION_BREAKPOINT (0x80000003), respectively. When a single-step or breakpoint exception occurs in 32-bit mode, these new exceptions are raised instead of the old ones. If a debugger does not handle them, then the kernel translates them to the old values and dispatches them again. In either case, they will be consumed by the debugger if that was the previous behaviour.
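
The two-stage dispatch described above can be modelled in a few lines. This is a Python sketch of the described behaviour only. The function name is hypothetical, and the pairing of codes follows the exception names (single-step with single-step, breakpoint with breakpoint).

```python
# WX86 exception code -> the older code it is translated to, paired by name.
WX86_TO_NATIVE = {
    0x4000001E: 0x80000004,  # WX86_SINGLE_STEP -> SINGLE_STEP
    0x4000001F: 0x80000003,  # WX86_BREAKPOINT  -> BREAKPOINT
}


def codes_offered_to_debugger(code, debugger_handles_wx86):
    """List the exception codes a debugger is offered, in order."""
    offered = [code]
    if code in WX86_TO_NATIVE and not debugger_handles_wx86:
        # The new code was not handled, so the old value is dispatched again.
        offered.append(WX86_TO_NATIVE[code])
    return offered
```

A debugger that recognizes only the old codes is first offered the new code and then the old one, which is why its observable behaviour does not change.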

DbgBreakPoint

The ntdll DbgBreakPoint() function is called when a debugger attaches to a process that is already running. This allows the debugger to gain control because an exception is raised that it can intercept. The break-in can be defeated simply by erasing the breakpoint.

Example code looks like this:

  push offset l1
  call GetModuleHandleA
  push offset l2
  push eax
  call GetProcAddress
  push eax
  push esp
  push 40h ;PAGE_EXECUTE_READWRITE
  push 1
  push eax
  xchg ebx, eax
  call VirtualProtect
  mov   byte ptr [ebx], 0c3h
  ...
l1: db  "ntdll", 0
l2: db  "DbgBreakPoint", 0

If a debugger attempts to attach to a process that contains such a change, then the thread will exit immediately, and the debugger will not break in. Turbo Debug32, and possibly other console-mode debuggers, will hang as a result, because they wait infinitely for an exception to be raised in order to continue execution.

OutputDebugString

Despite the fact that the kernel32 OutputDebugString() function raises the DBG_PRINTEXCEPTION_C (0x40010006) exception, a registered Structured Exception Handler will not see it. The reason is that Windows registers its own Structured Exception Handler internally, which consumes the exception if a debugger does not do so. As such, the presence of a debugger that consumes the exception cannot be inferred by the absence of the exception.

However, in Windows XP and later, any registered Vectored Exception Handler will run before the Structured Exception Handler that Windows registers. This might be considered a bug in Windows. In this case the presence of a debugger that consumes the exception can be inferred by its absence.

DbgPrint

Similarly, despite the fact that the ntdll DbgPrint() function raises the DBG_PRINTEXCEPTION_C (0x40010006) exception, a registered Structured Exception Handler will not see it. Once again, the reason is that Windows registers its own Structured Exception Handler internally, which consumes the exception if a debugger does not do so. As such, the presence of a debugger that consumes the exception cannot be inferred by the absence of it.

However, as discussed previously, in Windows XP and later, any registered Vectored Exception Handler will run before the Structured Exception Handler that Windows registers and the presence of a debugger that consumes the exception can now be inferred by the absence of the exception. Further, a different exception is delivered to the Vectored Exception Handler if a debugger is present but has not consumed the exception, or if a debugger is not present. If a debugger is present but has not consumed the exception, then Windows will deliver the DBG_PRINTEXCEPTION_C (0x40010006) exception. If a debugger is not present, then Windows will deliver the EXCEPTION_ACCESS_VIOLATION (0xC0000005) exception. The presence of a debugger can now be inferred by either the absence of the exception, or by the value of the exception.
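
The three outcomes described above amount to a small decision table. The following Python sketch shows that table only. It assumes the Vectored Exception Handler has already recorded the exception code it saw (or `None` if it saw nothing), and the function name and result strings are hypothetical.

```python
DBG_PRINTEXCEPTION_C = 0x40010006
EXCEPTION_ACCESS_VIOLATION = 0xC0000005


def infer_from_dbgprint(observed_code):
    """Map what the Vectored Exception Handler saw to the inference in the text."""
    if observed_code is None:
        # No exception arrived, so something consumed it first.
        return "debugger present, exception consumed"
    if observed_code == DBG_PRINTEXCEPTION_C:
        return "debugger present, exception passed on"
    if observed_code == EXCEPTION_ACCESS_VIOLATION:
        return "no debugger"
    return "unexpected"
```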

LoadLibrary

The kernel32 LoadLibrary() function is an unexpected method for debugger detection, but a simple and effective one. When a file is loaded in the presence of a debugger using the kernel32 LoadLibrary() function, and then freed, a handle remains open for that file. As a result, the file can no longer be opened for exclusive access. This fact can be used to infer the presence of the debugger.

Example code looks like this:

  mov esi, offset l1
  push    esi
  call LoadLibraryA
  push    eax
  call FreeLibrary
  xor ebx, ebx
  push    ebx
  push    ebx
  push 3
  push    ebx
  push    ebx
  push 80000000h
  push    esi
  call CreateFileA
  inc eax
  je  being_debugged
  ...
l1: db "myfile", 0

A less obvious method of achieving the same thing is to use the resource-updating APIs, specifically the kernel32 EndUpdateResource() function. This works because the function eventually calls the kernel32 CreateFile() function to write the new resource table.

Example code looks like this:

  mov esi, offset l1
  push    esi
  call    LoadLibraryA
  push    eax
  call    FreeLibrary
  push    0
  push    esi
  call    BeginUpdateResourceA
  push    0
  push    eax
  call    EndUpdateResourceA
  test    eax, eax
  je being_debugged
  ...
l1: db "myfile", 0

NtQueryInformationProcess

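As with the ProcessDebugPort class mentioned in [1], two other classes, ProcessDebugObjectHandle and ProcessDebugFlags, are affected by tools that patch the returned value arbitrarily, without checking the process handle. A process that debugs its own child knows what the truthful answer for that child should be, so a patched answer reveals the tool.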

Example code for the ProcessDebugObjectHandle class looks like this:

  xor ebx, ebx
  mov ebp, offset l1
  push    ebp
  call    GetStartupInfoA
  ;sizeof(PROCESS_INFORMATION)
  sub esp, 10h
  push    esp
  push    ebp
  push    ebx
  push    ebx
  push 1 ;DEBUG_PROCESS
  push    ebx
  push    ebx
  push    ebx
  push    ebx
  push    offset l2
  call    CreateProcessA
  pop     eax
  push    eax
  mov ecx, esp
  push    0
  push    4 ;ProcessInformationLength
  push    ecx
  ;ProcessDebugObjectHandle
  push    1eh
  push    eax
  call    NtQueryInformationProcess
  pop eax
  test    eax, eax
  je  being_faked
  ...
  ;sizeof(STARTUPINFO)
l1: db 44h dup (?)
l2: db "myfile", 0

Example code for the ProcessDebugFlags class looks like this:

  xor ebx, ebx
  mov ebp, offset l1
  push    ebp
  call    GetStartupInfoA
  ;sizeof(PROCESS_INFORMATION)
  sub esp, 10h
  push    esp
  push    ebp
  push    ebx
  push    ebx
  push    1 ;DEBUG_PROCESS
  push    ebx
  push    ebx
  push    ebx
  push    ebx
  push    offset l2
  call    CreateProcessA
  pop     eax
  push    eax
  mov ecx, esp
  push    0
  push    4 ;ProcessInformationLength
  push    ecx
  push    1fh ;ProcessDebugFlags
  push    eax
  call    NtQueryInformationProcess
  pop eax
  test    eax, eax
  jne being_faked
  ...
  ;sizeof(STARTUPINFO)
l1: db     44h dup (?)
l2: db     "myfile", 0
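
Both examples end with the same kind of test: the child was created with DEBUG_PROCESS, so any answer that says otherwise must have been faked. The following Python sketch shows just that final comparison. It assumes the two values have already been read back from the calls, and the function name is hypothetical.

```python
def answer_was_faked(debug_object_handle, debug_flags):
    """Check the results queried for a child that we know is being debugged.

    Mirrors the final tests in the two examples: a zero debug object handle
    ("je being_faked") or non-zero debug flags ("jne being_faked") cannot be
    the truthful answer for a process created with DEBUG_PROCESS.
    """
    return debug_object_handle == 0 or debug_flags != 0
```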

Hardware tricks

Execution timing

When a debugger is used to single-step through code, there is a significant delay between the execution of the individual instructions when compared to native execution. This delay can be measured using one of several possible time sources. These sources include the kernel32 QueryPerformanceCounter(), kernel32 GetSystemTime() and kernel32 GetLocalTime() functions, the winmm timeGetSystemTime() function, and interrupt 0x2A (also known as the KiGetTickCount() function).
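
The measurement itself is the same whichever time source is used. The following Python sketch uses `time.perf_counter` as a portable stand-in for the Windows sources named above. The threshold is an assumption that would need tuning on real hardware, and the function name is hypothetical.

```python
import time


def looks_single_stepped(work, threshold_seconds, clock=time.perf_counter):
    """Time a short piece of work and flag it if it took suspiciously long.

    `work` is any callable; `clock` is injectable so another time source
    (or a fake one, for testing) can be substituted.
    """
    start = clock()
    work()
    elapsed = clock() - start
    return elapsed > threshold_seconds
```

A call such as `looks_single_stepped(lambda: sum(range(100)), 0.5)` would normally return `False` when run natively. The same code single-stepped by hand would take far longer than the threshold.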

Process tricks

No import table

Windows NT and Windows 2000 assume that an executable file contains an import table, and that as a result, kernel32.dll is loaded. Kernel32.dll can be loaded by importing a function directly from kernel32.dll, but it is also acceptable to import a function from another DLL that also imports from kernel32.dll (user32.dll, gdi32.dll, etc.).

Normally, if kernel32.dll is not present, a fault will occur at the location at which the context EIP points, because no page is mapped there. However, it is possible to change the value in the PE->ImageBase field to place the executable file in that location. Then, whenever the file is executed, it will receive control instead of causing a fault. Further, since ntdll.dll is always loaded, it is possible to make use of some of its functions, such as ntdll LdrLoadDll() and ntdll LdrGetProcedureAddress(), to resolve the required functions and execute normally.

Anti-debugging DLLs

Dynamically loaded DLLs are called initially with the DLL_PROCESS_ATTACH parameter. If they refuse to load, they will be called immediately again, but with the DLL_PROCESS_DETACH parameter. Statically loaded DLLs are also called with the DLL_PROCESS_ATTACH parameter. However, if they refuse to load, then the ntdll NtRaiseHardError() function will be called in order to display the message: ‘The application failed to initialize properly’. Following that, the ntdll RtlRaiseStatus() function will be called.

In the absence of a debugger, this function will trigger an exception that cannot normally be intercepted, because all registered Structured Exception Handlers will have been removed already. However, if the topmost Structured Exception Handler is replaced, then it will be called via the ntdll RtlRaiseStatus() function call. This can allow a DLL to continue execution after a message that suggests that it terminated.

Example code looks like this:

  push  esi
  xor   esi, esi
  fs:lodsd
  inc  eax
l1: dec    eax
  xchg     eax, esi
  lodsd
  inc  eax
  jnz  l1
  mov  d [esi], offset l2
  pop  esi
  ret
l2: ...

In this case, l2 will gain control after the message box is dismissed.
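
The loop in the example walks the chain of exception registration records until it finds the one whose next pointer is -1, then overwrites that record's handler field. The following Python sketch models the same walk over a simulated memory, here a dictionary of address to 32-bit value. The function name and the addresses are hypothetical.

```python
END_OF_CHAIN = 0xFFFFFFFF  # "next" value that marks the last record


def replace_topmost_handler(memory, chain_head, new_handler):
    """Overwrite the handler of the last record in the chain; return the old one.

    Each record is two dwords: the next-record pointer at +0 and the handler
    address at +4. `chain_head` plays the role of the value read from fs:[0].
    """
    record = chain_head
    while memory[record] != END_OF_CHAIN:
        record = memory[record]
    old_handler = memory[record + 4]
    memory[record + 4] = new_handler
    return old_handler
```

Only the final record is modified. Handlers registered by earlier records in the chain are left untouched.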

TLS Callback

Thread Local Storage (TLS) callback is an old technique that remains relatively under-investigated. The following are some new extensions:

The TLS callback array can be altered (later entries can be modified) and/or extended (new entries can be appended) at runtime. Newly added or modified callbacks will be called using the new addresses. There is no limit to the number of callbacks that can be placed. This technique has been disclosed publicly [2].
Example callback code looks like this:
```
l1: mov d [offset cbEnd],offset l2
   retn
l2: ...
```
The callback at l2 will be called when the callback at l1 returns.
TLS callback addresses can point outside of the image – for example, to newly loaded DLLs.
Example callback code looks like this:
```
l1: push offset l2
  call LoadLibraryA
  mov   [offset cbEnd], eax
  ret
l2: db   "tls2", 0
```
In this case, the ‘MZ’ header of tls2.dll will be executed when the callback at l1 returns. The file header can be made executable despite DEP, using the SectionAlignment trick described in [3]. This allows the code to run without error.
TLS callback addresses can contain RVAs of imported addresses from other DLLs if the import address table is altered to point into the callback array. Imports are resolved before callbacks are called, so imported functions will be called normally when the callback array entry is reached.
TLS callbacks receive three stack parameters, which can be passed directly to APIs. The first parameter is the ImageBase of the host process. It could be used by APIs such as the kernel32 LoadLibrary() or kernel32 WinExec() functions. The ImageBase parameter will be interpreted by the kernel32 LoadLibrary() or kernel32 WinExec() functions as a pointer to the filename to load or execute. By creating a file called ‘MZ[some string]’, where ‘some string’ matches the host file header contents, the TLS callback will access the file without any explicit reference. Of course, the ‘MZ’ portion of the string can also be replaced manually at runtime, but many APIs rely on this signature, so the results of such a change are unpredictable.
TLS callbacks are called whenever a thread is created or destroyed (unless the process calls the kernel32 DisableThreadLibraryCalls() or the ntdll LdrDisableThreadCalloutsForDll() functions). This includes the thread that is created by Windows when a debugger attaches to a process. The debugger thread is special in that its entrypoint does not point inside the image. Instead, it points inside kernel32.dll. Thus, a simple debugger detection method is to use a TLS callback to query the start address of each thread that is created.
Example callback code looks like this:
```
 push   eax
 mov eax, esp
 push   0
 push   4
 push   eax
 ;ThreadQuerySetWin32StartAddress
 push 9
 push   -2 ;GetCurrentThread()
 call   NtQueryInformationThread
 pop eax
 cmp eax, offset l1
 jnb being_debugged
 ...
l1: <code end>
```
Since TLS callbacks run before a debugger can gain control, the callback can make other changes, such as removing the breakpoint that is typically placed at the host entrypoint. When combined with the ntdll DbgBreakPoint() function patch, the result is a file that cannot be debugged by ordinary means. The debugger will attach to the debuggee, and then wait for the exception which will never occur. Using Ctrl-C to break in will work well enough to look at the code, but breakpoints that are placed within the other threads will not activate.
Example callback code looks like this:
```
 push   offset l2
 call   GetModuleHandleA
 push   offset l3
 push   eax
 call   GetProcAddress
 push   eax
 push   esp
 push   40h   ;PAGE_EXECUTE_READWRITE
 push   1
 push   eax
 xchg   ebx, eax
 call   VirtualProtect
 mov b [ebx], 0c3h
 ;<val> is byte at l1
 mov b [offset l1], <val>
 pop eax
 ret
l1: <host entrypoint>
 ...
l2: db   "ntdll", 0
l3: db   "DbgBreakPoint", 0
```
Currently, it seems that no debugger handles this case. However, the fix is very simple, and increasingly necessary. It is a matter of inserting the breakpoint on the first byte of the first TLS callback instead of the host entrypoint. This will allow an exception to be raised as usual. However, care must be taken regarding the callback address, since as noted above, the address may be the RVA of an imported function. Thus, the address cannot be taken from the file header. It must be read from the image memory.
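
The first item in the list above, and the closing point about reading addresses from image memory, can both be seen in a small simulation. The following Python sketch models the callback array as a list terminated by `None` and reads each entry only when it is about to be called. All names are hypothetical, and the model ignores everything about the real loader except that ordering.

```python
def run_tls_callbacks(callback_array):
    """Call each entry of a None-terminated callback list, in order.

    Entries are fetched one at a time, so changes made by an earlier
    callback are seen when later entries are reached.
    """
    called = []
    index = 0
    while callback_array[index] is not None:
        callback = callback_array[index]
        called.append(callback.__name__)
        callback(callback_array)
        index += 1
    return called


def second(callback_array):
    """Stands in for the code at l2; it does nothing here."""


def first(callback_array):
    """Stands in for the code at l1: overwrite the terminator with a new entry."""
    callback_array[-1] = second
    callback_array.append(None)
```

Starting from `[first, None]`, the simulated loader calls `first` and then `second`, even though the array as stored in the file lists only one callback. A tool that takes callback addresses from the file header would miss the second one.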

In part three of this article next month we will look at some miscellaneous anti-debugging tricks, as well as a range of tricks that target specific debuggers.

The text of this paper was produced without reference to any Microsoft source code or personnel.

Bibliography

[1] Ferrie, P. Anti-unpacker tricks – part one. Virus Bulletin, December 2008, p.4. http://www.virusbtn.com/virusbulletin/archive/2008/12/vb200812-anti-unpacker-tricks.

[2] Self-modifying TLS callbacks. http://www.openrce.org/blog/view/1114/Self-modifying_TLS_callbacks.

[3] Ferrie, P. Anti-unpacker tricks. 2008. http://pferrie.tripod.com/papers/unpackers.pdf.
