2006-06-01
Abstract
Sanjay Katkar describes how recent malicious programs have exploited the PE file format, manipulating the header fields to avoid detection.
Copyright © 2006 Virus Bulletin
Microsoft's PE (Portable Executable) file format has been in existence for quite a while. It is used in Win32-based operating systems.
In this article I will describe how recent malicious programs have exploited PE file format, manipulating the header fields to avoid detection. This technique has been in use for a couple of years – and, by now, most AV scanners should be able to detect the malware inside such header-manipulated PE files. However, there are still a number of scanners that can be fooled by this kind of trick.
Since I am assuming that most readers are familiar with the PE file format, I shall not discus the details of the PE header, section headers or the PE file structure here. More information about the headers and other details can be found in the various articles about the PE file format on Microsoft's website (see, for example, http://msdn.microsoft.com/msdnmag/issues/02/02/PE/).
To search a PE file for malware a scanner will typically need both to scan the file and to perform some form of emulation for the detection of polymorphic viruses.
At some point every scanner must reach the file offset where the file execution begins. AV scanners that do not scan the whole PE file need to determine this file execution offset accurately in order to reach the virus code and scan for the signature.
In the detection of polymorphic viruses, the bytes at the file execution offset are used as a starting point for the emulation or code byte analysis process. So, for many reasons, the calculation of the file execution start offset is very important for AV scanners, and if the execution start offset is miscalculated the scanner will miss the detection. It has been observed that an increasing number of malicious programs are using tricks that make it difficult for the scanner to reach the file execution start offset. It has also been observed that certain executable packers (e.g. NSPack, UPack) build PE file headers that cause this calculation to go wrong.
First, we will look at how the file-based execution start offset is calculated for a typical PE file. For this, we need to understand the PE header and section header. The table below shows the important fields within the PE optional header and section header for NOTEPAD.EXE (Windows XP Professional). All values are in hexadecimal.
Optional header | |||||
---|---|---|---|---|---|
Number of sections | Section alignment | Address of entry point | File alignment | Image base | |
06 | 00001000 | 0000739D | 00000200 | 01000000 | |
Section headers | |||||
Section name | Virtual size | Virtual address | Size of raw data | Pointer to raw data | Characteristics |
.text | 00007748 | 00001000 | 00007800 | 00000400 | 60000020 |
.data | 00001BA8 | 00009000 | 00000800 | 00007C00 | C0000040 |
.rsrc | 00008958 | 0000B000 | 00008A00 | 00008400 | 40000040 |
Table 1. Header information for NOTEPAD.EXE.
We know that, on disk, PE file format resembles very closely the image when Windows loads it into memory. The loader uses the memory-mapped file mechanism to map the appropriate section of the file into the virtual address space. So it is very easy to calculate the file-based PE file execution start offset.
The address of entry point that is stored in the optional header is a relative virtual address (RVA), where the loader will begin execution. An RVA is simply the offset of an item, relative to where the file is memory-mapped.
The following are the usual steps that are followed to reach the file execution start offset:
Determine each section's virtual memory map, i.e. virtual start address and end address. The virtual address and virtual size for each section can be found in the section header.
Determine in which section's virtual space the address of entry point lies.
Check the file offset of that section as per the section header. In the section header the pointer to raw data field gives us the file-based offset where the section data/bytes begin.
Calculate the difference between the address of entry point and the virtual address of the section in which the entry point lies. Add this difference to the pointer to raw data, which is the file-based offset of the section, in order to get the file-based execution start offset for that file.
In the case of Notepad, the address of entry point lies in the .text section, as the .text section starts at 0x00001000 and ends at 0x00008748 and the address of entry point is 0x0000739D. I have not added image base to any values here – since it is common to all RVAs, I can ignore it for calculation purposes.
So the file offset for execution start is:
(0x0000739D – 0x00001000) + 0x00000400
Here, 0x400 is the pointer to raw data of the .text section, which points to the file offset of the .text section. In this case the offset comes to 0x0000679D, which is where the execution will begin.
So what we see is that the loader reads each section's bytes from the pointer to raw data into a file and maps it to the virtual address given in the section header table. Since the values are RVAs we have to add these to the image base of the file to arrive at the actual pointer. (However, in the example given above I have omitted image base because all the values are RVAs.)
In the case of Notepad, you can see from the section header table that the first .text section will be mapped starting from virtual address 0x01001000. This means that the .text section, which begins at 0x400 in the file (0x400 is the pointer to raw data), will be mapped at 0x01001000 in memory.
Table 2 shows the header information of a typical malicious program that is packed using NSPack.
Optional header | |||||
Number of sections | Section alignment | Address of entry point | File alignment | Image base | |
02 | 00001000 | 0000101B | 00000200 | 00400000 | |
Section headers | |||||
Section name | Virtual size | Virtual address | Size of raw data | Pointer to raw data | Characteristics |
nsp0 | 00004000 | 00001000 | 0000000B | 0000001C | E0000060 |
nsp1 | 0000203D | 00005000 | 00000CFD | 00000200 | E0000060 |
Table 2. Header information for a typical piece of malware that is packed using NSPack.
The file execution start offset for this file is calculated as follows:
(0x0000101B - 0x00001000) + 0x0000001C = 0x00000037
But this is not the offset where file execution actually starts. The Windows loader rounds the pointer to raw data to 0x00000000 because it is less than the file alignment value (which is 0x00000200). This way, the loader assumes that the first section, nsp0, starts at file offset 0 and loads the section accordingly in the memory. So if we round the pointer to raw data, as the loader does, the file execution start offset is calculated as follows:
(0x0000101B - 0x00001000) + 0x00000000 = 0x0000001B
The offset 0x0000001B proves to be somewhere in the DOS header of the PE file. It lands in the reserved part of the DOS header – which is usually filled with zeros. At this location the packer inserts a five-byte jump instruction which will transfer control to code further ahead.
AV scanners need to implement a check such that, if the pointer to raw data is not a multiple of the file alignment it must be rounded to the nearest multiple and the remaining extra bytes skipped. Malware can avoid detection by an AV scanner that has not implemented such a check. I also observed that, for files whose file alignment value is not 0x00000200, the loader rounds it to a multiple of 0x00000200.
Many AV scanners do handle NSPack-ed PE executables correctly and are able to detect the malware. Some have implemented a rule such that the pointer to raw data of the first section is rounded to zero only if it is less than the file alignment – otherwise it is used as it is.
I observed that, even if I modified the pointer to raw data by increasing it by a few bytes (so that it would not be an exact multiple of file alignment), the file worked properly and had no problems in loading. I also checked with executable files whose control lies in different sections (e.g. first, second, or last). Regardless of which section the file control lies in, the pointer to raw data can be changed to any odd figure not just less than file alignment.
In most of the PE files I checked, I observed that the pointer to raw data field had a value that was a multiple of file alignment, so there were no issues of rounding the values or miscalculating. But as I came across some of the recent file packers that newer trojans and other malware are using I found that the packers are using this trick to avoid proper detection or to avoid debugging by standard debugging techniques.
I decided to check a number of AV scanners to see whether they had implemented the rule of rounding the pointer to raw data when calculating the file execution start offset.
I decided to use an old polymorphic virus. I selected a polymorphic virus because, where signature viruses are concerned, AV scanners have different methods for detection. Some of them scan for the signature in the few kilobytes at file executable start offset, but some scan the whole file for virus signatures – and in that case we would not be able to tell whether the scanner is calculating the file executable start offset. If the scanner is scanning the entire file, then it may not miss the detection even if we change the pointer to raw data of the control (execution/code) section. In the case of polymorphic viruses, the AV scanner must calculate the file execution start offset in order to reach the virus decryption loop/engine.
I used a sample of W32.CTX, also known as Win95.Marburg.8582, and the Virus Total service for this test. I took one sample of W32.CTX and named it CTX_ORG.EXE, then I copied this sample to CTX_CHG.EXE and modified the pointer to raw data of the .text section by increasing it by 0x199 bytes. The header information of both the files is shown in the tables below.
Optional header | |||||
---|---|---|---|---|---|
Number of sections | Section alignment | Address of entry point | File alignment | Image base | |
06 | 00001000 | 0000E365 | 00001000 | 00400000 | |
Section headers | |||||
Section name | Virtual size | Virtual address | Size of raw data | Pointer to raw data | Characteristics |
.text | 0002912D | 00001000 | 0002A000 | 00001000 | 60000020 |
.rdata | 00007AF8 | 0002B000 | 00008000 | 0002B000 | 40000040 |
.data | 000074A8 | 00033000 | 00003000 | 00033000 | C0000040 |
.idata | 00002092 | 0003B000 | 00003000 | 00036000 | C0000040 |
.rsrc | 000040E0 | 0003E000 | 00005000 | 00039000 | 40000040 |
.reloc | 000081D1 | 00043000 | 00009000 | 0003E000 | C2000040 |
Table 3. Header information for CTX_ORG.EXE.
Optional header | |||||
---|---|---|---|---|---|
Number of sections | Section alignment | Address of entry point | File alignment | Image base | |
06 | 00001000 | 0000E365 | 00001000 | 00400000 | |
Section headers | |||||
Section name | Virtual size | Virtual address | Size of raw data | Pointer to raw data | Characteristics |
.text | 0002912D | 00001000 | 0002A000 | 00001199 | 60000020 |
.rdata | 00007AF8 | 0002B000 | 00008000 | 0002B000 | 40000040 |
.data | 000074A8 | 00033000 | 00003000 | 00033000 | C0000040 |
.idata | 00002092 | 0003B000 | 00003000 | 00036000 | C0000040 |
.rsrc | 000040E0 | 0003E000 | 00005000 | 00039000 | 40000040 |
.reloc | 000081D1 | 00043000 | 00009000 | 0003E000 | C2000040 |
Table 4. Header information for CTX_CHG.EXE.
The only difference between CTX_ORG.EXE and CTX_CHG.EXE is that the pointer to raw data of the .text section is modified from 0x1000 to 0x1199 in CTX_CHG.EXE.
After this, I confirmed that both the files could be loaded and executed properly in Windows 9X systems and that the virus W32.CTX was activated properly.
The modified file cannot be loaded on Windows NT-based platforms as it is not a valid Win32 application. The NT loader checks a few more things in the header than Windows 95-based systems and thus finds the file suspicious. The PE header can be checked and modified further such that it does work on Windows 2000 and XP systems.
If an AV scanner does not round the pointer to raw data value it will calculate the file execution start offset as 0x199 bytes ahead of the actual execution start offset. Usually, CTX inserts a Jump instruction immediately at the beginning and hence if the scanner is not able to calculate the execution start offset correctly, it will miss the jump to the virus decryption polymorphic loop, will never reach the virus code and will miss the detection.
I submitted both the files for the on-line scanning services provided by Virus Total (www.virustotal.com). The results were that CTX_ORG.EXE was detected correctly (as infected) by 22 of the 24 scanners listed there. The file CTX_CHG.EXE was detected correctly by only 13 scanners. Nine AV scanners missed the detection – despite earlier having detected the same virus when it was not modified.
Even though the PE file format is quite old it has many more surprises in store, which are to be explored more carefully with respect to the boundary conditions and the OS loader.
There are other issues too, such as invalid information for size of raw data, virtual size or physical address, as these fields are needed both to reach the file execution start offset and often while cleaning a file to return it to its original status.
There is a need for further careful observation of the complete PE header. We still have to explore what else is there in 64-bit PE files.