2011-07-01
Abstract
Pseudorandom generators are increasingly becoming an integral component of modern malware. Raul Alvarez shows how Conficker uses a pseudorandom generator to produce random domain names while retaining its ability to communicate with the Command and Control (C&C) server.
Copyright © 2011 Virus Bulletin
The cat-and-mouse chase between the takedown of botnet Command and Control (C&C) servers and malware that incorporates self-updating technology stepped up a gear when malware started to generate pseudorandom domain names.
A few years ago, botnets updated themselves through static IP addresses coded deep within them, or domain names encrypted within their core. But anti-malware researchers soon learned to determine which IP addresses or domain names were used by a given piece of malware, paving the way for proactive takedowns: the closure and blocking of those addresses.
Now, however, malware is capable of creating pseudorandom domain names that are hard to track. The malware is able to update itself by employing a form of Monte Carlo simulation. A Monte Carlo simulation is a methodology that employs random numbers within a given, bounded context.
A simple example is as follows:
We can randomly mark a dot on a sheet of paper. As long as the dot is marked on the paper we can predict the location of the dot. It is random in the sense that we don’t know the exact point at which the dot will land, but we do know the boundaries within which it is restricted.
Using the same concept, malware and its servers can create random domain names within given boundaries, allowing the malware to update itself even though the domains it contacts appear random.
This article will show how Conficker uses a pseudorandom generator to produce random domain names while retaining its ability to communicate with the Command and Control (C&C) server, and how the machines infected by Conficker can generate the same pseudorandom domain names in sync.
We first saw Conficker spring into action a couple of years ago. Exploiting vulnerabilities, propagating through removable drives and jumping on network shares were some of the ways in which Conficker spread itself. This article focuses on the malware’s pseudorandom generation of domain names.
Before executing its domain name generation routine, Conficker checks if the infected machine has an Internet connection by calling the InternetGetConnectedState() API. If there is no Internet connectivity, it will sleep for one minute then check again. It will keep checking until it can establish a connection. Once it is successful, it will proceed to check the current date.
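The polling behaviour described above can be sketched as follows. This is a minimal illustration in Python, not the malware's code: `is_connected` and `sleep` are hypothetical callables standing in for InternetGetConnectedState() and the Windows sleep call, injected so that the loop can be exercised without a network.

```python
import time
from typing import Callable

POLL_INTERVAL_SECONDS = 60  # "sleep for one minute", per the description above


def wait_for_connection(is_connected: Callable[[], bool],
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Block until is_connected() returns True; return how many sleeps were taken."""
    sleeps = 0
    while not is_connected():
        sleep(POLL_INTERVAL_SECONDS)
        sleeps += 1
    return sleeps
```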
In this particular variant, Conficker checks for a certain date before proceeding to the subroutine of generating the domain names. The date checking starts with a call to the GetSystemTime() API, which returns the current system date and time expressed in Coordinated Universal Time (UTC). If the retrieved date falls before January 2009, it will sleep for three hours by creating a loop of 18 iterations and sleeping for 10 minutes for each iteration. After three hours it will be awakened to check the date again.
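The date gate can be sketched in the same way. The exact cut-off instant is an assumption (the text only says the date must not fall before January 2009), and `now` and `sleep` are hypothetical stand-ins for GetSystemTime() and the sleep call.

```python
from datetime import datetime, timezone
from typing import Callable

NAP_SECONDS = 10 * 60   # one 10-minute sleep
NAPS_PER_CYCLE = 18     # 18 x 10 minutes = 3 hours
# Assumed cut-off: midnight UTC on 1 January 2009.
ACTIVATION = datetime(2009, 1, 1, tzinfo=timezone.utc)


def wait_for_activation(now: Callable[[], datetime],
                        sleep: Callable[[float], None]) -> int:
    """Sleep in three-hour cycles until now() reaches ACTIVATION; return cycles slept."""
    cycles = 0
    while now() < ACTIVATION:
        for _ in range(NAPS_PER_CYCLE):
            sleep(NAP_SECONDS)
        cycles += 1
    return cycles
```

Splitting the wait into 18 short sleeps rather than one long one matches the loop described above; the sketch does not attempt to explain why the author chose that structure.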
When the right timing has been acquired (i.e. the date is later than January 2009), Conficker generates the starting point by calling the srand() function. The srand() function accepts one parameter, the seed, to set the starting point for generating a series of pseudorandom numbers.
To generate the seed, Conficker XORs all the resulting values from calls to the following APIs:
GetCurrentThreadId()
GetCurrentProcessId()
QueryPerformanceCounter()
GetTickCount()
The different seed values ensure that the pseudorandom number generator will generate a different succession of results in the subsequent calls to the rand() function. (A call to the rand() function generates a pseudorandom number.)
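As a rough illustration of the seed computation, the sketch below XORs four values and keeps the low 32 bits. The Python calls are portable stand-ins for the four Windows APIs, and the way the 64-bit performance counter is folded into 32 bits is an assumption; the article does not describe that detail.

```python
import os
import threading
import time

MASK32 = 0xFFFFFFFF


def make_seed(thread_id: int, process_id: int, perf_counter: int, tick_count: int) -> int:
    """XOR the four inputs and keep the low 32 bits (srand() takes an unsigned int)."""
    return (thread_id ^ process_id ^ perf_counter ^ tick_count) & MASK32


def seed_from_this_process() -> int:
    # Stand-ins only; none of these is the Windows API named in the text.
    return make_seed(
        threading.get_ident(),              # ~ GetCurrentThreadId()
        os.getpid(),                        # ~ GetCurrentProcessId()
        time.perf_counter_ns(),             # ~ QueryPerformanceCounter()
        time.monotonic_ns() // 1_000_000,   # ~ GetTickCount(), in milliseconds
    )
```

Because the inputs differ between machines and between runs, this seed only affects the choice of search engine that follows; it plays no part in the synchronized domain names.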
After setting the starting point of the pseudorandom generator, the first random number is retrieved by calling the rand() function and dividing the result by six. The resulting remainder from the division operation is then used to select from one of the following search engines: ‘baidu.com’, ‘google.com’, ‘yahoo.com’, ‘msn.com’, ‘ask.com’, and ‘w3.org’ (see Figure 1).
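The selection step amounts to indexing a six-entry table with a remainder. In the sketch below, the order of the table is an assumption: it follows the order in which the sites are listed above, which may not match the order in the binary.

```python
# Assumed order: taken from the list in the text, not from a recovered table.
SEARCH_ENGINES = ["baidu.com", "google.com", "yahoo.com", "msn.com", "ask.com", "w3.org"]


def pick_engine(rand_value: int) -> str:
    """Select a site using the remainder of rand_value divided by six."""
    return SEARCH_ENGINES[rand_value % len(SEARCH_ENGINES)]
```

Since rand() never returns a negative number, the remainder is always between zero and five and the index cannot fall outside the table.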
After prepending the string ‘http://www’ to the name of the selected search engine, another subroutine is executed. This subroutine starts by getting the user agent header string (containing information about compatibility, the browser, and the platform name) by calling the ObtainUserAgentString() API (see Figure 2).
Figure 2. User agent string – Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET4.0C; .NET4.0E).
The same header string is supplied as a parameter for a call to the InternetOpenA() API to initialize the use of the WinINet functions. (The WinINet API enables applications to access standard Internet protocols, such as FTP and HTTP [1].)
The selected search engine website, e.g. ‘www.yahoo.com’, is now opened via a call to the InternetOpenUrlA() API, which is immediately followed by a call to the HttpQueryInfoA() API with a query info flag of 0x20000013 (HTTP_QUERY_FLAG_NUMBER | HTTP_QUERY_STATUS_CODE). This flag returns the status code of the server’s response as a number. Another call to HttpQueryInfoA() with a flag of 0x00000009 (HTTP_QUERY_DATE) retrieves the date and time at which the message was originated.
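The two info-level values can be decoded with a little arithmetic. The constants below are the values defined in the Windows SDK header wininet.h; note that the low-order part of 0x20000013 is 19 in decimal, which that header names HTTP_QUERY_STATUS_CODE (HTTP_QUERY_URI is 13 in decimal). The `decode_info_level` helper is illustrative only.

```python
# Values as defined in wininet.h.
HTTP_QUERY_DATE = 9
HTTP_QUERY_URI = 13
HTTP_QUERY_STATUS_CODE = 19
HTTP_QUERY_FLAG_NUMBER = 0x20000000


def decode_info_level(info_level: int) -> tuple[bool, int]:
    """Return (wants a numeric result, attribute index) for an info-level value."""
    wants_number = bool(info_level & HTTP_QUERY_FLAG_NUMBER)
    attribute = info_level & ~HTTP_QUERY_FLAG_NUMBER
    return wants_number, attribute
```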
The date and time information is the most important element in the creation of Conficker’s pseudorandom domain names. This information is used to determine the value that synchronizes the domain names generated by the infected machines and by the malware’s Command and Control (C&C) server.
To generate the initial value, Conficker extracts the day, month and year from the information gathered by HttpQueryInfoA() and stores them in memory in SYSTEMTIME format [2]; a quick call to the SystemTimeToFileTime() API changes the time to FILETIME format [3].
A series of computations involving the lower and upper four bytes of the FILETIME value is performed to generate a 64-bit value. This serves as the initial value for Conficker’s pseudorandom number generator. The malware does not use the rand() function to generate its domain names; it uses its own generator. Because every infected machine starts that generator from the same date-derived value, the domain names produced by infected machines in the wild stay synchronized.
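The conversion from an HTTP Date header to the two halves of a FILETIME can be approximated as below. FILETIME counts 100-nanosecond intervals since 1 January 1601 (UTC). Zeroing the time-of-day fields is an assumption of this sketch, based on the statement that only the day, month and year are extracted; the function name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
TICKS_PER_SECOND = 10_000_000  # 100-nanosecond intervals per second


def filetime_from_http_date(date_header: str) -> tuple[int, int]:
    """Return (low dword, high dword) of the FILETIME for the header's calendar date."""
    stamp = parsedate_to_datetime(date_header)
    # Keep only the calendar date; time of day is assumed to be zeroed.
    midnight = datetime(stamp.year, stamp.month, stamp.day, tzinfo=timezone.utc)
    delta = midnight - FILETIME_EPOCH
    ticks = (delta.days * 86_400 + delta.seconds) * TICKS_PER_SECOND
    return ticks & 0xFFFFFFFF, ticks >> 32
```

Two machines that query different search engines at different times on the same UTC day would obtain the same pair of dwords under this assumption, which is the property the synchronization relies on.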
Before we proceed further, let’s look closely at Conficker’s pseudorandom generator. The following is a step-by-step walk through the generator subroutine.
The subroutine begins with a typical prologue that sets up the stack frame:
55          push ebp
8B EC       mov ebp,esp
83 EC 20    sub esp,20h
The initial 64-bit value that we got from our previous calculations is stored in memory. Let’s call the upper 32-bit value MemLocHigh and the lower 32-bit value MemLocLow. The following code copies the values to the ECX and EAX registers:
8B 0D 94 9D 3B 00    mov ecx,MemLocLow
A1 90 9D 3B 00       mov eax,MemLocHigh
There are four additional memory locations used to hold the temporary 64-bit values for the rest of the calculations. Let’s call them TempMem1, TempMem2, TempMem3 and TempMem4. There are also three memory variables used for 32-bit computation. Let’s call them memA, memB and memC. These variables and memory locations will be used by Conficker in the series of computations that follow.
TempMem1 is zeroed out and the contents of MemLocLow are copied to memA:
83 65 F8 00    and dword ptr [ebp+TempMem1],0
56             push esi
8B D1          mov edx,ecx
57             push edi
89 55 FC       mov memA,edx
Conficker stores the value of ‘MemLocLow AND 7FFFFFFFh’ to memB, and TempMem2 now holds a copy of MemLocHigh:
BF FF FF FF 7F    mov edi,7FFFFFFFh
23 D7             and edx,edi
89 45 F0          mov dword ptr [ebp+TempMem2],eax
89 55 F4          mov memB,edx
The following code introduces the FILD instruction, one of the instructions in the FPU (floating-point unit) instruction set. The FPU has eight 80-bit data registers that are arranged as a stack: ST0, ST1, ST2, … ST7.
ST0 contains the value at the top of the stack, which is used by the FPU instructions in their computation. FPU instructions are often ignored or skipped by anti-virus emulators, and malicious programs often use them as an anti-emulation trick. Here, the results of the FPU instructions determine what the malware does next, so if the anti-virus software can’t process them properly, there is a good chance that it will miss the actual intent of the malware.
FILD (integer load) is used to convert the TempMem2 value to the 80-bit extended precision format and push the result to ST0:
DF 6D F0 fild [ebp+TempMem2]
Conficker ANDs the value of memA with 80000000h:
BE 00 00 00 80    mov esi,80000000h
21 75 FC          and memA,esi
It converts the TempMem1 value to the 80-bit format and pushes the result to ST0; the original value of ST0 is pushed down to ST1:
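The two masks used so far, 7FFFFFFFh and 80000000h, split a 32-bit value into its low 31 bits and its top bit. A minimal Python sketch of that split (the function name is illustrative, not taken from the sample):

```python
LOW_31_BITS = 0x7FFFFFFF  # the mask loaded into EDI
TOP_BIT = 0x80000000      # the mask loaded into ESI


def split_dword(value: int) -> tuple[int, int]:
    """Split a 32-bit value into (low 31 bits, isolated top bit)."""
    value &= 0xFFFFFFFF
    return value & LOW_31_BITS, value & TOP_BIT
```

The two parts never overlap, so adding them back together restores the original value.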
DF 6D F8 fild [ebp+TempMem1]
It zeroes out the content of TempMem1 and memA now contains the result of MemLocLow AND 80000000h:
83 65 F8 00    and dword ptr [ebp+TempMem1],0
89 4D FC       mov memA,ecx
21 75 FC       and memA,esi
FCHS (change sign) is another FPU instruction that changes the sign of ST0:
D9 E0 fchs
This is followed by FADDP, which adds the contents of ST0 and ST1, places the result in ST1 and then pops ST0 off the stack, so that the sum becomes the new ST0:
DE C1 faddp st(1),st
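The stack behaviour of FILD, FCHS, FADDP and FSTP can be modelled with a toy class, shown below. One plausible reading of the sequence, which the article does not state, is that it is the common idiom for loading an unsigned 64-bit integer: FILD only accepts signed operands, so the value is loaded with its top bit cleared and 2^63 is added back if that bit was set. The sketch assumes, from the stack offsets in the listing, that memA and memB are the upper halves of TempMem1 and TempMem2. Python floats stand in for the 80-bit registers, so rounding can differ from real hardware.

```python
class FpuStack:
    """Toy model of the x87 register stack; regs[0] plays the role of ST0."""

    def __init__(self) -> None:
        self.regs: list[float] = []

    def fild(self, signed_int: int) -> None:
        self.regs.insert(0, float(signed_int))  # push: old ST0 becomes ST1

    def fchs(self) -> None:
        self.regs[0] = -self.regs[0]

    def faddp(self) -> None:
        top = self.regs.pop(0)   # ST1 = ST1 + ST0, then pop,
        self.regs[0] += top      # so the sum is the new ST0

    def fstp(self) -> float:
        return self.regs.pop(0)


def as_signed64(low: int, high: int) -> int:
    """Interpret two dwords as the signed 64-bit integer FILD would load."""
    raw = ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)
    return raw - (1 << 64) if raw & (1 << 63) else raw


def load_unsigned(low: int, high: int) -> float:
    fpu = FpuStack()
    fpu.fild(as_signed64(low, high & 0x7FFFFFFF))  # value with top bit cleared
    fpu.fild(as_signed64(0, high & 0x80000000))    # either 0 or -2**63
    fpu.fchs()                                     # either 0 or +2**63
    fpu.faddp()
    return fpu.fstp()
```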
Conficker copies MemLocHigh to TempMem3, copies MemLocLow to memC, and saves MemLocLow to the regular stack:
89 45 E8    mov dword ptr [ebp+TempMem3],eax
89 4D EC    mov memC,ecx
51          push ecx
FSTP is used to store the value of ST0 to TempMem2 and pop the ST0 content out of the stack:
DD 5D F0 fstp [ebp+TempMem2]
The code that follows shows Conficker continuing to manipulate the values of MemLocHigh and MemLocLow:
51          push ecx
DF 6D E8    fild [ebp+TempMem3]
DF 6D F8    fild [ebp+TempMem1]
D9 E0       fchs
DE C1       faddp st(1),st
Conficker stores the value of ST0 to the regular stack and computes the sine of that value.
DD 1C 24          fstp RegStackPointer
E8 65 94 00 00    call MSVCRT.sin
After getting the sine of ST0, another series of FPU instructions is executed. At the end of the code below, Conficker loads the TempMem2 value into ST0 and gets the log of that value:
83 C4 08          add esp,8
DD 5D E0          fstp [ebp+TempMem4]
83 65 F8 00       and dword ptr [ebp+TempMem1],0
89 55 FC          mov memA,edx
21 75 FC          and memA,esi
23 D7             and edx,edi
89 45 E8          mov dword ptr [ebp+TempMem3],eax
89 55 EC          mov memC,edx
DF 6D E8          fild [ebp+TempMem3]
51                push ecx
DF 6D F8          fild [ebp+TempMem1]
51                push ecx
D9 E0             fchs
DE C1             faddp st(1),st
DC 45 E0          fadd [ebp+TempMem4]
DC 4D F0          fmul [ebp+TempMem2]
DC 4D F0          fmul [ebp+TempMem2]
DD 5D E0          fstp [ebp+TempMem4]
DD 45 F0          fld [ebp+TempMem2]
DD 1C 24          fstp RegStackPointer
E8 06 94 00 00    call MSVCRT.log
Finally, Conficker copies the value of ST0 to MemLocHigh and MemLocLow using the FSTP instruction. The return value at register EAX also contains the new MemLocHigh value.
59                pop ecx
59                pop ecx
5F                pop edi
DD 1D 90 9D 3B    fstp MemLocHigh
A1 90 9D 3B 00    mov eax,MemLocHigh
5E                pop esi
C9                leave
C3                retn
The new values of the MemLocHigh and MemLocLow memory locations will now be supplied as the 64-bit value for the next execution of the pseudorandom generator.
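FSTP with a 64-bit memory operand writes an IEEE 754 double, and the integer MOV that follows reads raw bytes back from the same location. The sketch below is an illustration rather than recovered code; it shows how a double's bit pattern splits into the two dwords that the next call to the generator will treat as integers.

```python
import struct


def double_to_dwords(value: float) -> tuple[int, int]:
    """Return (dword at the lower address, dword at the higher address) of a stored double."""
    first, second = struct.unpack("<II", struct.pack("<d", value))
    return first, second
```

For example, 1.0 is stored as the bytes of 0x3FF0000000000000, so the dword at the lower address is zero and the one above it is 0x3FF00000.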
Conficker’s pseudorandom generator accepts a 64-bit value. It performs a calculation on this 64-bit value using FPU instructions such as FILD, FCHS, FADDP, FSTP and FMUL. These instructions use the special stack registers ST0, ST1, …, ST7. Conficker also uses the mathematical functions sine and log to produce a different numeric result.
After the long and tedious calculations, the end result is a new 64-bit value. This new 64-bit value is used as the input parameter for the next call to the pseudorandom generator.
The lower 32-bit value is stored in the EAX register, which is essential in the generation of the domain names.
Conficker’s pseudorandom number generator is an important component in generating the pseudorandom domain names that are recognized by all Conficker-infected machines (of the same variant) and its C&C servers.
The actual domain name generating routine can be divided into three blocks of code (see Figure 4).
The first block of code, block A, sets up the counter for creating 250 (number varies by variant) domain names. Each domain name is stored in a memory location generated by a call to the GlobalAlloc() API.
The second block of code, block B, starts by calling Conficker’s pseudorandom generator routine. The resulting EAX value from the routine is converted by the CDQ instruction to a quadword in EDX:EAX via sign extension. (For example: if EAX is zero or positive, EDX will be 0x00000000; if EAX is negative, EDX will be 0xFFFFFFFF.)
The sequence PUSH 4, POP ECX, IDIV ECX divides the value in EDX:EAX by four, leaving the remainder in EDX. Because IDIV is a signed division, the possible values for the remainder range from -3 to 3. Adding eight to the remainder gives us the number of characters to be generated for the new domain name.
The resulting EAX from a call to the pseudorandom generator is converted to its absolute value by calling the labs() function (which calculates the absolute value of a long integer). The value is then divided by 0x1A (26 in decimal), and the remainder determines which letter of the alphabet has been selected; adding 0x61 to the remainder gives the ASCII code of the corresponding lower case letter.
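The CDQ and IDIV steps can be mirrored in Python, with one caveat: Python's % operator rounds towards negative infinity, whereas IDIV leaves a remainder with the sign of the dividend. The helper names below are illustrative.

```python
def cdq(eax: int) -> tuple[int, int]:
    """Return (EDX, EAX) after CDQ: EDX is filled with copies of EAX's sign bit."""
    eax &= 0xFFFFFFFF
    edx = 0xFFFFFFFF if eax & 0x80000000 else 0x00000000
    return edx, eax


def to_signed32(value: int) -> int:
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def idiv_remainder(dividend: int, divisor: int) -> int:
    """Remainder with the sign of the dividend, as IDIV produces."""
    remainder = abs(dividend) % abs(divisor)
    return -remainder if dividend < 0 else remainder


def name_length(eax: int) -> int:
    """Number of letters in the domain label: eight plus a remainder in -3..3."""
    return 8 + idiv_remainder(to_signed32(eax), 4)
```

A plain `eax % 4` in Python would give 0 to 3 and therefore lengths of 8 to 11 only; the signed remainder is what extends the range down to five.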
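The letter selection reduces to a remainder and an offset, as the following sketch shows (the function name is illustrative):

```python
def pick_letter(eax: int) -> str:
    """Map a signed 32-bit generator output to a lower case letter."""
    value = eax & 0xFFFFFFFF
    if value & 0x80000000:
        value -= 1 << 32  # reinterpret as a signed 32-bit integer
    # Python's abs() cannot overflow; C's labs() is undefined for the most
    # negative long, a corner case this sketch does not model.
    return chr(0x61 + abs(value) % 0x1A)
```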
The JMP instruction creates the loop that generates the pre-computed number of lower case letters for the domain name.
The third block of code, block C, ANDs the EAX value returned by a call to the pseudorandom generator with seven. This yields an index from zero to seven, which selects the TLD (top-level domain) suffix from one of the following: .cc, .cn, .ws, .com, .net, .org, .info and .biz (see Figure 3). The selected TLD suffix is now appended to the domain name generated from block B.
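A sketch of the suffix selection follows. The order of the table is an assumption taken from the list in the text; the order in the binary may differ.

```python
# Assumed order: follows the list in the text, not a recovered table.
TLDS = [".cc", ".cn", ".ws", ".com", ".net", ".org", ".info", ".biz"]


def pick_tld(eax: int) -> str:
    """Keep the low three bits of the generator output and use them as an index."""
    return TLDS[eax & 7]
```

Masking with seven, rather than taking a signed remainder, gives an index between zero and seven whatever the sign of EAX, so no absolute value step is needed here.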
To summarize, in this Conficker variant, 250 domain names will be generated. Each domain name consists of lower case letters of the alphabet that range from five to 11 characters with the TLD suffix taken from the eight possible TLD strings. Note that each call to the pseudorandom generator produces a new 64-bit value that acts as the new input for the same routine.
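Putting the three blocks together, the whole routine can be sketched as below. The generator is passed in as a callable; `stand_in_generator` is a hypothetical deterministic source built on Python's random module and is not Conficker's FPU routine. The TLD order is again assumed. The point of the sketch is the call order (one value for the length, one per letter, one for the suffix) and the fact that two parties starting from the same value obtain the same list.

```python
import random
from typing import Callable

TLDS = [".cc", ".cn", ".ws", ".com", ".net", ".org", ".info", ".biz"]  # assumed order


def signed32(value: int) -> int:
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def signed_remainder(dividend: int, divisor: int) -> int:
    remainder = abs(dividend) % divisor
    return -remainder if dividend < 0 else remainder


def generate_domains(next_value: Callable[[], int], count: int = 250) -> list[str]:
    domains = []
    for _ in range(count):                                   # block A: counter
        length = 8 + signed_remainder(signed32(next_value()), 4)
        letters = [chr(0x61 + abs(signed32(next_value())) % 26)
                   for _ in range(length)]                   # block B: letters
        domains.append("".join(letters) + TLDS[next_value() & 7])  # block C: suffix
    return domains


def stand_in_generator(start_value: int) -> Callable[[], int]:
    """Hypothetical replacement for the generator: same start value, same outputs."""
    rng = random.Random(start_value)
    return lambda: rng.getrandbits(32)
```

Calling `generate_domains(stand_in_generator(x))` twice with the same `x` returns identical lists, which mirrors how infected machines and the C&C operator can arrive at the same names without communicating.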
Pseudorandom generators are increasingly becoming an integral component of modern malware, not just for generating random domain names. Given this ability, Conficker proves to us that if an anti-virus system is not capable of emulating FPU instructions, it will be left behind. Other Conficker variants have slight variations on their pseudorandom generator, yet the same idea remains.
Conficker synchronizes its generated domain names with other infected machines and C&C servers by using the date and time taken from a randomly selected search engine website.
In addition, we have recently seen domain name generation in the Licat file infector, the Srizbi trojan [4], and some phishing-capable trojans. The common denominator between Conficker and these pieces of malware is the use of the current date and time for synchronization; the use of random domain names will only be successful if they can also be generated by their C&C servers.
They are out there. Hundreds of pieces of malware with domain name generation capability are around, and there are more to come. The question is: can we catch up?
[1] Windows Internet. http://msdn.microsoft.com/en-us/library/aa385331(v=VS.85).aspx.
[2] SYSTEMTIME Structure. http://msdn.microsoft.com/en-us/library/ms724950(v=vs.85).aspx.
[3] FILETIME Structure. http://msdn.microsoft.com/en-us/library/ms724284(v=vs.85).aspx.