2009-07-01
Abstract
'The intent is the same, the information displayed to the user is the same, and the extorted money probably ends up in the same pocket.' Pierre-Marc Bureau, Eset.
Copyright © 2009 Virus Bulletin
As Dijkstra once said, ‘The effective exploitation of his powers of abstraction must be regarded as one of the most vital activities of a competent programmer.’ (Dijkstra, E.W. The Humble Programmer, 1972.) Malware operators and their sponsors understand this advice and are pushing the concept of abstraction to the next level. The gangs profiting from malware attacks do not really care about programming languages or anti-debugging tricks; they are interested in software that is relatively bug free and that matches their requirements (enabling them to steal passwords from game XYZ, send spam, etc.). Sometimes we need to take our heads out of the petri dish and stop over-analysing the programming detail. In more than one sense, the point isn’t how the code works, it’s what it does that matters.
Many of us are aware of the Waledac family of malware, which has now been around for several months. This threat is used to send massive amounts of spam. It communicates using a peer-to-peer network and spreads through malicious websites that use fast-flux DNS entries to make blocking by IP address more difficult. Does that description sound familiar? Exactly the same words could have been used to describe the Storm worm. This highlights an interesting problem that we face with increasing frequency: similar malware, similar operations, but a completely different code base.
Rogue anti-virus programs have been prolific in the last 12 months. They are usually installed by other malware and generate income for their operators by scaring users and leading them to believe that the only way to clean their computer is by sending money to an unknown company. Once again, we have identified dozens of different code bases for rogue anti-virus programs. Some of them are programmed in Visual Basic, some in C++, Delphi, and so on. The intent is the same, the information displayed to the user is the same, and the extorted money probably ends up in the same pocket.
It has become clear that the organizations behind malware operations are prepared to sponsor complete rewrites of their malware. This may be to repair previous programming errors or design limitations, but they are also doing it to keep one step ahead of the research community and to evade anti-virus detection. It takes days, if not weeks, for a skilled reverse engineer to analyse and understand a piece of malware completely. Deploying completely new code therefore slows investigative work considerably.
We need to take a step back from simply looking at compiled code. We have to focus on finding a model that will express program functionalities and intents. By deducing and classifying the intent of a program (or at least part of it), we might have a chance of identifying the relationship between malicious programs that are part of the same operation and built for the same purpose but which use different code.
This task will be very hard to accomplish and we cannot expect flawless results, but any effort in this area would be a step in the right direction. For example, Dullien and Porst have developed a platform-independent language to represent disassembled code (Dullien, T.; Porst, S. REIL: A platform independent intermediate representation of disassembled code for static analysis, 2009). While this approach gives results that are too detailed to be effective in creating a generic description of a program’s intent, it does show that independent representation is attainable. The next step is to create a model that is easily extractable from compiled code and reliable enough to recognize a shared intent across the different binaries a gang uses as part of its operation. One way to develop this model would be to include both algorithmic information from the code and the data it uses as parameters.
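To make that closing suggestion concrete, here is a minimal sketch of what such a model might look like. It is an illustration and not an established method: the `IntentProfile` type, the behaviour labels, the sample data and the equal weighting are all hypothetical, and the sketch assumes that the features have already been extracted from each binary by some other means. Each sample is reduced to a set of behaviours (the algorithmic side) and a set of data items it uses as parameters, and two samples are compared by the overlap of those sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntentProfile:
    """Hypothetical language-independent summary of one sample."""
    name: str
    behaviours: frozenset  # algorithmic features, e.g. "smtp_send"
    data: frozenset        # data used as parameters, e.g. message text


def jaccard(a, b):
    """Jaccard index of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def intent_similarity(x, y, behaviour_weight=0.5):
    """Weighted mean of behaviour overlap and data overlap, in [0, 1]."""
    return (behaviour_weight * jaccard(x.behaviours, y.behaviours)
            + (1 - behaviour_weight) * jaccard(x.data, y.data))


# Invented example profiles: two rogue anti-virus programs written in
# different languages, and an unrelated spam bot.
rogue_vb = IntentProfile(
    "rogue_av_vb",
    frozenset({"fake_scan_ui", "payment_prompt", "disable_security_tools"}),
    frozenset({"Your computer is infected", "Buy full version"}),
)
rogue_delphi = IntentProfile(
    "rogue_av_delphi",
    frozenset({"fake_scan_ui", "payment_prompt"}),
    frozenset({"Your computer is infected", "Buy full version"}),
)
spam_bot = IntentProfile(
    "spam_bot",
    frozenset({"smtp_send", "p2p_peer_list"}),
    frozenset({"peer list"}),
)

print(round(intent_similarity(rogue_vb, rogue_delphi), 2))
print(round(intent_similarity(rogue_vb, spam_bot), 2))
```

In this toy data the two rogue anti-virus profiles score high even though nothing about their code base enters the comparison, while the spam bot scores zero against both. The hard part, which the sketch deliberately skips, is extracting such features reliably from compiled code.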
It follows logically that it will be even more difficult to decide whether two programs using different code bases have the same intent. Yet in practice we can (and do, almost routinely) deduce malicious intent behind unknown code. We miss a trick or two by focusing on the crime scene and forgetting the criminal.