Posted by Virus Bulletin on Aug 19, 2015
Szilard Stange makes the case for multi-engine malware scanning.
The VB2015 conference takes place next month (30 September to 2 October) in Prague, with an exciting programme that covers many of today's most pertinent security-related topics. In the run-up to the event, we have invited each of the VB2015 sponsors to write a guest post for our blog. Today, we publish a piece by Szilard Stange, Director of Product Management at OPSWAT (a sponsor of every VB conference since 2008). In his blog, Szilard makes the case for scanning files with multiple engines.
In the 1990s I was working for a small anti-virus vendor, analysing viruses. I started with simple malware, but later got into more tricky types as well; at the time, simple XOR encryption, variable encryptions, polymorphism and metamorphism were some of the challenges we tackled. Those were good times — every day brought a new malware trick to the table.
Back then we processed every piece of malware manually, but faced with an increasing volume of samples, we had to look for ways to speed up the malware processing task. We built an automation system, which we called AVAS (Automatic Virus Analysis System); it was a kind of sandbox system that executed files to see what they did. Some viruses evaded our detection methods, so we had to modify our system to feed in the required environment options to convince malware to infect it. The system became more complex, required more computers, and finally the processing time per file increased to a point at which we had to look for a smarter solution.
The basic recognition algorithm was simple pattern matching, using plain string matching as well as many features similar to regular expressions. We needed at least two fixed bytes to trigger a special script that could confirm the detection and, where possible, cure the infection. This scripting language contained file access functions and other data manipulation functions, making it possible to decrypt simple viruses and recover the infected files. But viruses improved so much that later we couldn't even find two fixed bytes to trigger the special script; variable encryption and polymorphic viruses made the job harder. Over time we implemented a CPU emulator and a virtual file system with basic Windows emulation, which improved our simple pattern matching to the point where variable encryption and polymorphic viruses were no longer too hard for us to deal with.
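To make the trigger-then-verify approach concrete, here is a minimal Python sketch of that style of scanning. The anchor bytes, detection name and confirmation logic are invented purely for illustration; the real engine and its scripting language were considerably richer.

```python
# Illustrative only: a signature pairs a short fixed-byte anchor with a more
# expensive verification routine (the role played by the 'special script').
from typing import Callable, Optional

Signature = tuple[bytes, Callable[[bytes, int], Optional[str]]]

def verify_example(data: bytes, offset: int) -> Optional[str]:
    """Hypothetical confirmation step: check a few more bytes around the hit
    before declaring an infection."""
    if data[offset + 8 : offset + 12] == b"\xde\xad\xbe\xef":
        return "Example.Virus.A"
    return None

SIGNATURES: list[Signature] = [
    (b"\x4d\x5a\x90\x00", verify_example),   # fixed anchor bytes (made up)
]

def scan(data: bytes) -> Optional[str]:
    """Cheap pass first: find the fixed bytes, then run the costly check."""
    for anchor, verify in SIGNATURES:
        pos = data.find(anchor)
        while pos != -1:
            name = verify(data, pos)
            if name:
                return name
            pos = data.find(anchor, pos + 1)
    return None

# Example: anchor at offset 0, confirmation bytes at offset 8.
print(scan(b"\x4d\x5a\x90\x00" + b"\x00" * 4 + b"\xde\xad\xbe\xef"))  # Example.Virus.A
```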
We found that it was much easier to mitigate anti-debug and other tricks in our emulator than in our sandbox system, but the biggest challenge was to speed up emulation and skip emulating code that was irrelevant to detection, focusing only on the interesting parts. We made adjustments so that we could quickly examine new, complex malware, but we still couldn't expand our resources sufficiently to deal with all of the new malware tricks.
Looking at the anti-malware scene today, we can classify software for threat detection into two major types: the 'traditional' anti-malware engines and the dynamic analysis/sandbox products. If we compare these technologies and their challenges, it's impossible to declare one better than the other: each product has advantages.
Dynamic analysis systems are facing similar issues to those we faced in the 1990s in our internal systems. Malware authors are continuously 'testing' their malware against products and trying to find weaknesses they can exploit to bypass the detection. Checking the system time and looking for specific environment parameters are common methods used to bypass detection, but there are more advanced techniques developed by malware authors as well. Sandbox developers have to fine-tune their system daily to mitigate these new anti-sandbox techniques.
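To give a rough idea of what such evasion checks look like in practice, here is a small Python sketch; the timing threshold and marker strings are invented, and real evasion code probes far more artefacts than this.

```python
# Illustrative only: two common classes of sandbox-evasion check described above.
import os
import time

def looks_like_sandbox() -> bool:
    # 1. System-time check: many sandboxes fast-forward through long sleeps,
    #    so a sleep that returns "too quickly" suggests an analysis environment.
    start = time.monotonic()
    time.sleep(2)
    if time.monotonic() - start < 1.5:
        return True

    # 2. Environment check: look for tell-tale analysis artefacts, here a
    #    suspicious user name (the marker strings are hypothetical).
    markers = ("SANDBOX", "ANALYSIS", "MALTEST")
    username = os.environ.get("USERNAME", os.environ.get("USER", ""))
    return any(marker in username.upper() for marker in markers)
```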
Sandbox products also face challenges around the speed of analysis. They need to boot up (or resume) virtual machines and perform analysis for every individual file. If you could provision virtual machines for every relevant configuration and counter anti-sandbox techniques correctly, the effectiveness of sandboxes could be very high indeed, but the cost of the required hardware, the time each analysis takes and the need for continuous maintenance make this unfeasible.
What about the traditional scanning engines? In short, traditional scanning engines are the workhorses of malware detection. These products are designed to protect web and email traffic and to scan each file opened on a system; in all of these cases, response time is critical and customers expect maintenance-free products. Given these primary uses, traditional scanning engines focus on providing a binary result (is this file malicious or not?) rather than detailed analysis, even though most use emulation techniques, like sandboxes do, for heuristic detection purposes.
As you might have gathered from my story, scan engine developers had to make some simplifications during the emulation process in order to provide an acceptable response speed. These simplifications can lead to information being lost during the analysis, and in the end it's easy to miss something that would be critical for detecting a specific new piece of malware.
In short, dynamic analysis products go much deeper in their efforts to detect each new threat, but as a result are both time- and resource-intensive. Traditional scanning engines provide a plug-and-play solution with real-time protection, but as a result they do not provide the sort of detailed analysis that can uncover outbreaks.
Customers always want to achieve 100% detection to protect their businesses against threats, and combining dynamic analysis products with traditional anti-virus scanning engines helps towards that goal. Customers can deploy traditional anti-malware products to catch the majority of malware, and deploy a dynamic analysis solution to focus on APTs and other specialized malware. This combination is very effective in most cases; however, it can also be resource-intensive and requires regular maintenance to keep detection rates up.
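A minimal sketch of that layered deployment might look like the following; the two engine interfaces are hypothetical placeholders standing in for a real anti-malware engine and a real dynamic analysis product.

```python
# Illustrative triage pipeline: fast engine first, sandbox only for the
# files the engine cannot classify with confidence.
from enum import Enum

class Verdict(Enum):
    CLEAN = "clean"
    MALICIOUS = "malicious"
    UNKNOWN = "unknown"

def scan_with_av(path: str) -> Verdict:
    """Placeholder for a traditional signature/heuristic engine (fast)."""
    raise NotImplementedError

def detonate_in_sandbox(path: str) -> Verdict:
    """Placeholder for a dynamic analysis/sandbox product (slow, thorough)."""
    raise NotImplementedError

def triage(path: str) -> Verdict:
    verdict = scan_with_av(path)              # fast path for the bulk of traffic
    if verdict is Verdict.UNKNOWN:
        verdict = detonate_in_sandbox(path)   # escalate only the hard cases
    return verdict
```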
Another combination of technologies makes the most of the proactive detection rate of scanning engines. According to third-party tests, individual scan engines achieve anywhere from 10% to 90% detection rates for unknown malware. So what if we scan for malware with many scanning engines at once? Each engine has its own strengths and weaknesses, because every vendor implements a different solution to a similar problem, so the detection rate of the combined engines is superior to that of any individual scanning engine. Such a solution still requires only minimal maintenance on the customer side, provided the problems of licensing, updating and integrating multiple scan engines are outsourced to the multi-scanning vendor.

The scanning speed of multi-engine solutions can be much faster than that of the dynamic analysis/scan engine combination, although it depends on the number of engines and the vendors included in the multi-scanning solution. A trade-off is made between scanning speed and detection capability: 30+ scan engines may take longer than a single scan-engine-plus-sandbox combination, but the combined detection capability of that many engines is very hard to beat.
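As an illustration of the idea, here is a minimal multi-scanning sketch, assuming each engine can be wrapped as a callable that takes a file path and returns a detection name or None; the engine wrappers themselves are hypothetical.

```python
# Illustrative only: run several engines in parallel and aggregate verdicts.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

Engine = Callable[[str], Optional[str]]   # path -> detection name or None

def multiscan(path: str, engines: dict[str, Engine]) -> dict[str, Optional[str]]:
    """Run every engine against the file concurrently and collect the verdicts."""
    with ThreadPoolExecutor(max_workers=max(len(engines), 1)) as pool:
        futures = {name: pool.submit(engine, path) for name, engine in engines.items()}
        return {name: future.result() for name, future in futures.items()}

def is_malicious(results: dict[str, Optional[str]], threshold: int = 1) -> bool:
    """Flag the file if at least `threshold` engines report a detection."""
    return sum(1 for verdict in results.values() if verdict) >= threshold
```

A real multi-scanning product would add engine management, update handling and result normalization on top of this, but the aggregation principle is the same.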