Back to the future: anti-virus engines and sandboxes

Posted by Virus Bulletin on Aug 19, 2015

Szilard Stange makes the case for multi-engine malware scanning.

The VB2015 conference takes place next month (30 September to 2 October) in Prague, with an exciting programme that covers many of today's most pertinent security-related topics. In the run-up to the event, we have invited each of the VB2015 sponsors to write a guest post for our blog. Today, we publish a piece by Szilard Stange, Director of Product Management at OPSWAT (a sponsor of every VB conference since 2008). In his blog, Szilard makes the case for scanning files with multiple engines.

A journey back to the 1990s

In the 1990s I was working for a small anti-virus vendor, analysing viruses. I started with simple malware, but later got into more tricky types as well; at the time, simple XOR encryption, variable encryptions, polymorphism and metamorphism were some of the challenges we tackled. Those were good times — every day brought a new malware trick to the table.

Back then we processed every piece of malware manually, but faced with an increasing volume of samples, we had to look for ways to speed up the malware processing task. We built an automation system, which we called AVAS (Automatic Virus Analysis System); it was a kind of sandbox system that executed files to see what they did. Some viruses evaded our detection methods, so we had to modify our system to feed in the required environment options to convince malware to infect it. The system became more complex, required more computers, and finally the processing time per file increased to a point at which we had to look for a smarter solution.

The basic recognition algorithm was simple pattern matching: plain string matching plus a number of features similar to regular expressions. We needed at least two fixed bytes to fire a special script that could confirm the detection and cure the infection if possible. This scripting language contained file access functions and other data manipulation functions to make it possible to decrypt simple viruses and recover the infected files. But viruses improved so much that later we couldn't even find two fixed bytes to fire the special script. Variable encryption and polymorphic viruses made the job harder. Over time we implemented a CPU emulator and a virtual file system with basic Windows emulation, which extended our simple pattern matching to the point where variable encryption and polymorphic viruses were no longer too hard for us to deal with.
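To make the signature-plus-script idea concrete, here is a minimal Python sketch. It is not the engine described in this post: the signature bytes, the use of None as a one-byte wildcard, the sample layout (signature, then one key byte, then an XOR-encrypted body) and all function names are assumptions made purely for illustration.

```python
from typing import Optional, Sequence

# Hypothetical signature: fixed bytes, with None standing for "any byte".
SIGNATURE = [0xEB, None, 0x90, 0x90]


def find_signature(data: bytes, sig: Sequence[Optional[int]]) -> Optional[int]:
    """Return the offset of the first position where sig matches, or None."""
    for start in range(len(data) - len(sig) + 1):
        if all(s is None or data[start + i] == s for i, s in enumerate(sig)):
            return start
    return None


def xor_decrypt(data: bytes, key: int) -> bytes:
    """Single-byte XOR; applying it twice with the same key restores the input."""
    return bytes(b ^ key for b in data)


def scan(data: bytes, sig: Sequence[Optional[int]], marker: bytes) -> Optional[bytes]:
    """Stand-in for the 'special script' fired by a signature hit.

    Assumed layout: signature, one key byte, XOR-encrypted body.
    Returns the decrypted body if it contains marker, otherwise None.
    """
    offset = find_signature(data, sig)
    if offset is None:
        return None
    key_pos = offset + len(sig)
    if key_pos >= len(data):
        return None
    body = xor_decrypt(data[key_pos + 1:], data[key_pos])
    return body if marker in body else None
```

The cheap pattern match acts as a filter, and the more expensive confirmation step only runs on a hit. Once a virus leaves no fixed bytes at all, the first step has nothing to match on, which is the limitation described above.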

We found that it was much easier to mitigate anti-debug and other tricks in our emulator than in our sandbox system, but the biggest challenge was to speed up emulation and skip emulation of unrelated things to focus only on the interesting parts. We made adjustments to be able to look quickly at new, complex malware, but still we couldn't expand our resources sufficiently to deal with all of the new malware tricks.

Sandboxes and scan engines

Looking at the anti-malware scene today, we can classify software for threat detection into two major types: the 'traditional' anti-malware engines and the dynamic analysis/sandbox products. If we compare these technologies and their challenges, it's impossible to declare one better than the other: each product has advantages.

Dynamic analysis systems are facing similar issues to those we faced in the 1990s in our internal systems. Malware authors are continuously 'testing' their malware against products and trying to find weaknesses they can exploit to bypass the detection. Checking the system time and looking for specific environment parameters are common methods used to bypass detection, but there are more advanced techniques developed by malware authors as well. Sandbox developers have to fine-tune their system daily to mitigate these new anti-sandbox techniques.

Sandbox products also face challenges around the speed of analysis. They need to boot up (or resume) virtual machines and perform analysis for every individual file. If you could prepare virtual machines for every scenario and handle every anti-sandbox technique correctly, then the effectiveness of sandboxes could hit the roof, but the price of the required hardware resources, the analysis time required and the continuous maintenance make this unfeasible.

What about the traditional scanning engines? In short, traditional scanning engines are the workhorses of malware detection. These products are designed to protect web and email traffic and to scan each file opened on a system; in all of these cases, response time is critical and customers expect maintenance-free products. Given these primary uses, traditional scanning engines focus on providing a binary result (is this malware or not) rather than detailed analysis, even though most are using emulation techniques, like sandboxes do, for heuristic detection purposes.

As you might have gathered from my story, scan engine developers had to make some simplifications during the emulation process in order to provide an acceptable response speed. These simplifications can lead to information being lost during the analysis, and in the end it's easy to miss something that would be critical for detecting a specific new piece of malware.

In short, dynamic analysis products go much deeper in their efforts to detect each new threat, but as a result are both time- and resource-intensive. Traditional scanning engines provide a plug-and-play solution with real-time protection, but as a result they do not provide the sort of detailed analysis that can uncover outbreaks.

Combining technologies

Customers always want to achieve 100% detection to protect their businesses against threats, and combining dynamic analysis products with traditional anti-virus scanning engines helps towards that goal. Customers can deploy traditional anti-malware products to catch the majority of malware, and deploy a dynamic analysis solution to focus on APTs and other special malware. This combination is very effective in most cases; however, it can also be resource-intensive and requires regular maintenance to keep detection rates up.

Another way to combine technologies is to make the most of the proactive detection rate of scanning engines. According to third-party tests, individual scan engines can achieve 10%-90% detection rates for unknown malware. What if we scan for malware with lots of scanning engines? Each engine has its own strengths and weaknesses, because every vendor implements its solution to a similar problem differently, so the detection rate of the combination of products is superior to that of the individual scanning engines. The solution requires only minimal maintenance on the customer side, particularly if the licensing, updating and integration of the multiple scan engines are outsourced. The scanning speed of multi-engine solutions can be much faster than that of the dynamic analysis/scan engine combination, but it depends on the number of engines and the vendors included in the multi-scanning solution. In this case, a trade-off is made between scanning speed and detection capabilities: 30+ scan engines may take longer than a single scan-engine-plus-sandbox combination, but no one can beat the detection capabilities of this number of scanning engines.
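A rough way to see why combining engines helps is to estimate the chance that at least one engine flags a sample. The Python sketch below assumes that engines miss samples independently of one another. That assumption is made here for illustration only and does not come from the tests mentioned above; engines may well miss some of the same samples, in which case the true figure would be lower than this estimate, though never below that of the best single engine. The function name and the example rates are hypothetical.

```python
from math import prod
from typing import Iterable


def combined_detection_rate(rates: Iterable[float]) -> float:
    """Probability that at least one engine detects a sample.

    Each rate is one engine's detection probability (0.0 to 1.0).
    Assumes independent misses: a sample slips through only if
    every engine misses it, so the miss probabilities multiply.
    """
    return 1.0 - prod(1.0 - r for r in rates)
```

For example, three engines with individual rates of 50%, 60% and 70% would, under this assumption, together miss only 0.5 × 0.4 × 0.3 = 6% of samples, giving a combined rate of 94%.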

Posted on 19 August 2015 by Martijn Grooten
