Back to the future: anti-virus engines and sandboxes

Posted by Virus Bulletin on Aug 19, 2015

Szilard Stange makes the case for multi-engine malware scanning.

The VB2015 conference takes place next month (30 September to 2 October) in Prague, with an exciting programme that covers many of today's most pertinent security-related topics. In the run-up to the event, we have invited each of the VB2015 sponsors to write a guest post for our blog. Today, we publish a piece by Szilard Stange, Director of Product Management at OPSWAT (a sponsor of every VB conference since 2008). In his blog, Szilard makes the case for scanning files with multiple engines.

A journey back to the 1990s

In the 1990s I was working for a small anti-virus vendor, analysing viruses. I started with simple malware, but later got into more tricky types as well; at the time, simple XOR encryption, variable encryptions, polymorphism and metamorphism were some of the challenges we tackled. Those were good times — every day brought a new malware trick to the table.

Back then we processed every piece of malware manually, but faced with an increasing volume of samples, we had to look for ways to speed up the malware processing task. We built an automation system, which we called AVAS (Automatic Virus Analysis System); it was a kind of sandbox system that executed files to see what they did. Some viruses evaded our detection methods, so we had to modify our system to feed in the required environment options to convince malware to infect it. The system became more complex, required more computers, and finally the processing time per file increased to a point at which we had to look for a smarter solution.

The basic recognition algorithm was simple pattern matching, using plain string matching as well as many features similar to regular expressions. We needed at least two fixed bytes to trigger a special script that could confirm the detection and, where possible, cure the infection. The scripting language provided file access and other data manipulation functions, making it possible to decrypt simple viruses and recover the infected files. But viruses evolved to the point where we could no longer find even two fixed bytes to trigger the script; variable encryptions and polymorphic viruses made the job harder. Over time we implemented a CPU emulator and a virtual file system with basic Windows emulation, which improved our simple pattern matching so much that variable encryptions and polymorphic viruses were no longer too hard to deal with.
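The two-stage scheme can be sketched in a few lines. This is a hypothetical illustration only: the anchor bytes, the plaintext marker and the single-byte XOR cipher are assumptions chosen for the example, not the actual signature format of the engine described above.

```python
def xor_decrypt(buf: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (XOR is its own inverse)."""
    return bytes(b ^ key for b in buf)

def confirm_detection(sample: bytes, anchor: bytes, plain_marker: bytes):
    """Stage 1: cheap pattern match on a pair of fixed bytes (the anchor).
    Stage 2: the 'special script' -- here, brute-force a single-byte XOR
    key and confirm the detection by finding a known plaintext marker.
    Returns the recovered key, or None if nothing was confirmed."""
    offset = sample.find(anchor)
    if offset == -1:
        return None                      # anchor absent: no detection
    body = sample[offset + len(anchor):]
    for key in range(256):
        if plain_marker in xor_decrypt(body, key):
            return key                   # decryption confirmed the match
    return None
```

With the key recovered, the same script could go on to restore the infected file, as the cure step did in the original system.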

We found that it was much easier to mitigate anti-debug and other tricks in our emulator than in our sandbox system, but the biggest challenge was to speed up emulation and skip emulation of unrelated things to focus only on the interesting parts. We made adjustments to be able to look quickly at new, complex malware, but still we couldn't expand our resources sufficiently to deal with all of the new malware tricks.

Sandboxes and scan engines

Looking at the anti-malware scene today, we can classify software for threat detection into two major types: the 'traditional' anti-malware engines and the dynamic analysis/sandbox products. If we compare these technologies and their challenges, it's impossible to declare one better than the other: each product has advantages.

Dynamic analysis systems face issues similar to those we faced in our internal systems in the 1990s. Malware authors continuously 'test' their malware against products and try to find weaknesses they can exploit to bypass detection. Checking the system time and looking for specific environment parameters are common evasion methods, but malware authors have developed more advanced techniques as well. Sandbox developers have to fine-tune their systems daily to mitigate these new anti-sandbox techniques.
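The system-time trick mentioned above can be sketched as a benign, hypothetical check (the duration and tolerance values are arbitrary assumptions for the example):

```python
import time

def sleep_appears_accelerated(expected_s: float = 2.0,
                              tolerance_s: float = 0.5) -> bool:
    """Sleep, then compare the measured wall-clock elapsed time against
    the requested duration. Some sandboxes fast-forward sleeps to keep
    analysis short, so a much shorter measured elapsed time hints that
    the clock is being manipulated by an analysis environment."""
    start = time.monotonic()
    time.sleep(expected_s)
    elapsed = time.monotonic() - start
    return elapsed < expected_s - tolerance_s
```

On a normal host this returns False; countering such checks, for instance by keeping the virtual clock consistent with the sleep, is exactly the kind of daily fine-tuning sandbox developers must perform.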

Sandbox products also face challenges around the speed of analysis: they need to boot up (or resume) virtual machines and perform analysis for every individual file. If you could provision virtual machines for every relevant configuration and counter every anti-sandbox technique, sandbox effectiveness would be outstanding, but the cost of the required hardware, the time each analysis takes and the need for continuous maintenance make this unfeasible.

What about the traditional scanning engines? In short, traditional scanning engines are the workhorses of malware detection. These products are designed to protect web and email traffic and to scan each file opened on a system; in all of these cases, response time is critical and customers expect maintenance-free products. Given these primary uses, traditional scanning engines focus on providing a binary result (is this malware or not) rather than detailed analysis, even though most are using emulation techniques, like sandboxes do, for heuristic detection purposes.

As you might have gathered from my story, scan engine developers had to make some simplifications during the emulation process in order to provide an acceptable response speed. These simplifications can lead to information being lost during the analysis, and in the end it's easy to miss something that would be critical for detecting a specific new malware.

In short, dynamic analysis products go much deeper in their efforts to detect each new threat, but as a result are both time- and resource-intensive. Traditional scanning engines provide a plug-and-play solution with real-time protection, but as a result they do not provide the sort of detailed analysis that can uncover outbreaks.

Combining technologies

Customers always want to achieve 100% detection to protect their businesses against threats, and combining dynamic analysis products with traditional anti-virus scanning engines helps towards that goal. Customers can deploy traditional anti-malware products to catch the majority of malware, and deploy a dynamic analysis solution to focus on APTs and other special malware. This combination is very effective in most cases; however, it can also be resource-intensive and requires regular maintenance to keep detection rates up.

Another way to combine technologies is to make the most of the proactive detection rate of scanning engines. According to third-party tests, individual scan engines achieve detection rates of anywhere between 10% and 90% for unknown malware. What if we scan for malware with many scanning engines at once? Each engine has its own strengths and weaknesses, because every vendor implements a different solution to a similar problem, so the detection rate of the combination is superior to that of any individual engine. The solution still requires minimal maintenance on the customer side, provided the problems of multi-engine licensing, updating and integration are outsourced. The scanning speed of multi-engine solutions can be much faster than that of the dynamic analysis/scan engine combination, although it depends on the number of engines and the vendors included. Here, too, a trade-off is made between scanning speed and detection capability: 30+ scan engines may take longer than a single scan-engine-plus-sandbox combination, but it is hard to beat the detection capabilities of that many scanning engines.
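As a back-of-the-envelope model of why combining engines helps: if each engine detects an unknown sample with probability p, the chance that at least one engine fires is one minus the product of the miss rates. The independence assumption here is optimistic (engines share techniques and signature sources), so treat this as an upper-bound sketch rather than a measured result:

```python
def combined_detection_rate(rates):
    """Probability that at least one engine detects the sample,
    assuming (optimistically) independent per-engine detection rates."""
    miss = 1.0
    for p in rates:
        miss *= (1.0 - p)
    return 1.0 - miss

# Three mediocre engines already beat the best single one among them:
print(round(combined_detection_rate([0.6, 0.5, 0.4]), 2))  # 0.88
```

Even under a more pessimistic correlated model, each additional engine can only leave the combined detection rate unchanged or improve it.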

Posted on 19 August 2015 by Martijn Grooten

