Repercussions of dynamic testing

2008-12-01

Roel Schouwenberg

Kaspersky Lab, USA
Editor: Helen Martin

Abstract

Roel Schouwenberg highlights one potential consequence of the attention currently being devoted to dynamic testing: the possibility that the increased focus on dynamic testing may inspire malware authors to devote more attention to circumventing products’ protection capabilities, rather than just their detection abilities.


Anyone who is remotely involved in the anti-malware industry will know that testing is a hot topic – the subject has received a lot of attention lately, particularly following the formation of AMTSO, the Anti-Malware Testing Standards Organization, earlier this year.

This article is not intended to be the nth paper describing how testing should be performed. However, it will highlight one potential consequence of the attention currently being devoted to dynamic testing: the possibility that the increased focus on dynamic testing may inspire malware authors to devote more attention to circumventing products’ protection capabilities, rather than just their detection abilities.

This article will use a number of examples and scenarios to evaluate the risk associated with dynamic testing. It will also put forward a number of suggestions for ways in which testers can mitigate the risk.

Testing methods

Before evaluating the risks associated with dynamic testing, we will first look at a number of other testing methods.

Static testing

Static testing is the most straightforward type of testing: an on-demand scan is run on a collection of malware. In order to produce meaningful results, any respectable static test these days must use a collection of malware containing thousands of samples, while the test sets used by testing bodies AV-Test and AV-Comparatives usually contain hundreds of thousands of samples and in some cases more than a million samples.
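As a rough illustration – and it is only a sketch, using ClamAV's clamscan as a stand-in for any on-demand command-line scanner, and assuming the collection is organized into one directory per malware class – a static test harness boils down to a loop that feeds every sample to the scanner and tallies the verdicts:

```python
import subprocess
from pathlib import Path

# Assumed layout: samples/<class>/<file>, e.g. samples/worms/...
SAMPLE_DIR = Path("samples")

def is_detected(sample: Path) -> bool:
    """On-demand scan of a single file. clamscan exits with 0 for a
    clean file and 1 when a signature matches (2 signals an error)."""
    result = subprocess.run(
        ["clamscan", "--no-summary", str(sample)],
        capture_output=True,
    )
    return result.returncode == 1

# Tally detection rates per sub-set (see below on why that matters).
for subset in sorted(d for d in SAMPLE_DIR.iterdir() if d.is_dir()):
    files = [f for f in subset.iterdir() if f.is_file()]
    hits = sum(is_detected(f) for f in files)
    print(f"{subset.name}: {hits}/{len(files)} detected")
```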

Since the test collections are so large, the results contain little (if any) useful detail that malware authors can use to help them fine-tune their creations.

Another thing to bear in mind when conducting static tests is the way in which the test collection is broken down. Certain less credible tests appear to have been performed on one big pile of unsorted malware samples – and even if the tester tallied the results of different sub-sets internally, publishing only an aggregate figure would still be considered bad testing practice.

Other, more credible tests make proper differentiation in their published test results. They sort their test collections into sub-sets – such as viruses, worms and trojans – and publish results for each sub-set. Some tests go even further and attempt to differentiate, for instance, between backdoors and spyware trojans.

Although this provides a greater level of detail, it still does not provide much useful information for the malware author. The only possible risk from static tests comes from those that look at how well products detect polymorphic/metamorphic malware. Such tests may highlight weaknesses in certain products.

However, polymorphic/metamorphic malware is inherently more difficult to detect than static malware – and a malware author who needs the results of a test to find this out is most likely not competent enough to write such malware anyway. Having said that, someone who is not competent enough to write a polymorphic piece of malware can now go to the underground and either buy such a piece of code or advertise for someone to create it.

Response time testing

Although response time testing is carried out only rarely these days, it is still worth looking at. It was most popular during the era of the big email worm outbreaks – NetSky, Bagle, Mydoom, Sober and Sobig are classic examples of that period.

In contrast to static testing, the sample sets used for response time testing are very small. One type of response time testing measures the overall performance of AV vendors based on a larger, though still relatively small, test set [1] – this type doesn’t show the results for individual samples. The second type of testing measures detection on a per-sample basis [2].
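For the per-sample variant the measurement itself is simple: the interval between a sample first being seen in the wild and the first update that detects it. A minimal sketch of the calculation follows, with made-up sample names and timestamps standing in for a real detection log:

```python
from datetime import datetime

# Hypothetical per-sample log: (sample, first seen in the wild,
# first signature update that detects it).
observations = [
    ("Worm.A", datetime(2005, 2, 14, 9, 0), datetime(2005, 2, 14, 11, 30)),
    ("Worm.B", datetime(2005, 2, 15, 4, 0), datetime(2005, 2, 16, 1, 15)),
]

for name, first_seen, first_detected in observations:
    hours = (first_detected - first_seen).total_seconds() / 3600
    print(f"{name}: response time {hours:.1f} hours")
```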

Such specific, per-sample results could be a clear aid to malware authors. One can speculate that the published response time test results may have led certain malware authors to change their approach in order to make detection of their malware harder. One example is that of W32/Sober.K [3], which appended random garbage to the end of its file during installation in a deliberate attempt to slow down AV detection. It is quite possible that the author introduced this functionality after being unhappy with the response time results of earlier Sober variants.
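The effect of such padding is easy to demonstrate in principle: any detection approach keyed to the exact file image – a cryptographic hash being the extreme case – breaks as soon as a single byte changes. A toy illustration on a harmless stand-in file:

```python
import hashlib
import os

def md5_of(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

body = b"harmless stand-in for a worm body"
print("before padding:", md5_of(body))

# Appending random garbage, as Sober.K did at install time,
# yields a different file image on every installation.
padded = body + os.urandom(64)
print("after padding: ", md5_of(padded))
```

Robust signatures obviously key on invariant parts of the file rather than the whole image, but selecting those parts takes an analyst more time, which is precisely the delay the author was after.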

Today’s response time testing results are published in a more generic fashion with little reference to specific samples. The risk of malware authors gaining too much information is therefore very low.

Retrospective testing

In retrospective testing, an out-of-date product is tested against up-to-date malware samples – but, other than its purpose (to investigate the product’s ability to detect unknown samples) and the age of the updates, retrospective testing is no different from static testing.

The level of risk introduced by retrospective tests is very low, just as with regular static tests.

Dynamic testing

In dynamic testing, malware samples are introduced into the test system with the intent to execute them. Only a small number of samples can be used in these tests because the process of executing each one is very time consuming. AMTSO has published a document which explains the idea behind dynamic testing in greater detail [4].

Ideally, the samples are introduced onto the test system in the ‘right’ way – for instance via drive-by download. Even when automated, this is a very time-consuming task, and the fact that virtual machines need to be avoided in order to obtain valid test results does not help matters.
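In outline, an automated run boils down to the loop sketched below. This is only a sketch: restore_clean_image(), visit_infection_url() and product_blocked_execution() are hypothetical stand-ins for tester-specific plumbing (re-imaging a physical machine, driving a browser to a drive-by URL, and inspecting the product's logs and the system state, respectively):

```python
import time

# Hypothetical stand-ins for tester-specific infrastructure:
def restore_clean_image() -> None:
    """Re-image the (physical) test machine to a known-clean state."""

def visit_infection_url(url: str) -> None:
    """Deliver the sample the 'right' way, e.g. via drive-by download."""

def product_blocked_execution() -> bool:
    """Inspect the product's logs and system state: did protection hold?"""
    return False  # placeholder

def run_dynamic_test(urls: list[str]) -> dict[str, bool]:
    results = {}
    for url in urls:
        restore_clean_image()     # start every sample from a clean baseline
        visit_infection_url(url)  # introduce the sample onto the system
        time.sleep(300)           # give the sample time to run its course
        results[url] = product_blocked_execution()
    return results
```

The re-imaging step is what makes the process so slow: it must be repeated for every sample and for every product under test.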

The number of samples included in tests is currently in the dozens. In time, with better hardware and optimized processes, we can expect the number of samples being used to reach the hundreds. However, the risk to which the industry is exposed with the introduction of dynamic testing is far greater than that associated with any of the other popular testing methods.

A large part of the industry is spending a lot of time on educating people about (new) proactive technologies, and AMTSO has put a lot of work into compiling a best practices paper for dynamic testing [1]. The collective message is that testers’ focus on products’ detection rates alone is outdated; testers now need to consider products’ protective measures as well.

There’s little doubt that the malware authors are also listening.

Times change

Some five years ago, Symantec incorporated a protection feature called ‘anti-worm’ into its Norton products. This was a behavioural system used to catch email worms proactively. Symantec expected to have to update the technology no more than six months after its introduction in order for it to remain effective against evolving malware. However, the developers have never had to update it [5]. Malware authors either did not know about the technology, didn’t bother to circumvent it, or were unable to.

In May 2006, Kaspersky Lab launched its version 6 product line which featured a new-generation behaviour blocker. Six months later a patch had to be released for the behaviour blocker because new variants of the LdPinch trojan family were bypassing the technology that had initially been capable of catching them.

What was the difference between the two behaviour blockers? ‘Anti-worm’ was introduced in the era of big malware epidemics, driven mostly by fame-seeking authors. By 2006 the majority of malware out there was being driven by profit, including LdPinch. Another thing to bear in mind is that Kaspersky Lab is a Russian company and LdPinch was a Russian creation, targeted mainly at the Russian market.

However, there could be a third explanation for the difference (though it should be noted that the author of this article is by no means a marketing expert): Kaspersky Lab seems to have invested more effort into marketing its behaviour blocker than Symantec did when it introduced ‘anti-worm’, so malware authors were more aware of the Kaspersky product and made the effort to circumvent it.

Multi-scanners

Multi-scanners are another interesting demonstration of malware authors using legitimate services to gain information for their own purposes. Online multi-scanners enjoy great popularity these days, with JottiScan [6] and VirusTotal [7] being the most popular.

These websites provide a service whereby the user can upload a file and have it scanned by a whole range of scanners to see what the various products detect it as (and if they detect it at all).

These services also enjoy some popularity with malware authors, who use them to test their latest creations and see whether they successfully avoid detection. While some anti-malware vendors provide the multi-scanner sites with their most recent scanners and ask the maintainer to use the most paranoid settings in order to detect as much malware as possible, there are also vendors who don’t offer their most recent scanner to these sites. They may also ask the maintainer to use lower-than-maximum settings, so that the product detects less malware than it is capable of detecting in real-world use [8], thus not revealing its true detection capabilities.
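For illustration – using the present-day VirusTotal v3 REST API, which post-dates this article, and a placeholder API key – the per-engine verdicts for a file can be pulled down in a few lines; this is exactly the kind of per-product breakdown that makes such services attractive to malware authors:

```python
import hashlib
import requests

API_KEY = "..."  # placeholder: a VirusTotal API key

def multiscan_report(path: str) -> None:
    """Look up a file's most recent multi-scanner report by hash."""
    with open(path, "rb") as f:
        sha256 = hashlib.sha256(f.read()).hexdigest()
    resp = requests.get(
        f"https://www.virustotal.com/api/v3/files/{sha256}",
        headers={"x-apikey": API_KEY},
    )
    resp.raise_for_status()
    engines = resp.json()["data"]["attributes"]["last_analysis_results"]
    for engine, verdict in sorted(engines.items()):
        print(f"{engine}: {verdict['result'] or 'no detection'}")
```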

Repercussions from the underground

These days malware that bypasses protection features is by no means rare. However, the vast majority of malware authors still focus solely on bypassing detection mechanisms.

Some anti-malware products have little in the way of protection measures, and bypassing their detection capabilities still means bypassing the entire product. Therefore it’s not so strange to see Win32 PE malware samples that are obfuscated in such a way that de-obfuscation takes roughly two minutes on a Core 2 Duo running at 2,500 MHz. However, the same malware samples can be detected proactively using behaviour blocker technology from two years ago [9].

There’s little doubt that the current noise regarding protection technologies and dynamic testing is causing some malware authors to pay extra attention to these types of technology. With the current buzz surrounding the topic, it’s likely that the interest of more malware authors will be piqued.

A number of scenarios are likely to occur: firstly, new groups may form in the underground that focus on providing the means for malware to circumvent protection technologies; secondly, a new market may emerge for improved multi-scanners which test products’ protection technologies as well as their detection abilities.

The big problem with malware bypassing protection technologies is the matter of fixing the holes the malware authors are exploiting. While vendors can ship a new signature update in hours or even minutes, fixing holes takes much longer – we’re talking about a response time of weeks rather than days, let alone hours.

What can testers do?

Revealing per-sample test result details is a much more dangerous idea with dynamic testing than it is with response time testing. While there is a low-to-moderate risk in revealing too much detail with response time testing, the risk is very high with dynamic testing.

Having a limited sample set for testing means that the samples tested need to be very relevant. If testers publicize the results for each such important sample, including how individual products perform against it, this is extremely valuable information for the malware authors: it shows them against which products they need to improve their creations.

Virus Bulletin is not yet publishing dynamic testing results, but plans to use information from its (upcoming) prevalence table to pick samples for testing. While the testers will start out just by mentioning malware families, they may end up disclosing specific malware names as well [10].

AV-Comparatives is also not yet publishing dynamic testing results, but intends to publish the names of the samples being used for its future dynamic tests.

As pointed out, this approach should be avoided. AV-Test, which is already performing dynamic tests, takes a better approach. Magazines are prohibited from disclosing the malware names or hashes of files that were used in the test. However, AV-Test will share the hashes or samples with the AV vendors that participated in the test [10].

Though slightly less transparent for end-users, this approach is by far preferable in terms of risk mitigation, while still allowing any vendor to notify the tester if it finds that non-relevant samples have been used in the test set.

Conclusions

The adoption of dynamic testing brings with it some new challenges. Now, more than ever, test results can have real consequences, influencing malware authors and their actions.

Security vendors need to take care that in their quest for education they do not lose sight of what really is important: the protection of users.

Care must be taken to avoid a situation in which education speeds up malware evolution and causes more problems than it solves. The first of AMTSO’s fundamental principles of testing states that testing must not endanger the public [4].

The attention being paid to protection technologies and dynamic testing is inevitably leading to increased awareness among all parties, including malware authors. It will be up to the industry as a whole, possibly in the form of AMTSO, to minimize the risk and ensure that testers do not reveal too much detail in their public test results.

Bibliography

[2] AV-Test W32/Sober variant response time results. http://www.pcmag.com/article2/0,2817,1813927,00.asp.

[3] F-Secure W32/Sober.K write-up. http://www.f-secure.com/v-descs/sober_k.shtml.

[5] Kennedy, M. personal communication.

[8] Bosveld, J.; Canto, J. personal communications.

[9] Exact name of the sample can be obtained by contacting the author of this article.

[10] Clementi, A.; Hawes, J.; Marx, A. personal communications.
