2009-02-01
Abstract
Martijn Grooten answers some of the common queries raised by vendors about the proposed test set-up for VB's upcoming anti-spam comparative testing.
Copyright © 2009 Virus Bulletin
Last month I outlined the proposed test setup for VB’s comparative anti-spam tests (see VB, January 2009, p.S1). Following the publication of the article we received a lot of feedback from vendors, researchers and customers alike. It is great to see so much interest in our tests, and even better to receive constructive comments and suggestions.
Of course, several queries have been raised about our proposals – this article answers three of the most commonly asked questions.
For customers who want to buy an anti-spam solution for their incoming email – generally embedded into a larger email suite – the choice is not simply one of comparing different vendors. They could choose a product that can be embedded into an existing mail server, or one that is a mail server in itself – in which case there is a further choice between products that come with their own hardware and products that need to be installed on an existing operating system. But there are also products where both email filtering and mail hosting take place at the vendor’s server; such products, labelled ‘Software as a Service’ (SaaS), are becoming increasingly popular.
Many vendors have asked whether our test will be able to accommodate SaaS products. The answer is yes – since the two major test criteria, the false positive rate and the false negative rate, can be measured for each of the product types mentioned above and can also be compared amongst them.
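To make these two metrics concrete, here is a minimal sketch, in Python, of how a false positive rate and a false negative rate could be computed from a set of classified messages. The function and data layout are our own illustration, not part of the test harness or of any product.

```python
# A minimal sketch (not VB's actual test harness) of how the two headline
# metrics could be computed from a corpus of classified messages.

def spam_filter_metrics(messages):
    """messages: list of (is_spam, flagged_as_spam) boolean pairs."""
    ham = [flagged for is_spam, flagged in messages if not is_spam]
    spam = [flagged for is_spam, flagged in messages if is_spam]

    # False positive rate: fraction of legitimate mail wrongly blocked.
    fp_rate = sum(ham) / len(ham) if ham else 0.0
    # False negative rate: fraction of spam that slipped through.
    fn_rate = (len(spam) - sum(spam)) / len(spam) if spam else 0.0
    return fp_rate, fn_rate

# Example: one ham message wrongly flagged, one spam message missed.
corpus = [(False, True), (False, False), (True, True), (True, False)]
print(spam_filter_metrics(corpus))  # (0.5, 0.5)
```

Because both rates are simple proportions over the messages a product actually received, they can be computed in the same way whether the filter runs on the customer's mail server, on dedicated hardware or at the vendor's premises.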
Of course, there are other metrics that describe a product’s performance, not all of which apply to all types of product. For instance, the average and maximum CPU usage of a product are important measures for those that need to be installed on the user’s machine, but are of little or no importance for products that provide their own hardware or are hosted externally. As a result, we aim to measure these aspects of performance in products for which they are relevant, but the measurements will not be part of the certification procedure.
One of the properties of spam is that it is indiscriminate; one of the properties of ham is that it is not. A classic example is that of pharmaceutical companies, whose staff might have legitimate reasons for sending and receiving email concerning body-part-enhancing products, but may find such email content blocked by spam filters. Many spam filters, however, are not indiscriminate either and can learn from feedback provided by the end-user. Some filters even rely solely on user feedback: by default, all email messages have a spam probability of 0.5, and by combining user feedback with, among other things, Bayesian and Markovian methods, the product will ‘learn’ which kinds of email are unwanted and should be filtered as spam.
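As a simplified, hypothetical illustration of the Bayesian side of such an approach (Markovian variants work on token sequences rather than individual tokens), the sketch below combines per-token spam probabilities learned from imagined user feedback, with tokens that have never received feedback defaulting to the neutral 0.5. The token table and numbers are invented for illustration only.

```python
from math import prod  # Python 3.8+

# Simplified illustration of the Bayesian combination described above: an
# untrained filter scores every message at the neutral 0.5, and user feedback
# gradually moves per-token probabilities away from that prior. The token
# table below is invented purely for illustration.

token_spam_prob = {
    'viagra': 0.98,   # seen mostly in messages the user reported as spam
    'meeting': 0.05,  # seen mostly in messages the user kept
}
NEUTRAL = 0.5  # score for tokens with no feedback yet

def spam_probability(message):
    probs = [token_spam_prob.get(tok, NEUTRAL) for tok in message.lower().split()]
    spam = prod(probs)
    ham = prod(1 - p for p in probs)
    return spam / (spam + ham)

print(spam_probability('cheap viagra today'))    # ~0.98
print(spam_probability('project meeting notes')) # ~0.05
print(spam_probability('hello world'))           # 0.5: no feedback yet
```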
However, for a number of reasons, we have decided to test all products out-of-the-box using their default settings and not to provide filters with any user feedback.
Firstly, providing feedback would complicate our test setup. In the real world, feedback is delivered to a learning filter whenever the user reads their email, which is generally multiple times during the day. In our setup, the ‘gold standard’ classification will be decided upon by our end-users at their leisure (meaning they do not have to make classification decisions under pressure, thus minimizing mistakes), so our feedback would not be representative of a real-world situation.
Secondly, the performance of a learning filter as perceived by the user will not depend solely on its ability to learn from user feedback, but at least as much on the quality of the feedback given. If deleting a message is easier or less time-consuming than reporting it as spam, users might simply delete unwanted email from their inbox; messages that are wanted but do not need to be saved might be read in the junk mail folder but never retrieved from it; and the ‘mark as spam’ button might be used as a convenient way of unsubscribing from mailing lists. The quality of the feedback thus depends on the end-user’s understanding of how to provide it, as well as the ease with which they can do so. We do not currently believe we can test this in a fair and comparable way. Of course, we will continue to look for possible ways to include learning filters in our tests.
A wide range of anti-spam measures are based on the content of the email or the context in which it was sent, and most filters use a combination of such measures. However, many filters also take a more pro-active approach, where they try to frustrate the spammers, for instance by delaying their response to SMTP commands (‘tarpitting’) or by temporarily refusing email from unknown or unverifiable sources (‘greylisting’).
Such methods assume that legitimate senders will keep trying to get the message delivered, while many spammers will give up: apart from the fact that mail agents used by spammers are often badly configured, the spammers’ economic model is based on being able to deliver a large volume of messages in a short period of time and it will generally not be viable for them to keep trying.
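As a rough sketch of the greylisting idea, and not a description of any particular product’s implementation, the example below temporarily rejects the first delivery attempt from an unknown (client IP, sender, recipient) triplet and accepts only a retry made after a minimum delay. The timings and response strings are illustrative assumptions.

```python
import time

MIN_RETRY_DELAY = 300   # seconds a well-behaved MTA is expected to wait
first_seen = {}         # (client IP, sender, recipient) -> first attempt time

def greylist_decision(client_ip, mail_from, rcpt_to, now=None):
    """Return an illustrative SMTP response for a delivery attempt."""
    now = time.time() if now is None else now
    triplet = (client_ip, mail_from, rcpt_to)
    if triplet not in first_seen:
        # First attempt from an unknown triplet: temporary rejection.
        first_seen[triplet] = now
        return '450 4.7.1 Greylisted, please try again later'
    if now - first_seen[triplet] >= MIN_RETRY_DELAY:
        # The sender retried after a sensible delay, as a real MTA would.
        return '250 OK'
    return '450 4.7.1 Greylisted, please try again later'

# A spam cannon that never retries is never accepted; a legitimate server
# that retries a few minutes later is let through on its second attempt.
```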
From the receivers’ point of view, these methods are as effective as any other at stopping spam, but they have two major drawbacks. Firstly, greylisting could cause significant delays to the delivery of some legitimate email, which could be disadvantageous in a business environment. Secondly, any such pro-active anti-spam method could result in false positives that are impossible to trace, which, again, is undesirable for a business that wants to be able to view all incoming email, even messages initially classified as spam.
Such methods also pose a problem for the tester: the effectiveness of an anti-spam method can only be assessed if both the spam catch rate and the false positive rate can be measured. This is impossible with pro-active methods, since they ‘block’ email before it has been fully transmitted, leaving no message to classify. This is one of the reasons why we will not be able to test such methods with the setup that uses our own email stream.
We realize that this will be a problem for products that make extensive use of these methods, and as a compromise we are looking for ways to expose all products to the email stream sent to a spam trap, which is (almost) guaranteed to be spam only. Of course, this will not solve the problem of testing for false positives.
We will be running a trial test this month. During the trial it is possible (indeed probable) that the test configuration will be changed. The results, therefore, may not be representative of those that would have been derived from a real test. For this reason, we intend to publish the results of the trial without specifying which products achieved them.
The first real test will start towards the end of March; vendors and developers will be notified in due course of the deadline and conditions for submitting a product.
As always, we welcome comments, criticism and suggestions – and will continue to do so once the tests are up and running. Our goal is to run tests in which products are compared in a fair way, and which will produce results that are useful to end-users. Any suggestions for better ways in which our tests could achieve these goals will be given serious consideration (please email [email protected]).