2011-03-01
Abstract
In this month's VBSpam test 18 out of 19 full solutions achieved VBSpam certification. Martijn Grooten has the details.
Copyright © 2011 Virus Bulletin
Those who are familiar with our VB100 tests, where the criteria to pass the test are to have a 100% catch rate combined with zero false positives, may wonder why we don’t have similar criteria for the VBSpam tests. The reason goes deeper than the simple fact that no product has ever achieved this.
All of the tens of thousands of spam messages sent as part of this test were originally sent to fictional addresses that have never belonged to a real person or organization (spam traps). Messages sent to these addresses are, by definition, unwanted: there is no one to have wanted them in the first place.
However, that does not mean that every message in the spam corpus looks like typical spam: a handful may appear rather legitimate. For instance, this month’s spam corpus contained a genuine invitation to join Facebook, and a newsletter that looks as if it would be of genuine interest to some recipients.
So while the right thing to do with these messages would be to block them (as, ultimately, a spam filter should block what is unwanted), this may not be the best thing to do. Blocking Facebook invitations – the vast majority of which are legitimate and wanted – will likely lead to false positives. Blocking a newsletter, even if its sender did not adhere to best practices, may also lead to complaints from end-users.
We have always believed that avoiding false positives is more important than blocking as much spam as possible, and it is good to see there were no false positives for any of the products with the top seven final scores.
Still, we acknowledge that false positives may occur from time to time, and sometimes for a good reason. Some of the legitimate emails that were missed during this test contained URLs on domains frequently used by spammers. And while the best thing to do with such messages would be to allow them through to the user’s inbox, marking them as spam is certainly understandable. As long as it does not occur frequently, this should not prevent a product from winning a VBSpam award.
In this test, 18 out of 19 full solutions (hailing from 13 different countries – a nice record showing the global nature of the fight against spam) achieved VBSpam certification, nine of which had no false positives.
The VBSpam test methodology can be found at http://www.virusbtn.com/vbspam/methodology/. As usual, email was sent to the products in parallel and in real time, and products were given the option to block email pre-DATA. Four products chose to make use of this option.
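For readers unfamiliar with the term, a pre-DATA verdict is one reached using only the sending host and the SMTP envelope – before the message body is transmitted, and therefore before any bandwidth is spent on it. The sketch below illustrates the idea in Python; the function, the field names and the reputation check are placeholders of my own and do not describe any product in this test.

    # Placeholder reputation data; a real filter would consult DNS blacklists,
    # sending history, authentication results and so on.
    BLACKLISTED_IPS = {'198.51.100.23'}

    def has_poor_reputation(client_ip):
        return client_ip in BLACKLISTED_IPS

    def pre_data_verdict(client_ip, helo_name, mail_from, rcpt_to):
        """Decide whether to reject a message before the SMTP DATA command,
        using only connection- and envelope-level information. Returning a
        5xx response here means the message body is never received; returning
        None lets the message through to be filtered in full after DATA."""
        if has_poor_reputation(client_ip):
            return '550 5.7.1 Rejected: sending IP has a poor reputation'
        return None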
As in previous tests, the products that needed to be installed on a server were installed on a Dell PowerEdge R200, with a 3.0GHz dual core processor and 4GB of RAM. The Linux products ran on SuSE Linux Enterprise Server 11; the Windows Server products ran on either the 2003 or the 2008 version, depending on which was recommended by the vendor.
To compare the products, we calculate a ‘final score’, which is defined as the spam catch (SC) rate minus five times the false positive (FP) rate. Products earn VBSpam certification if this value is at least 97:
SC - (5 x FP) ≥ 97
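For those who prefer to see the calculation spelled out, the few lines of Python below compute the final score and the certification decision; the figures in the example are invented purely for illustration and do not refer to any product in this test.

    def final_score(sc_rate, fp_rate):
        """Final score: spam catch rate minus five times the false positive
        rate, both expressed as percentages."""
        return sc_rate - 5 * fp_rate

    def earns_vbspam(sc_rate, fp_rate):
        """A product achieves VBSpam certification if its final score is at
        least 97."""
        return final_score(sc_rate, fp_rate) >= 97

    # A hypothetical product catching 99.50% of spam with a 0.10% FP rate
    # scores 99.50 - 0.50 = 99.00 and is certified; one catching 98.00% of
    # spam with a 0.30% FP rate scores 96.50 and is not.
    print(final_score(99.50, 0.10), earns_vbspam(99.50, 0.10))
    print(final_score(98.00, 0.30), earns_vbspam(98.00, 0.30))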
The test ran for 16 consecutive days, from around midnight GMT at the start of Saturday 5 February 2011 until midnight GMT at the start of Monday 21 February 2011.
The corpus contained 140,800 emails, 137,899 of which were spam. Of these, 80,755 were provided by Project Honey Pot and 57,144 were provided by Abusix; in both cases, the messages were relayed in real time, as were the 2,901 legitimate emails – a record number for our tests. As before, the legitimate emails were sent in a number of languages to represent an international mail stream.
Figure 1 shows the average catch rate of all full solutions throughout the test. As before, to avoid the average being skewed by a poorly performing product, we excluded the highest and lowest catch rate for each hour.
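Excluding the extremes in this way amounts to a simple trimmed mean; a minimal sketch of the calculation (my own illustration, not the scripts actually used in the lab) is shown below.

    def trimmed_hourly_average(catch_rates):
        """Average the products' catch rates for one hour after discarding the
        single highest and single lowest value, so that one outlier cannot
        skew the hourly figure."""
        if len(catch_rates) < 3:
            return sum(catch_rates) / len(catch_rates)
        trimmed = sorted(catch_rates)[1:-1]
        return sum(trimmed) / len(trimmed)

    # A hypothetical hour: most products near 99.8%, one outlier at 90%.
    print(trimmed_hourly_average([99.9, 99.8, 99.7, 99.8, 90.0]))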
In the previous test we noticed the somewhat surprising fact that spam sent in plain text appeared to be significantly more difficult to filter than spam with both HTML and text in the body or with a pure HTML body. This month’s results showed that this was still the case – though, again, we stress that this does not necessarily mean that the messages are difficult to filter because of the plain text body. Still, developers looking into ways to improve their products’ performance may want to have a look at whether the filtering of text-only emails can be improved upon.
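Developers who want to reproduce this kind of breakdown on their own logs could start from something along the lines of the sketch below; the record layout is an assumption made purely for the sake of the example.

    from collections import defaultdict

    def catch_rate_by_body_type(messages):
        """Group spam messages by body type ('text', 'html' or 'both') and
        return the percentage caught within each group."""
        totals = defaultdict(int)
        caught = defaultdict(int)
        for msg in messages:  # each msg: {'body_type': str, 'caught': bool}
            totals[msg['body_type']] += 1
            caught[msg['body_type']] += int(msg['caught'])
        return {body_type: 100.0 * caught[body_type] / totals[body_type]
                for body_type in totals}

    # Hypothetical per-message records for a single product:
    sample = [{'body_type': 'text', 'caught': False},
              {'body_type': 'text', 'caught': True},
              {'body_type': 'html', 'caught': True},
              {'body_type': 'both', 'caught': True}]
    print(catch_rate_by_body_type(sample))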
In this test, we again looked at the overall spam corpus and compared it with the sub-corpus of ‘difficult’ spam – defined as those messages missed by at least two different filters; this sub-corpus accounted for slightly more than one in every 42 messages.
This time we looked at the size of the emails (header plus body, in bytes) and how the distribution of sizes differed between the two corpora. With the emails in both corpora ordered by size, Figure 2 shows how quickly the messages grow in each corpus.
The blue line corresponds to the full corpus, the red one to the corpus of more difficult spam. The graph, with sizes on the vertical axis plotted on a logarithmic scale, should be read as follows: exactly 80% of all spam (blue line) was less than 3,072 bytes in size, while 80% of the difficult spam was less than 7,222 bytes; in the latter corpus, just 57% of messages were smaller than 3,072 bytes.
What the graph shows is that both very small and very large spam messages are harder to filter. It would be wrong to conclude that this difficulty in filtering is a consequence of the message size, but it is nevertheless something developers may want to keep in mind when trying to improve their products.
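For completeness, the essence of the calculation behind Figure 2 can be sketched as follows; again, this is an illustration with invented field names rather than the scripts used for the test.

    def difficult_subset(messages, min_missers=2):
        """'Difficult' spam: messages missed by at least two different filters."""
        return [m for m in messages if m['missed_by'] >= min_missers]

    def size_at_percentile(sizes_in_bytes, percentile):
        """Return the size (header plus body, in bytes) below which the given
        percentage of a corpus falls, using a simple nearest-rank method."""
        ordered = sorted(sizes_in_bytes)
        index = max(0, int(round(percentile / 100.0 * len(ordered))) - 1)
        return ordered[index]

    # Reading off the 80th percentile for each corpus, as in the text:
    #   size_at_percentile(all_spam_sizes, 80)        -> 3,072 bytes
    #   size_at_percentile(difficult_spam_sizes, 80)  -> 7,222 bytes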
SC rate: 99.83%
FP rate: 0.00%
Final score: 99.83
Project Honey Pot SC rate: 99.84%
Abusix SC rate: 99.82%
The increase in the global volume of spam that was seen just after the beginning of the year would not have been felt by the customers of AnubisNetworks, as the product’s spam catch rate improved significantly too. Thankfully, this improvement was without any false positives, and with the third highest final score the product wins its fifth consecutive VBSpam award.
SC rate: 99.78%
FP rate: 0.00%
Final score: 99.78
Project Honey Pot SC rate: 99.78%
Abusix SC rate: 99.78%
A participant since the very first test, BitDefender continues to be the only product to have won a VBSpam award in every test. Despite the product’s impressive track record, its developers are not ones for resting on their laurels, as was demonstrated by an improvement in the product’s final score for the third time in a row. With no false positives, the product achieved the fifth highest final score and another well deserved VBSpam award.
SC rate: 99.62%
FP rate: 0.00%
Final score: 99.62
Project Honey Pot SC rate: 99.43%
Abusix SC rate: 99.89%
I was pleased to see eleven’s eXpurgate return to the test, having been absent since July 2010. Its developers explained that, for them, reducing false positives is an absolute priority and they try to achieve this, among other things, by classifying email into several categories: ham, spam, bulk, almost empty etc. Our test only distinguishes between ham and spam – as do many end-users – but this was not a problem for the product. It did not miss a single legitimate email and it won a VBSpam award with a very decent and improved final score.
SC rate: 99.82%
FP rate: 0.00%
Final score: 99.82
Project Honey Pot SC rate: 99.81%
Abusix SC rate: 99.85%
It is a good thing if a product performs well over several tests in a row, and even better when it also manages to improve its performance. In this test, FortiMail both increased its spam catch rate slightly and eliminated false positives. The appliance saw its final score improve for the third time in a row and easily wins its 11th VBSpam award.
SC rate: 99.71%
FP rate: 0.00%
Final score: 99.71
Project Honey Pot SC rate: 99.72%
Abusix SC rate: 99.69%
Halon, a Swedish company, offers three spam-filtering solutions: a hosted solution, a hardware appliance and a virtual appliance; we tested the latter.
The product was installed easily on VMware and setting it up was equally straightforward. A nice, intuitive web interface lets the administrator change settings where necessary, and I was particularly charmed by the fact that the product has its own scripting language with which its behaviour can be fine-tuned. This gives more tech-savvy sysadmins the option of adding their own rules to tailor the product to their organization’s particular needs.
Of course, a user-friendly interface is only meaningful if the product itself performs well, but that was certainly the case here. With a good spam catch rate and no false positives at all, Halon Mail Security’s final score is among the highest in this test and earns the product a well-deserved VBSpam award.
SC rate: 98.32%
FP rate: 0.00%
Final score: 98.32
Project Honey Pot SC rate: 98.75%
Abusix SC rate: 97.73%
Kaspersky’s spam catch rate was significantly lower this month than in the previous test, owing to a run of bad days during which the product’s performance dropped quite a bit (as can be seen from the high standard deviation in the table below). Fortunately, there was enough of a margin to absorb this, and as there were no false positives, the product ended up winning its tenth VBSpam award.
SC rate: 99.89%
SC rate pre-DATA: 98.34%
FP rate: 0.00%
Final score: 99.89
Project Honey Pot SC rate: 99.84%
Abusix SC rate: 99.95%
Libra Esva achieved the second highest final score in the previous test – and managed to repeat the achievement this month, once again combining a very high spam catch rate with a zero false positive rate. The Italian company earns its sixth VBSpam award in as many tests.
SC rate: 99.92%
FP rate: 0.21%
Final score: 98.88
Project Honey Pot SC rate: 99.87%
Abusix SC rate: 99.99%
As in the previous test, McAfee’s Email Gateway appliance achieved the second highest spam catch rate. Unfortunately, some domains that appeared in legitimate emails but are also frequently used in spam messages caused the product to generate a handful of false positives. Still, with a decent final score, the product earns its tenth consecutive VBSpam award.
SC rate: 99.20%
FP rate: 0.48%
Final score: 96.79
Project Honey Pot SC rate: 99.16%
Abusix SC rate: 99.27%
With 14 false positives, McAfee’s second appliance had the joint highest false positive rate. This would not automatically have denied the product a VBSpam award, but with a spam catch rate that was slightly below average, the product’s final score fell below the threshold of 97 and no certification can be awarded.
SC rate: 99.70%
FP rate: 0.48%
Final score: 97.29
Project Honey Pot SC rate: 99.63%
Abusix SC rate: 99.81%
MessageStream’s hosted solution missed a relatively high number of legitimate emails, but its spam catch rate was high enough to make up for that. There is certainly room for improvement, but in the meantime, MessageStream wins its 11th VBSpam award.
SC rate: 99.99%
FP rate: 0.14%
Final score: 99.30
Project Honey Pot SC rate: 99.99%
Abusix SC rate: 99.98%
OnlyMyEmail’s MX-Defender continues to amaze me with its spam catch rate, which this time saw it miss less than one in 8,500 spam emails. Unlike in the previous test, there were a few false positives this time, which reduced the final score, but it was still a very decent score and wins the product its third VBSpam award.
SC rate: 99.91%
FP rate: 0.00%
Final score: 99.91
Project Honey Pot SC rate: 99.87%
Abusix SC rate: 99.97%
Sophos’s previous test result contains a lesson for all users of spam filters. The product was set up to tag spam emails with an ‘X-Spam: yes’ header. What looked like, and was counted as, its single false positive in the previous test was actually an email that already contained this header – probably added by an over-zealous outbound filter. For products in our tests this demonstrates the importance of using distinctive, product-specific headers; more generally, all users of spam filters should be aware that even a very good spam filter can perform suboptimally because of a minor tweak in its settings.
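A few lines of Python illustrate the point; the header name below is an example of my own (no product in this test is implied to use it) – what matters is that a verdict header should be one that is unlikely to arrive already set on inbound mail.

    import email

    # A generic name such as 'X-Spam' may already have been added by a filter
    # at the sender's side; a distinctive, product-specific name avoids this.
    VERDICT_HEADER = 'X-Example-Filter-Verdict'

    def tag_message(raw_message, is_spam):
        """Record the filter's verdict in a header of its own rather than
        reusing a generic 'X-Spam: yes' style header."""
        msg = email.message_from_string(raw_message)
        msg[VERDICT_HEADER] = 'yes' if is_spam else 'no'
        return msg.as_string()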
And Sophos Email Appliance is a very good spam filter, as demonstrated in the current test. The third highest spam catch rate combined with no false positives at all gave it the highest final score and the product’s developers in the UK and Canada should consider themselves the winners of this test.
SC rate: 98.42%
FP rate: 0.21%
Final score: 97.39
Project Honey Pot SC rate: 99.06%
Abusix SC rate: 97.52%
SPAMfighter's developers will no doubt be a little disappointed with their product’s reduced spam catch rate and will be keen to learn the reason for it; the drop was mostly due to a large number of missed spam messages in the Abusix corpus. They will, however, be pleased to learn that the spam catch rate was still high enough that even a handful of false positives did not get in the way of them winning their ninth VBSpam award.
SC rate: 99.90%
FP rate: 0.24%
Final score: 98.69
Project Honey Pot SC rate: 99.94%
Abusix SC rate: 99.83%
SpamTitan continues to score one of the highest spam catch rates in the tests, and users of the virtual appliance will find few spam emails in their inboxes. In this test, the high SC rate came at the expense of seven blocked legitimate emails. While these are seven too many, they were not enough to prevent the product from winning yet another VBSpam award.
SC rate: 99.87%
FP rate: 0.07%
Final score: 99.52
Project Honey Pot SC rate: 99.84%
Abusix SC rate: 99.91%
Two related legitimate emails were missed on the first day of the test and these got in the way of Symantec’s virtual appliance achieving an even better final score. The product’s developers can console themselves with the fact that their final score was the highest among those products that had false positives, and they can add yet another VBSpam award to their unbroken series.
SC rate: 99.70%
SC rate pre-DATA: 99.15%
FP rate: 0.21%
Final score: 98.67
Project Honey Pot SC rate: 99.59%
Abusix SC rate: 99.86%
One of the rules in this test is that we count no more than four false positives per sending IP address. We believe that, in practice, multiple blocked messages from the same legitimate sender will result either in the sender finding a different way to communicate their message, or in the recipient adjusting their filter – for instance by whitelisting the sender’s address.
This explains why The Email Laundry, which missed a few dozen legitimate emails from one sender, only scored six false positives in this test and thus easily won its sixth VBSpam award. More important was the fact that, well before the end of the test, and before we had had a chance to give the developers feedback on the product’s performance, emails from the sender in question were being accepted.
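For the record, the cap itself is simple to express in code; the record layout below is invented for the example.

    from collections import Counter

    def capped_false_positives(fp_records, cap=4):
        """Count false positives, counting at most `cap` per sending IP
        address, so that one repeatedly blocked sender cannot dominate the
        false positive total."""
        per_ip = Counter(record['sender_ip'] for record in fp_records)
        return sum(min(count, cap) for count in per_ip.values())

Under this rule, a few dozen blocked emails from a single sender plus two isolated false positives count as six – exactly the situation described above.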
SC rate: 99.83%
FP rate: 0.17%
Final score: 98.96
Project Honey Pot SC rate: 99.76%
Abusix SC rate: 99.92%
Vade Retro Center scored its highest spam catch rate to date, and while it was a shame to see the number of false positives increase to five, the product easily won its sixth consecutive VBSpam award.
SC rate: 99.10%
FP rate: 0.00%
Final score: 99.10
Project Honey Pot SC rate: 98.87%
Abusix SC rate: 99.42%
For the fourth time in six tests, ORF did not miss a single legitimate email – an achievement no other full solution in this test can match. Even if the spam catch rate isn’t quite as high as that of some other products, the customers of the Hungarian-developed product will have little reason to look in their spam folders, and that may be just as important. A sixth VBSpam award is thus well deserved.
SC rate: 99.87%
SC rate pre-DATA: 76.11%
FP rate: 0.28%
Final score: 98.49
Project Honey Pot SC rate: 99.90%
Abusix SC rate: 99.82%
Once again, Webroot’s hosted anti-spam solution blocked the vast majority of spam emails, significantly reducing its customers’ bandwidth requirements. Unfortunately, there were a number of false positives, which meant the product achieved a slightly lower final score, but it still performed well enough to earn its 11th consecutive VBSpam award.
SC rate: 98.72%
SC rate pre-DATA: 97.97%
FP rate: 0.00%
Final score: 98.72
Project Honey Pot SC rate: 98.62%
Abusix SC rate: 98.85%
It is hard to think of spam filtering without the use of DNS blacklists and, thinking about those, it is hard to ignore Spamhaus. Another good performance showed that there is a reason for this: the ZEN IP-based blacklist blocked just short of 98% of all spam, while subsequent scanning of email bodies against the DBL domain blacklist blocked over one third of the remaining emails. Both were without false positives and the product – which is only a partial solution – won its eighth VBSpam award in as many tests.
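For readers unfamiliar with how such lookups work, the mechanism is plain DNS: the octets of the sending IP address are reversed and looked up under the blacklist’s zone, and an answer in 127.0.0.0/8 indicates a listing, while NXDOMAIN means the address is not listed. The sketch below shows the general technique – it describes the public query mechanism rather than Spamhaus’s or any product’s internals, and the IP address used comes from documentation space.

    import socket

    def listed_on_ip_blacklist(ip_address, zone='zen.spamhaus.org'):
        """Check an IPv4 address against an IP-based DNS blacklist: reverse
        the octets, append the zone and resolve the resulting name."""
        reversed_octets = '.'.join(reversed(ip_address.split('.')))
        try:
            answer = socket.gethostbyname(reversed_octets + '.' + zone)
            return answer.startswith('127.')
        except socket.gaierror:
            return False

    # Domains found in message bodies can be checked against a domain-based
    # list such as the DBL in the same way, by resolving names of the form
    # 'some-domain.example.dbl.spamhaus.org'.
    print(listed_on_ip_blacklist('203.0.113.7'))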
It was pleasing to award VBSpam certification to 19 products this month and to find out that the stricter threshold we introduced in the previous test does not pose too many difficulties for the products. A number of new products are expected to join the field for the next test and they will have their work cut out if they are to match the standards set by the current entrants.
In the introduction to this review, I mentioned the difference between the ‘right’ decision and the ‘best’ decision for a spam filter to make on a particular email, and how these two usually, but not always, coincide. To overcome this problem, users and system administrators might want to whitelist or blacklist certain senders, IP addresses and domains. In future tests, we hope to be able to verify whether products have this option available and whether it works.
Performance tables from this test, and from each of the 11 previous tests, can be viewed on the redesigned VBSpam website at http://www.virusbtn.com/vbspam.
Later in March I will be discussing the subject of testing messaging filters in a slightly broader context at the eCrime Researchers Sync-Up, organized by the Anti-Phishing Working Group. Details of the event, which will be held 14–15 March in Dublin, can be found at http://www.ecrimeresearch.org/2011syncup/agenda.html.
The next VBSpam test will run in April. Developers interested in submitting their products should contact me on [email protected].