VBSpam Methodology - ver. 3.0

Methodology for the VBSpam certification test (version in effect).

Overview

Introduction

VBSpam is a continuously running performance test programme for email security solutions. Products can participate either publicly or privately in the programme; this methodology documents Virus Bulletin’s conduct in both cases.

 

Test objective

The public VBSpam tests seek to measure and compare the performance metrics of the publicly tested email security solutions. Products that demonstrate overall excellence can also earn the VBSpam or VBSpam+ award upon meeting the relevant criteria.

Outside the public test periods (and for privately tested products), VBSpam seeks to provide the vendor with continuous feedback about the performance of their product.

 

Suitable products

VBSpam is designed to accommodate a range of products, including cloud/hosted solutions and on-premises solutions (installable, virtual appliances or physical appliances), as long as the product can accept, filter and return/forward emails from our sources through SMTP. In certain cases, VBSpam can also accept APIs or other complementary solutions in the test.

The test sorts solutions into two distinct categories, with slightly different rules applicable to each:

  • Full email security solutions (such as email security gateways, spam filters) that aim to provide comprehensive coverage of the email threat landscape.
  • Complementary email security solutions, with limited coverage by design, e.g. DNSBLs or APIs.

As performance metrics are not directly comparable across these categories, the test only compares products within the same category.

 

Test outline

The test exposes each tested product to both unwanted and legitimate emails, and records the product’s response to these emails. During this testing, Virus Bulletin also collects and publishes speed/throughput statistics for full email security solutions.

 

Test cases

VBSpam utilizes a variety of email sources for test cases:

  • Spam emails
    • Third-party real-time email feeds: typical examples include Project Honey Pot and similar services.
    • Virus Bulletin’s own threat intelligence: emails collected through our own spam traps and through other means.

  • Legitimate emails
    • Ham emails: emails from email discussion lists.
    • Newsletters: both commercial and non-commercial opt-in newsletters.

All emails used in the test are collected in the wild. Virus Bulletin does not create new emails, e.g. to simulate spear-phishing tactics. Some modifications to the in-the-wild emails are necessary to facilitate testing and to protect intellectual property, as detailed later in this document.

Emails are forwarded to the tested product without undue delay upon receipt from our threat intelligence sources, in order to stay as close to real time as possible.

The legitimate emails used are predominantly written in English, whereas unwanted emails represent a wide variety of languages.

Note that full solutions and complementary solutions may be subjected to a slightly different mix of emails when there is a potential conflict of interest (for example, if the vendor that supplies the email feed also has its products publicly tested).

Emails are sorted into the following categories and subcategories:

  • Legitimate emails
    • Ham
    • Newsletters

  • Spam emails
    • Assorted: any unwanted email that does not fit into any of the more specific categories below.
    • Phishing: Phishing emails are spam emails containing a link that leads either to malware or to a page that attempts to steal credentials.
    • Malware: Malware emails are those with an attachment that either is malware itself or would likely download malware. A password (often present in the email) may need to be entered in order for the malware to be executed.
    • Potentially unwanted emails: Potentially unwanted emails (to be read as ‘merely unwanted’) are emails that were sent to spam traps and where the content suggests that, while the recipient may not want it, the same email might be wanted by others. These form part of the spam category but are counted with a weight of 20% in the test.

 

Speed measurements

For each email sent to the tested products, the ‘speed’ (or ‘delay’) is measured as the time it takes for the email to be returned from the product: the counter starts when the delivery of the email starts and stops when the returned email is accepted. The speed is only relevant for emails from the ham corpus and is used to indicate whether products delay the delivery of emails, which would give them an advantage in spam filtering.

Because small delays make little difference from the end-user’s perspective, the reports and feedback only include the delays measured at the 10%, 50%, 95% and 98% percentiles, colour-coded as follows:

  • Green: up to 30 seconds
  • Yellow: 30 seconds to two minutes
  • Orange: two to ten minutes
  • Red: more than ten minutes

 

Emails may be delayed by network issues or transient SMTP errors (4xx). In such cases, delivery will be attempted six times, at 20-minute intervals. The delay introduced by this is added to the measured ‘speed’.
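To make the bands concrete, the following minimal Python sketch shows how measured delays could be reduced to percentiles and mapped to the colours above (the function names and sample data are illustrative only, not part of the VBSpam toolchain):

def percentile(delays, p):
    # Nearest-rank style percentile over a sample of delays (in seconds).
    ordered = sorted(delays)
    index = min(len(ordered) - 1, round(p / 100 * (len(ordered) - 1)))
    return ordered[index]

def speed_colour(delay_seconds):
    # Map a measured delay to the colour bands used in the reports.
    if delay_seconds < 30:
        return "green"
    if delay_seconds < 120:
        return "yellow"
    if delay_seconds < 600:
        return "orange"
    return "red"

# Sample delays in seconds; any retry time (e.g. six attempts at 20-minute
# intervals) is already included in each measured value.
delays = [4, 6, 9, 15, 75, 1500]
for p in (10, 50, 95, 98):
    print(p, speed_colour(percentile(delays, p)))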

Note that speed measurements are only collected for full email security solutions.

 

Test results

The products’ responses are evaluated against the respective body of test cases and sorted into the following categories:

  • True positive: unwanted test case identified as unwanted (e.g. spam)
  • True negative: legitimate test case identified as legitimate (legitimate email treated as such)
  • False positive: legitimate test case identified as unwanted (false alarm)
  • False negative: unwanted test case identified as legitimate (unwanted email missed)

 

Award criteria

Publicly tested full email security solutions may earn the VBSpam or the VBSpam+ award.

The VBSpam award signifies that the product demonstrates a good overall detection rate with a very small number of false positives, and also a good and consistent throughput.

The VBSpam+ award signifies excellence on the same metrics.

Eligibility for the award depends on a variety of metrics, most importantly the final score, which is calculated based on the weighted false positive rate and the spam catch rate. The tested product must also be a full email security solution, as complementary solutions do not currently receive either of the awards, even when they perform comparably.

The weighted false positive rate is the ratio of legitimate emails misclassified as spam to the total number of legitimate emails. Emails from the “newsletter” subcategory are assigned a lower weight as well as a cap, as emails in this subcategory are more ambiguous than ham emails.

Definitions:

  • <ham-total>: Total number of emails in the “ham” subcategory of the legitimate email corpus.
  • <newsletters-total>: Total number of emails in the “newsletter” subcategory of the legitimate email corpus.
  • <ham-false-positives>: Total number of false positive emails in the “ham” subcategory.
  • <newsletter-false-positives>: Total number of false positives in the “newsletter” subcategory.
FORMULA:
<weighted-false-positive-rate> =
(<ham-false-positives> + 0.2 * min(<newsletter-false-positives>, 0.2 * <newsletters-total>)) /
(<ham-total> + 0.2 * <newsletters-total>)
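
Expressed as a short Python sketch (the function and parameter names are ours, for illustration only):

def weighted_false_positive_rate(ham_fp, ham_total, newsletter_fp, newsletters_total):
    # Newsletter false positives carry a 0.2 weight and are capped at
    # 20% of the newsletter corpus, per the formula above.
    capped = min(newsletter_fp, 0.2 * newsletters_total)
    return (ham_fp + 0.2 * capped) / (ham_total + 0.2 * newsletters_total)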

The spam catch rate is one minus the weighted false negative rate, which is calculated in a similar manner, with a lower weight assigned to “potentially unwanted” emails due to their ambiguous nature.

Definitions:

  • <spam-total>: Total number of spam emails.
  • <potentially-unwanted-total>: Total number of emails in the “potentially unwanted” subcategory of spam emails.
  • <false-negatives-total>: Total number of false negatives in the spam corpus.
  • <potentially-unwanted-false-negatives>: Number of false negatives in the “potentially unwanted” subcategory of spam emails.
FORMULA:
<spam-catch-rate> =
1 - (<false-negatives-total> - <potentially-unwanted-false-negatives> + 0.2 * <potentially-unwanted-false-negatives>) /
(<spam-total> - <potentially-unwanted-total> + 0.2 * <potentially-unwanted-total>)
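
The same calculation as a Python sketch (again, the names are illustrative):

def spam_catch_rate(fn_total, spam_total, pu_fn, pu_total):
    # 'Potentially unwanted' misses and test cases carry a 0.2 weight;
    # the catch rate is one minus the weighted miss rate.
    weighted_misses = (fn_total - pu_fn) + 0.2 * pu_fn
    weighted_spam = (spam_total - pu_total) + 0.2 * pu_total
    return 1 - weighted_misses / weighted_spam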

The final score for the product is calculated in a manner that gives a disproportionate weight to false positives, to recognize the damage such misclassifications can do.

FORMULA:
<final-score> = <spam-catch-rate> - (5 * <weighted-false-positive-rate>)

Both rates are expressed here as percentages, so a product with a perfect catch rate and no false positives scores 100.

A product must meet or exceed the criteria below to earn the VBSpam award:

  • the value of its final score is at least 98, and,
  • the ‘delivery speed colours’ at 10% and 50% are green or yellow, and,
  • the ‘delivery speed colours’ at 95% are green, yellow or orange.

Outstanding performers can earn the VBSpam+ award, for which the criteria are defined as:

  • at least 99.5% spam catch rate, and,
  • no false positives at all among the ham emails, and,
  • no more than 2.5% false positives among the newsletters, and,
  • the ‘delivery speed colours’ at 10% and 50% are green, and,
  • the ‘delivery speed colours’ at 95% and 98% are green or yellow.
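
Putting the scoring together, here is a worked illustration using the two sketch functions above (all numbers invented; the speed-colour criteria still apply separately):

wfp = weighted_false_positive_rate(ham_fp=1, ham_total=7000,
                                   newsletter_fp=5, newsletters_total=300)
sc = spam_catch_rate(fn_total=150, spam_total=100000, pu_fn=50, pu_total=2000)
final_score = 100 * sc - 5 * (100 * wfp)   # rates expressed as percentages
print(round(final_score, 2))               # 99.75 -- above the 98 threshold
# VBSpam+ would additionally require sc >= 0.995 (met here), zero ham false
# positives (not met: ham_fp = 1) and newsletter false positives <= 2.5%.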

 

Testing procedure

Overview

The product lifecycle in VBSpam begins with an initial product setup, typically done in cooperation with the vendor.

This is followed by continuous testing for the designated testing period. For publicly tested products that join the test on a commercial basis, the designated testing period is approximately one year, during which four periods are designated as official testing periods. Data obtained during these periods serve as the basis for public, comparative test results and certification.

 

Test environment for full email security solutions

Full email security solutions in the test can be:

  • Hosted by Virus Bulletin as a virtual machine
  • Hosted by Virus Bulletin as a physical appliance
  • Hosted by the vendor (e.g. cloud)

VM hosting happens on a Type 1 (bare-metal) hypervisor. VM resources such as vCPUs, memory and storage are allocated as per the requirements of the product.

Unrestricted outbound internet access is provided.

Remote access to the products hosted by Virus Bulletin is provided throughout their participation. Unless otherwise agreed, vendors are expected to set up and configure their products in the test environment, with assistance from Virus Bulletin.

 

Test environment for complementary security solutions

Complementary solutions are set up and managed by Virus Bulletin, in a virtualized environment. VM resources are allocated as required to facilitate fair testing.

Unrestricted outbound internet access is provided. However, unlike full email security solutions, no unsupervised remote access to the test environment is available for the vendors. This is because complementary solutions typically require the help of a host MTA to work, which may have built-in anti-spam capabilities. Virus Bulletin deems it crucial to retain full supervision of the MTA configuration to prevent any potential noise and comparability issues arising from the vendor accidentally or purposefully changing the configuration of the underlying MTA.

 

Introducing test cases to the test environment

Test cases (emails) are sent continuously to the tested products.

Each email is delivered in an individual SMTP transaction. Virus Bulletin seeks to keep modifications to the original, in-the-wild version of the emails to a minimum; however, a number of differences between the original and in-test emails are inevitable. Some of these changes are equivalent to having a front-end SMTP server that rewrites the recipient information:

  • SMTP connections will be made from an IP address belonging to Virus Bulletin, instead of the original sending IP.
  • The SMTP transaction HELO/EHLO domain will be overwritten by that of the actual sending host in the Virus Bulletin infrastructure. Note that the original domain will be preserved in the Received: header, as described later on in this section.
  • The SMTP transaction recipient (also known as the SMTP envelope recipient) will be rewritten to “user@vbspamtest.com”, where “user” is an ID assigned by Virus Bulletin.

The more invasive changes are:

  • Any references to the original spam trap within the email MIME will be replaced in the same manner. Note that this might break any digital signatures, most notably DKIM. This replacement affects both the MIME header and the MIME parts and it happens both at mailbox user level (e.g. the subject “Notice for <spam-recipient>” may become “Notice for <rewritten-user>”) and domain level (e.g. “Your mailbox at <spamtrap-domain> is suspended” may become “Your mailbox at vbspamtest.com is suspended”).
  • Ham emails – which commonly originate from mailing lists – will be re-engineered to appear as if they were sent to the vbspamtest.com domain directly.
  • Further X-VBSpam-* headers may be added to the email to facilitate testing.

Some metadata about the original email will be retained through a Received: header, again as if the email were received first by a front-end SMTP server. A single new Received: header will be inserted into the email, in the following fashion:

Received: from <original-reverse-dns> (HELO <original-helo-domain> [<originating-ip>]) by <vb-mta>
(<vb-mta-software>) with [E]SMTP id <vb-message-id>; <date>

where

  • <original-reverse-dns> is the reverse DNS of the original sending IP, if available, otherwise the literal “Unknown”.
  • <original-helo-domain> is the original HELO/EHLO command argument domain.
  • <vb-mta> is the FQDN of the Virus Bulletin SMTP server that received the email.
  • <vb-mta-software> is the standard (and cosmetic) label for the SMTP server used by Virus Bulletin.
  • <vb-message-id> is Virus Bulletin’s own message ID for identifying the email and the tested product (it is not the original Message-ID).
  • <date> is the date and time of receipt.
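
As an illustration, a header built to this template might look as follows (all values below are invented):

Received: from mail.example.org (HELO mail.example.org [192.0.2.25]) by mta.vbspamtest.com
(Postfix) with ESMTP id vb-0042-prod17; Mon, 3 Feb 2025 10:15:42 +0000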

Additionally, if the tested product supports the XCLIENT command, the original client IP address as well as the HELO/EHLO domain will be passed on to the tested product.
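
For instance, in Postfix’s implementation of the XCLIENT extension, the command carrying these attributes could look like this (values invented; after a successful XCLIENT the server restarts the SMTP session with a fresh greeting):

XCLIENT NAME=mail.example.org ADDR=192.0.2.25 HELO=mail.example.org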

 

Recording product response

A tested product receives test case emails from the VBSpam infrastructure and the product’s response is recorded. These responses are sorted into two categories:

  • “spam”, i.e. the product declared the email to be unwanted, or,
  • “ham”, i.e. the product either did not act on the email, or explicitly designated it to be a “wanted” email.

VBSpam supports a wide variety of ways to detect the product response. In addition to those outlined below, Virus Bulletin accepts custom methods at its discretion.

Full solutions will typically be set up to receive test case emails from Virus Bulletin through SMTP and to return the filtered emails through SMTP to a specific Virus Bulletin server for examination. By default, the following rules are applied to this scenario (a condensed sketch of the logic follows the lists below):

  • The email will be considered to be marked as ‘spam’ if:
    • An SMTP 5xx error occurs while the email is being sent to the product, or
    • The SMTP transaction fails due to repeated SMTP 4xx transient errors, host unavailability or transaction interruption, and the retry attempts (up to six times, at 20-minute intervals) are exhausted, or
    • The email is not returned within a reasonable time frame, or
    • The returned email has a specific MIME header, or
    • The returned email has a specific subject tag.

  • The email will be considered to be marked as ‘ham’ if:
    • None of the ‘spam’ responses was observed, or
    • The returned email has a specific MIME header, or
    • The returned email has a specific subject tag.
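
A condensed Python sketch of this default decision logic (the header and subject-tag markers below are placeholders; the actual markers are agreed with each vendor):

def classify_response(smtp_5xx, retries_exhausted, returned_email):
    # Return 'spam' or 'ham' per the default rules above.
    if smtp_5xx or retries_exhausted or returned_email is None:
        return "spam"  # rejected, undeliverable, or not returned in time
    headers, subject = returned_email
    if headers.get("X-Spam-Flag") == "YES" or subject.startswith("[SPAM]"):
        return "spam"  # placeholder header and subject-tag markers
    return "ham"       # none of the 'spam' responses observed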

Complementary solutions are typically tested in a very similar setup to the full solutions, with the hosting MTA or spam-filtering system configured to carry out actions on the email in a manner that can be detected by Virus Bulletin.

 

Test case validation

To closely simulate real-time conditions, emails from real-world sources are promptly introduced into the testing environment without unnecessary delay. Despite employing the best threat intelligence and meticulously crafted automation, some of the emails received by the tested products may not be relevant for the test. Therefore, Virus Bulletin regularly reviews and validates test case emails, discarding any that are deemed unsuitable.

Note that in this validation process, many perfectly suitable emails may also be discarded due to the limited capacity for manual review.

 

Feedback

As a general rule, feedback is provided to the participating products on a weekly basis. No feedback is given during the official test periods, during holidays, or when maintenance is taking place. The feedback provided is non-comparative by nature, i.e. the feedback by itself is not suitable to determine how a product ranks against other products in the test.

This feedback is for the vendor’s own information only, and it is not permitted for the details to be made public either by the vendor or by Virus Bulletin.

Feedback includes:

  • performance metrics on the test case bodies
  • speed measurements, where applicable
  • test cases for false negatives and false positives, including
    • emails, as sent to the product (MIME)
    • email transaction logs
    • the header of the email as it was returned by the product, if applicable.

Note that Virus Bulletin may cap the number of emails shared at 400 per email category.

 

Disputes

Disputes may be submitted at any time; however, for the official test periods Virus Bulletin requires that public test participants submit their disputes promptly upon receipt of the feedback, to ensure timely publication of the public report. A minimum of 10 calendar days will be permitted for the submission of disputes.

Disputes are evaluated on a case-by-case basis. The vendor is asked to provide supporting data or evidence, if any, along with their dispute. Although all efforts will be made to resolve disputed issues to the satisfaction of all parties, Virus Bulletin reserves the right to make the final decision.

To reflect the broad nature of real-life issues, the scope of the disputes is not limited.

 

Policies

Product build and configuration policies

Public test reports can only be representative to the reader if the testing is conducted using publicly available product versions. Vendors are therefore required not to use any enhancement of the product that is not available to general audiences. We also encourage (but do not require) vendors to share with the public the configuration used for their product in the test.

Tests are usually conducted with the latest generally available version of a product or service being tested. Deviations from this policy will be documented in the report.

Private tests are not subject to these constraints.

 

Binary classification mapping

Email security products often classify emails into various categories.

However, the VBSpam test relies on binary classification – there is either a hit on a test case (“email not wanted”), or there is no hit (“legitimate”). It is the product vendor’s responsibility to map their own classifications onto these two categories, as in the example below. In the absence of such a mapping from the vendor, the Virus Bulletin test team endeavour to set up the mapping themselves.
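
For example, a vendor whose product emits multi-category verdicts might supply a mapping such as the following (verdict names invented for illustration):

VERDICT_TO_BINARY = {
    "clean": "ham",
    "bulk": "spam",
    "phishing": "spam",
    "malware": "spam",
    "suspicious": "spam",  # ambiguous verdicts must still land on one side
}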

 

Withdrawal from the test (opting out)

Products participating in the public test cannot be withdrawn from the test once the official test period has started. Public interest dictates that a test report is to be published, regardless of whether or not it is favourable from the vendor’s perspective.

However, Virus Bulletin may, at its discretion, allow withdrawal of a product in extraordinary circumstances, when compelling reasons suggest that its inclusion in the report would bear no relevance to the public. Examples of such situations are: collected data is proven to be tainted by lab-specific technical issues; significant testing errors have occurred, such as deviations from protocol, etc. Note that technical issues that impact not just the particular test environment but a wider user base (e.g. cloud outage, faulty rule updates, etc.) at the time of the test do not qualify as a basis for withdrawal, and Virus Bulletin may proceed to publish the report.

 

Technical issue resolution

Virus Bulletin pledges to work with the vendor to resolve technical issues with the product and notify the vendor as soon as possible when such issues are detected.

 

Vendor commentary

Prior to the publication of a report, the vendor of the product may choose to provide commentary to be included in the report notes. This is to ensure that the vendor’s perspective receives a fair representation. Such commentaries can be useful when the report contents are disputed by the vendor. Commentaries are subject to reasonable length limits and editorial approval.

 

Product audit

Vendors of full email security solutions are provided with remote access to their products and may audit their configuration, state, or logs at any time.

Vendors of complementary solutions can request an audit of the product configuration, but they are not granted remote access at all times. An audit is primarily performed through manual verification of the desired configuration (as provided by the vendor) or through sharing of the logs and data generated. Remote access can also be arranged at a time suitable for both parties. Remote access may be supervised by Virus Bulletin personnel, or the virtual machine state may be captured and restored after the remote access session is completed, to prevent any tampering.
 

Participant inclusion

The public test series features participants that are included on a commissioned basis. Equally, Virus Bulletin may choose to include a product at its own discretion. For products included at its own discretion, Virus Bulletin commits to:

  • Extend an invitation to the vendor of the product, offering voluntary participant status, with ample time allowed to consider and prepare for the public test. Voluntary participant status grants the vendor the same range of rights and level of service as other vendors enjoy, as described in this document, but only for the duration of the process required to set up, test and dismantle the product.
  • For products that do not adopt a voluntary participant status, Virus Bulletin will:
    • show diligence and reasonable care during the testing process.
    • allow the vendor to include vendor commentary, as described in the policies above.

 

Changelog

Version 3.0

  • Rewritten from scratch.

 
