VBSpam is a continuously running performance test programme for email security solutions. Products can participate either publicly or privately in the programme; this methodology documents Virus Bulletin’s conduct in both cases.
The public VBSpam tests seek to measure and compare the performance metrics of the publicly tested email security solutions. In recognition of overall excellence, participating products can also earn the VBSpam or VBSpam+ award upon meeting the relevant criteria.
Outside the public test periods (and for privately tested products), VBSpam seeks to provide the vendor with continuous feedback about the performance of their product.
VBSpam is designed to accommodate a range of products, including cloud/hosted solutions and on-premises solutions (installable, virtual appliances or physical appliances), as long as the product can accept, filter and return/forward emails from our sources through SMTP. In certain cases, VBSpam can also accept APIs or other complementary solutions in the test.
The test sorts solutions into two distinct categories, with slightly different rules applicable to each:
As most performance metrics are not meaningfully comparable across these categories, the test only compares products within the same category.
The test exposes each tested product to both unwanted and legitimate emails, and records the product’s response to these emails. During this testing, Virus Bulletin also collects and publishes speed / throughput statistics for full email security solutions.
VBSpam utilizes a variety of email sources for test cases:
All emails used in the test are in the wild. Virus Bulletin does not create new emails, e.g. to simulate spear-phishing tactics. Some modifications to the in-the-wild emails are necessary to facilitate testing and to protect intellectual property, as detailed later on in this document.
Emails are forwarded to the tested product without undue delay once they are received by our threat intelligence systems, in order to stay as close to real time as possible.
The legitimate emails used are predominantly written in English, whereas unwanted emails represent a wide variety of languages.
Note that full solutions and complementary solutions may be subjected to a slightly different mix of emails when there is a potential conflict of interest (for example, if the vendor that supplies the email feed also has its products publicly tested).
Emails are sorted into the following categories and subcategories:
For each email sent to the tested products, the ‘speed’ (or ‘delay’) is measured as the time it takes for the email to be returned from the product: the counter starts when the delivery of the email starts and stops when the returned email is accepted. The speed is only relevant for emails from the ham corpus and is used to indicate whether products delay the delivery of emails, which would give them an advantage in spam filtering.
Because small delays make no significant difference from the end-user's perspective, the reports and feedback only include the delays measured at the 10th, 50th, 95th and 98th percentiles. Each of these is indicated as being: less than 30 seconds (green); between 30 seconds and 2 minutes (yellow); between 2 minutes and 10 minutes (orange); or more than 10 minutes (red).
(green) = up to 30 seconds
(yellow) = 30 seconds to two minutes
(orange) = two to ten minutes
(red) = more than ten minutes
Emails may be delayed by network issues or transient SMTP errors (4xx). In such cases, delivery will be attempted six times, at 20-minute intervals. The delay introduced by this is added to the measured ‘speed’.
Note that speed measurements are only collected for full email security solutions.
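As an illustration of the colour banding described above, the following is a minimal sketch rather than the tooling Virus Bulletin uses. It assumes a nearest-rank percentile (the methodology does not name a percentile method) and assumes that a delay exactly on a threshold falls into the lower band. The function names `percentile`, `band` and `speed_summary` are hypothetical.

```python
# Band thresholds in seconds, taken from the colour legend above.
BANDS = [(30, "green"), (120, "yellow"), (600, "orange")]


def percentile(sorted_delays, pct):
    """Nearest-rank percentile of an ascending list (assumed method)."""
    if not sorted_delays:
        raise ValueError("no delays recorded")
    # Integer ceiling of pct * n / 100, to avoid floating-point rounding.
    rank = max(1, (pct * len(sorted_delays) + 99) // 100)
    return sorted_delays[rank - 1]


def band(delay_seconds):
    """Map a delay to its colour; boundary values go to the lower band (assumption)."""
    for limit, colour in BANDS:
        if delay_seconds <= limit:
            return colour
    return "red"


def speed_summary(delays):
    """Colour band at the 10th, 50th, 95th and 98th percentiles of ham delays."""
    ordered = sorted(delays)
    return {p: band(percentile(ordered, p)) for p in (10, 50, 95, 98)}


# Example: 90 fast deliveries, 6 slightly delayed, 4 heavily delayed.
summary = speed_summary([1] * 90 + [45] * 6 + [700] * 4)
```

Only delays of ham emails would be passed in, since the speed is only relevant for the ham corpus.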
The products’ responses are referenced against the respective body of the test cases and are sorted into the following categories:
Publicly tested full email security solutions may earn the VBSpam or the VBSpam+ award.
The VBSpam award signifies that the product demonstrates a good overall detection rate with a very small number of false positives, and also a good and consistent throughput.
The VBSpam+ award signifies excellence on the same metrics.
Eligibility for the award depends on a variety of metrics, most importantly the final score, which is calculated based on the weighted false positive rate and the spam catch rate. The tested product must also be a full email security solution, as complementary solutions do not currently receive either of the awards, even when they perform comparably.
The weighted false positive rate is the ratio of legitimate emails misclassified as spam to the total number of legitimate emails. Emails from the “newsletter” subcategory are assigned a lower weight as well as a cap, as emails in this subcategory are more ambiguous than ham emails.
Definitions:
FORMULA:
<weighted-false-positive-rate> =
(<ham-false-positives> + 0.2 * min(<newsletter-false-positives>, 0.2 * <newsletters-total>)) /
(<ham-total> + 0.2 * <newsletters-total>)
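The formula above can be sketched in Python as follows. This is an illustrative transcription only; the function and parameter names are hypothetical, and the result is a fraction between 0 and 1 rather than a percentage.

```python
def weighted_false_positive_rate(ham_false_positives, ham_total,
                                 newsletter_false_positives, newsletters_total,
                                 weight=0.2, cap_fraction=0.2):
    """Weighted false positive rate, following the formula above."""
    # Newsletter false positives are capped at 20% of all newsletters...
    capped = min(newsletter_false_positives, cap_fraction * newsletters_total)
    # ...and newsletters count for one fifth of a ham email.
    denominator = ham_total + weight * newsletters_total
    if denominator == 0:
        raise ValueError("no legitimate emails in the corpus")
    return (ham_false_positives + weight * capped) / denominator


# Example: 2 of 1000 ham emails and 5 of 100 newsletters were blocked.
rate = weighted_false_positive_rate(2, 1000, 5, 100)
```

With these figures the numerator is 2 + 0.2 * 5 = 3 and the denominator is 1000 + 0.2 * 100 = 1020. The cap takes effect only once more than a fifth of all newsletters are misclassified.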
The spam catch rate is calculated in a similar manner, with a lower weight assigned to “potentially unwanted” emails due to their ambiguous nature.
Definitions:
FORMULA:
<spam-catch-rate> = 1 -
(<false-negatives-total> - <potentially-unwanted-false-negatives> + 0.2 * <potentially-unwanted-false-negatives>) /
(<spam-total> - <potentially-unwanted-total> + 0.2 * <potentially-unwanted-total>)
The final score for the product is calculated in a manner that gives a disproportionate weight to false positives, to recognize the damage such misclassifications can do.
FORMULA:
<final-score> = <spam-catch-rate> - (5 * <weighted-false-positive-rate>)
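To show how strongly false positives weigh on the result, here is a small sketch of the final score. Expressing both rates as percentages is an assumption made for readability; the function name is hypothetical.

```python
def final_score(spam_catch_rate_pct, weighted_fp_rate_pct, fp_penalty=5):
    """Final score: catch rate minus five times the weighted false positive rate."""
    return spam_catch_rate_pct - fp_penalty * weighted_fp_rate_pct


# Two hypothetical products with the same catch rate.
score_a = final_score(99.90, 0.02)  # few false positives
score_b = final_score(99.90, 0.20)  # ten times as many false positives
```

In this example a rise of 0.18 percentage points in the weighted false positive rate lowers the final score by 0.9 points, five times as much as an equal drop in the catch rate would.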
A product must meet or exceed the criteria below to earn the VBSpam award:
Outstanding performers can earn the VBSpam+ award, for which the criteria are defined as:
The product lifecycle in VBSpam begins with an initial product setup, typically done in cooperation with the vendor.
This is followed by continuous testing for the designated testing period. For publicly tested products that join the test on a commercial basis, the designated testing period is approximately one year, during which four periods are designated as official testing periods. Data obtained during these periods serve as the basis for public, comparative test results and certification.
Full email security solutions in the test can be:
VM hosting happens on a Type I (bare metal) hypervisor. VM resources such as vCPUs, memory and storage are allocated as per the requirements of the product.
Unrestricted outbound internet access is provided.
Remote access to the products hosted by Virus Bulletin is provided throughout their participation. Unless otherwise agreed, vendors are expected to set up and configure their products in the test environment, with assistance from Virus Bulletin.
Complementary solutions are set up and managed by Virus Bulletin, in a virtualized environment. VM resources are allocated as required to facilitate fair testing.
Unrestricted outbound internet access is provided. However, unlike full email security solutions, no unsupervised remote access to the test environment is available for the vendors. This is because complementary solutions typically require the help of a host MTA to work, which may have built-in anti-spam capabilities. Virus Bulletin deems it crucial to retain full supervision of the MTA configuration to prevent any potential noise and comparability issues arising from the vendor accidentally or purposefully changing the configuration of the underlying MTA.
Test cases (emails) are sent continuously to the tested products.
Each email is delivered in an individual SMTP transaction. Virus Bulletin seeks to keep modifications to the original, in-the-wild version of the emails to a minimum; however, a number of differences between the original and in-test emails are inevitable. Some of these changes are equivalent to those made by a front-end SMTP server that rewrites the recipient information:
The more invasive changes are:
Some metadata about the original email will be retained through a Received: header, again as if the email were received first by a front-end SMTP server. A single new Received: header will be inserted into the email, in the following fashion:
Received: from <original-reverse-dns> (HELO <original-helo-domain> [<originating-ip>]) by <vb-mta>
(<vb-mta-software>) with [E]SMTP id <vb-message-id>; <date>
where
Additionally, if the tested product supports the XCLIENT command, the original client IP address as well as the HELO/EHLO domain will be passed on to the tested product.
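The Received: header template above, and the XCLIENT hand-over, might be assembled along the following lines. This is a sketch under stated assumptions, not Virus Bulletin's implementation: the function names and all example values are hypothetical, the date is formatted in the usual RFC 5322 style, and the XCLIENT line uses the Postfix-style ADDR and HELO attributes with an IPv4 address. Which attributes a given product accepts will vary.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def build_received_header(reverse_dns, helo_domain, originating_ip,
                          vb_mta, vb_mta_software, message_id, when, esmtp=True):
    """Fill in the Received: template; the second line is a folded continuation."""
    protocol = "ESMTP" if esmtp else "SMTP"
    return (
        f"Received: from {reverse_dns} (HELO {helo_domain} [{originating_ip}]) "
        f"by {vb_mta}\r\n"
        f" ({vb_mta_software}) with {protocol} id {message_id}; "
        f"{format_datetime(when)}"
    )


def build_xclient_command(originating_ip, helo_domain):
    """Postfix-style XCLIENT line carrying the original client IP and HELO domain."""
    return f"XCLIENT ADDR={originating_ip} HELO={helo_domain}"


# Example with documentation-only addresses and hypothetical host names.
header = build_received_header(
    reverse_dns="mail.example.net",
    helo_domain="mail.example.net",
    originating_ip="192.0.2.1",
    vb_mta="mta.example.org",
    vb_mta_software="ExampleMTA",
    message_id="ABC123",
    when=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
xclient = build_xclient_command("192.0.2.1", "mail.example.net")
```

The XCLIENT line would be sent before the mail transaction, and only to products that advertise support for the command.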
A tested product receives test case emails from the VBSpam infrastructure and the product’s response is recorded. These responses are sorted into two categories:
VBSpam supports a wide variety of ways to detect the product response. In addition to those outlined below, Virus Bulletin accepts custom methods at its discretion.
Full solutions will typically be set up in a manner to receive test case emails from Virus Bulletin through SMTP and to return the filtered emails to a specific Virus Bulletin server for examination through SMTP. By default, the following rules are applied to this scenario:
Complementary solutions are typically tested in a very similar setup to the full solutions, with the hosting MTA or spam-filtering system configured to carry out actions on the email in a manner that can be detected by Virus Bulletin.
To closely simulate real-time conditions, emails from real-world sources are promptly introduced into the testing environment without unnecessary delay. Despite employing the best threat intelligence and meticulously crafted automation, some of the emails received by the tested products may not be relevant for the test. Therefore, Virus Bulletin regularly reviews and validates test case emails, discarding any that are deemed unsuitable.
Note that in this validation process, many perfectly suitable emails may also be discarded due to the limited capacity for manual review.
As a general rule, feedback is provided to the participating products on a weekly basis. No feedback is given during the official test periods, during holidays, or when maintenance is taking place. The feedback provided is non-comparative by nature, i.e. the feedback by itself is not suitable to determine how a product ranks against other products in the test.
This feedback is for the vendor’s own information only, and neither the vendor nor Virus Bulletin is permitted to make the details public.
Feedback includes:
Note that Virus Bulletin may cap the number of emails shared at 400 per email category.
Disputes may be submitted at any time. For the official test period, however, Virus Bulletin requires that public test participants submit their disputes promptly upon receipt of the feedback, to ensure timely publication of the public report. A minimum of 10 calendar days will be permitted for the submission of disputes.
Disputes are evaluated on a case-by-case basis. The vendor is asked to provide supporting data or evidence, if any, along with their dispute. Although all efforts will be made to resolve disputed issues to the satisfaction of all parties, Virus Bulletin reserves the right to make the final decision.
To reflect the broad nature of real-life issues, the scope of the disputes is not limited.
Public test reports can only be representative for the reader if the testing is conducted using publicly available product versions. Vendors are therefore required not to use any enhancement of the product that is not generally available. We also encourage (but do not require) vendors to share with the public the configuration used for their product in the test.
Tests are usually conducted with the latest generally available version of a product or service being tested. Deviations from this policy will be documented in the report.
Private tests are not subject to these constraints.
Email security products often classify emails into various categories.
However, the VBSpam test relies on binary classification: there is either a hit on a test case (“email not wanted”), or there is no hit (“legitimate”). It is the product vendor’s responsibility to map their own classifications onto one of these two categories. In the absence of such a mapping from the vendor, the Virus Bulletin test team endeavours to set up the mapping itself.
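A mapping of this kind could be expressed as a simple lookup table. The sketch below is purely illustrative: the verdict labels are hypothetical, and where an ambiguous label such as "bulk" belongs is the vendor's decision, not something this methodology prescribes.

```python
# Hypothetical vendor verdict labels; a real product defines its own set.
# True = hit ("email not wanted"), False = no hit ("legitimate").
VERDICT_IS_HIT = {
    "clean": False,
    "newsletter": False,
    "bulk": True,
    "spam": True,
    "phishing": True,
    "malware": True,
}


def is_hit(vendor_verdict):
    """Collapse a vendor-specific verdict into the binary classification."""
    key = vendor_verdict.strip().lower()
    if key not in VERDICT_IS_HIT:
        # Fail loudly so that unmapped verdicts are resolved before testing.
        raise ValueError(f"unmapped verdict: {vendor_verdict!r}")
    return VERDICT_IS_HIT[key]
```

Rejecting unknown labels, rather than defaulting them to one side, keeps a newly introduced verdict from silently skewing either the catch rate or the false positive rate.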
Products participating in the public test cannot be withdrawn from the test once the official test period has started. Public interest dictates that a test report is to be published, regardless of whether or not it is favourable from the vendor’s perspective.
However, Virus Bulletin may, at its discretion, allow withdrawal of a product in extraordinary circumstances, when compelling reasons suggest that its inclusion in the report would bear no relevance to the public. Examples of such situations include collected data that is proven to be tainted by lab-specific technical issues, and significant testing errors such as deviations from protocol. Note that technical issues that affect not just the particular test environment but a wider user base at the time of the test (e.g. a cloud outage or faulty rule updates) do not qualify as a basis for withdrawal, and Virus Bulletin may proceed to publish the report.
Virus Bulletin pledges to work with the vendor to resolve technical issues with the product and notify the vendor as soon as possible when such issues are detected.
Prior to the publication of a report, the vendor of the product may choose to provide commentary to be included in the report notes. This is to ensure that the vendor’s perspective receives a fair representation. Such commentaries can be useful when the report contents are disputed by the vendor. Commentaries are subject to reasonable length limits and editorial approval.
Vendors of full email security solutions are provided with remote access to their products and may audit their configuration, state, or logs at any time.
Vendors of complementary solutions can request an audit of the product configuration, but they are not granted remote access at all times. Audits are primarily carried out through manual verification of the desired configuration (as provided by the vendor) or through sharing the logs and data generated. Remote access can also be arranged at a time suitable for both parties. Remote access may be supervised by Virus Bulletin personnel, or the virtual machine state may be captured and restored after the remote access session is completed, to prevent any tampering.
The public test series features participants that are included on a commissioned basis. Equally, Virus Bulletin may choose to include a product at its own discretion. For this latter type of product, Virus Bulletin commits to:
Version 3.0