This methodology version has been superseded by a newer version.
Overview
Test Purpose
VB100 is a certification test for endpoint security solutions, designed to evaluate the static detection capabilities of the tested products. Products that earn the VB100 award can be considered to meet a minimum standard of quality.
Please be advised that static detection is only one layer (and an optional one) of today’s multi-layered endpoint security approaches. For this reason, we do not recommend using the VB100 test results alone to evaluate the real-world protection potential of a tested product. We encourage the reader also to consult other reputable independent tests and certifications for a more complete view of a product’s capabilities.
Testing Procedure
General Test Outline
The product is installed on a clean, dedicated Windows instance (the same OS image is used for all products). Initial configuration is performed as per the product defaults (exceptions apply; see the Setup stage under Test Stages).
The product is then exposed to various malicious and clean samples, either in On-Access test mode (preferred) or in On-Demand test mode.
Note that neither malicious nor clean samples are executed in the VB100 test.
On-Access Test Mode
This is our default test mode. The product is exposed to samples through file operations to trigger on-access scanning. This is a two-step process:
- On-write scanning: Sample files are copied from a Linux server machine to the test client machine through a Windows network share. The copy process is orchestrated by the server. Failed copy operations are considered to be detections.
- On-read scanning: As some products do not scan on write, or may limit the scan scope, all successfully copied sample files are opened and read using a bespoke tool (which uses the standard OS APIs), and the success of this operation is recorded. A failed open or read of a file is considered a detection (see the sketch below).
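To illustrate the on-read step, the following is a minimal sketch of how copied samples can be opened and read back, with blocked operations treated as detections. The folder path, error handling and output format are assumptions made for illustration; this is not a reproduction of VB’s bespoke tool.

```python
# Minimal sketch of the on-read step: attempt to open and read every copied
# sample and treat any failure as an on-access detection. The folder path and
# error handling are illustrative only.
import os

SAMPLE_DIR = r"C:\vb100\samples"   # hypothetical location of the copied samples

def read_samples(sample_dir: str) -> dict:
    results = {}
    for name in os.listdir(sample_dir):
        path = os.path.join(sample_dir, name)
        if not os.path.isfile(path):
            continue
        try:
            with open(path, "rb") as fh:
                fh.read()               # a successful full read means "not detected"
            results[name] = "read_ok"
        except OSError:
            results[name] = "detected"  # a blocked open/read counts as a detection
    return results

if __name__ == "__main__":
    for sample, outcome in read_samples(SAMPLE_DIR).items():
        print(f"{sample}: {outcome}")
```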
On-Demand Test Mode
In case the product does not fully support on-access scanning, we may use on-demand scanning as a fallback, i.e. interact with the product to have it scan a file system folder that contains all samples. Detections are identified from the product logs.
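As a rough illustration only, the sketch below shows how an on-demand run might be driven and its results gathered. The scanner path, command-line flags and log format are placeholders, not any real product’s interface; in practice each product is operated through its own documented means.

```python
# Hypothetical sketch of an on-demand run: invoke a product's command-line
# scanner against the sample folder and count detections from its log output.
# Executable name, flags and log conventions below are placeholders.
import subprocess

SCANNER = r"C:\Program Files\ExampleAV\scan.exe"   # placeholder path
SAMPLE_DIR = r"C:\vb100\samples"
LOG_FILE = r"C:\vb100\ondemand.log"

subprocess.run([SCANNER, "--scan", SAMPLE_DIR, "--log", LOG_FILE], check=False)

detections = set()
with open(LOG_FILE, encoding="utf-8", errors="replace") as fh:
    for line in fh:
        if "detected" in line.lower():             # placeholder log convention
            detections.add(line.strip())

print(f"{len(detections)} detection log entries")
```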
Test Environment
All testing is performed on physical computers. These have average endpoint PC specifications, similar to those one would find in a business setting. Currently two test platforms are used for certification:
Windows 7 test platform
- Operating system: Microsoft® Windows® 7 Pro N 64-bit
- CPU: Core i3-4160 3.6 GHz
- RAM: 4GB
- Storage: 1TB + 500GB hard disk drive
Windows 10 test platform
- Operating system: Microsoft® Windows® 10 Pro 64-bit
- CPU: Core i5-4440 3.10 GHz
- RAM: 8GB
- Storage: 1TB + 500GB hard disk drive
Sample Selection
Sample Types
The test uses valid, 32-bit or 64-bit Portable Executable (PE) binaries only. Executables (e.g. .exe, .scr) and dynamic link libraries (e.g. .dll, .cpl) may both be used. Statically linked library (DLL) dependencies may not be available. Archive containers (e.g. .zip, .rar) are unpacked and only the resulting binaries are used. We also endeavour to exclude SFX (self-extracting archive) binaries, and make some effort to detect and exclude those created by common SFX tools such as WinZip.
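For illustration, the following simplified sketch checks whether a candidate file is a valid 32-bit or 64-bit PE binary by inspecting the DOS and PE headers directly. It omits the additional filtering described above (archive unpacking, SFX exclusion) and is not the exact tooling used in the test.

```python
# Simplified check for a valid 32-bit or 64-bit PE binary: verify the MZ and
# PE signatures and inspect the COFF machine field.
import struct

MACHINE_I386 = 0x014C   # 32-bit x86
MACHINE_AMD64 = 0x8664  # 64-bit x64

def is_valid_pe(path: str) -> bool:
    try:
        with open(path, "rb") as fh:
            if fh.read(2) != b"MZ":
                return False
            fh.seek(0x3C)
            (e_lfanew,) = struct.unpack("<I", fh.read(4))  # offset of the PE header
            fh.seek(e_lfanew)
            if fh.read(4) != b"PE\x00\x00":
                return False
            (machine,) = struct.unpack("<H", fh.read(2))
            return machine in (MACHINE_I386, MACHINE_AMD64)
    except (OSError, struct.error):
        return False
```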
Malicious Sample Sets
We use the following malicious sets in various parts of the test:
- WildList set: This set contains ‘in-the-wild’ (ItW) samples from the WildList Organization. The WildList is an extremely well-vetted set of malware recently observed in the wild by researchers. Set size: a few thousand samples total.
- AMTSO RTTL set: The Real-Time Threat List (RTTL) is a repository of malware samples collected by experts from around the world. The repository is managed by the Anti-Malware Testing Standards Organization (AMTSO). This test uses a continuous feed of new samples. Set size: 1,200-1,300 samples on average (per day of use).
- The ‘RAP/Response’ set: This set contains various samples of fresh malware collected by VB and received from third parties. Set size: 1,000-2,000 samples on average (per day of use).
Clean Sample Sets
The clean set is a set of clean files collected by VB, consisting of over 400,000 binaries, all harvested from popular software downloads available on the Internet. The set is regularly maintained with new additions, the purging of old files and removal of potentially suspicious binaries.
Test Stages
The VB100 tests consist of the following stages:
- Setup
- Test parts
- Certification test
- Reactive/Proactive test
- Sample post-validation
- Feedback and disputes
- Publication
Setup
Image preparation
- For new products, the latest Windows base image is restored on the test machine and the product is installed. The product is configured as per its default installation settings, with the following exceptions:
- Logging may be configured to allow the gathering of adequate information for test result analysis.
- Detection response may be configured in a way that facilitates automated testing (e.g. automatic blocking instead of prompting on malware detection; disinfection or quarantining disabled, etc.).
- For products tested in on-demand mode, on-access scanning is disabled (where possible).
An image of the system is captured in this state, which is then used in the test (and in subsequent tests). This process is repeated on both Windows test platforms.
- Test images (including the Windows base image) are reviewed for Windows updates. Approximately every six months, Windows is fully updated and images are updated accordingly.
Image restoration
- The latest test image is restored for the product.
- VB test engineers make a reasonable effort to confirm that the product completes its own updates successfully before proceeding with the test.
- Basic operational health is checked by scanning an instance of the EICAR Standard Anti-Virus Test File.
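As an illustration of this health check, the sketch below drops an EICAR test file on disk and verifies that the product reacts to it. The file path is hypothetical; with on-access scanning enabled, the write or the subsequent read-back is expected to be blocked.

```python
# Minimal sketch of the EICAR health check. The EICAR file is a harmless,
# industry-standard test file that anti-malware products are expected to flag.
# The folder is assumed to exist; the path is illustrative only.
EICAR = r"X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"
TEST_PATH = r"C:\vb100\eicar.com"

try:
    with open(TEST_PATH, "w", newline="") as fh:
        fh.write(EICAR)
    with open(TEST_PATH, "rb") as fh:
        fh.read()
    print("EICAR file written and read back: product did NOT react")
except OSError as exc:
    print(f"Product reacted to the EICAR file as expected: {exc}")
```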
Other
- The latest version of the WildList set is retrieved. This version is used throughout the test (all test parts, see next section).
Test Parts
There are four parts to each VB100 test.
- Parts #1, #2 and #3: Each of the first three parts (or rounds) of the test consists of the Certification Test plus the Reactive part of the Reactive/Proactive (RAP) Test.
- Part #4: The Proactive part of the Reactive/Proactive (RAP) Test.
Certification Test
This test forms the basis of the certification.
As described in General Test Outline, products are subjected to malicious and clean samples.
- In each of the test parts 1-3, the whole of the WildList set is used for testing.
- In each of the test parts 1-3, approximately one third of the clean set is used (different clean files in each part).
Full Internet access is allowed for the tested products.
Note that, as the WildList is tested in full for all three test parts, products may miss the same WildList sample up to three times. This will count towards the final result each time.
Reactive/Proactive (RAP) Test
This is an additional test that aims to give an indication of how well the product’s static detection keeps up with new threats, both before and after losing Internet access. Note that the results of this test do not count towards certification.
The RAP test is conducted in a similar manner to the certification test. However, this test only uses the ‘RAP/Response’ set and the RTTL set, and Internet access is disabled during the Proactive part of the test.
- During the Reactive part of the test, the product is allowed full Internet access and is subjected to a snapshot of the ‘RAP/Response’ set and the RTTL set, as captured in the 10 days leading up to the test date.
- During the Proactive part of the test, Internet access is disabled, effectively ‘freezing’ the product and preventing access to updates and the cloud. Samples used are from the ‘RAP/Response’ set and the RTTL set, as captured in the 10 days elapsed since the product was ‘frozen’.
No clean files are used in this test.
Sample Post-Validation
All purportedly malicious samples used in the test are validated for maliciousness using a ‘multi-scanner’ approach. This involves scanning each sample using multiple anti-virus engines and using the classification consensus to determine whether the sample is malicious or clean. Samples that cannot be confirmed as belonging to their respective sample set are examined manually. If necessary, the samples are discarded and the test results are adjusted as if these samples had never been part of the test.
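The following is a simplified sketch of such a consensus check, assuming per-engine verdicts are already available. The verdict labels and the 70% threshold are illustrative assumptions, not VB’s exact criteria.

```python
# Simplified 'multi-scanner' consensus check: compare the majority verdict of
# several engines with the sample set the file came from. Threshold and labels
# are assumptions for illustration.
from collections import Counter

def consensus_verdict(engine_verdicts: list[str], malicious_threshold: float = 0.7) -> str:
    counts = Counter(engine_verdicts)        # e.g. ["malicious", "clean", ...]
    malicious_ratio = counts["malicious"] / max(len(engine_verdicts), 1)
    if malicious_ratio >= malicious_threshold:
        return "malicious"
    if counts["clean"] == len(engine_verdicts):
        return "clean"
    return "needs_manual_review"

# A sample from a malicious set whose consensus is not "malicious" would be
# examined manually and, if necessary, discarded from the results.
print(consensus_verdict(["malicious", "malicious", "clean", "malicious"]))
```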
Feedback and Disputes
Every participant receives preliminary feedback on the performance of the product(s) tested on their behalf. This marks the beginning of a 5-day dispute period, during which participants can examine the results and dispute them, if necessary.
Feedback includes at least the following, per product, for the Certification Test:
- List of samples classified as false positives, by hash, per operating system
- List of samples classified as false negatives (i.e. misses), by hash, per operating system. In the case of an excessive number of false negatives, the number of samples shared may be capped.
- A dispute period of at least 5 business days (as per UK business hours) starting from the issuing of initial feedback. (In some cases the dispute time may be extended at VB's discretion.)
Upon request, VB may also provide:
- Log and other data files generated by the tested products.
- Version numbers before and after the tests.
- False positive or false negative binaries for the Certification Test.
Award Criteria
The VB100 Award is issued to products that pass all three parts of the Certification Test:
- without missing any of the WildList set (no false negatives)
- without mistaking any clean file as malicious (no false positives)
- on both platforms.
Please be advised that the granting of the award, or the failure to attain one, should be interpreted in light of the purpose of the test, as described at the beginning of this methodology.
RAP Scoring
RAP figures are calculated as follows.
Basic figures:
- Reactive A: Percentage of samples caught out of all 6-10-day-old malware samples.
- Reactive B: Percentage of samples caught out of all 1-5-day-old malware samples.
- Proactive A: Percentage of samples caught out of all samples collected 1-5 days after the product updates were frozen.
- Proactive B: Percentage of samples caught out of all samples collected 6-10 days after the product updates were frozen.
Derived figures:
- Reactive Average: (Reactive A + Reactive B) / 2. This figure signifies how effective a product is at catching the most recent threats when connected to the Internet. The closer to 100% the better.
- Proactive Average: (Proactive A + Proactive B) / 2. This figure signifies how effective a product is at catching new threats when offline. The closer to 100% the better.
- RAP Average: (Reactive Average * 2 + Proactive Average) / 3. This weighted average, which assigns twice the weight to Reactive performance, is the final score we calculate. The closer to 100% the better.
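As a worked example using hypothetical detection rates (the figures below are invented purely for illustration), the RAP averages would be computed as follows:

```python
# Worked example of the RAP score arithmetic with made-up detection rates.
reactive_a = 0.96   # 6-10-day-old samples
reactive_b = 0.94   # 1-5-day-old samples
proactive_a = 0.90  # collected 1-5 days after the freeze
proactive_b = 0.85  # collected 6-10 days after the freeze

reactive_avg = (reactive_a + reactive_b) / 2          # 0.95
proactive_avg = (proactive_a + proactive_b) / 2       # 0.875
rap_avg = (reactive_avg * 2 + proactive_avg) / 3      # 0.925

print(f"Reactive average:  {reactive_avg:.1%}")
print(f"Proactive average: {proactive_avg:.1%}")
print(f"RAP average:       {rap_avg:.1%}")
```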
RAP charts published with the test visualize the four basic Reactive and Proactive components as bars in the background, show the RAP Average as the highlighted number in the foreground, and indicate whether there were any false positives in the Certification Test.
Issue Management and Disputes
General Policy
Acknowledging the challenges that arise from testing, VB employs a general policy for managing issues that occur during the test:
- We strive to be fair and to provide equal conditions to all participants when resolving issues.
- We cooperate with participants and make reasonable efforts to fix technical issues with the tested products if they affect testing.
Specific Cases
Based on past experience, we have formed the following policies on dealing with some issues:
- In case a product runs into significant technical issues that affect test performance, we may either re-run the affected test part (up to three test parts or re-runs) or, as a last resort, allow the product to be withdrawn from the particular test.
Otherwise, products that commit to the test may not be withdrawn.
- Products that do not support on-access detection of some or all samples may opt to be tested on demand.
Dispute Policy
Disputes are evaluated on a case-by-case basis. Participants are asked to provide supporting data or evidence, if any, along with their dispute. Although all efforts will be made to resolve disputed issues to the satisfaction of all parties, VB reserves the right to make the final decision regarding the dispute.
To reflect the broad nature of real-life issues, the scope of disputes is not limited. The following are a few examples from past disputes:
- False negatives (e.g. on the basis of a sample being corrupted, greyware, etc.)
- False positives (e.g. corrupted files, greyware, PUPs, etc.)
- General performance issues (e.g. the product did not function as expected during the test)
Samples that are successfully disputed are removed from the test for all tested products.
Version History
Version 1.0: First published in this format.