Peter Košinár ESET
Juraj Malcho ESET
Richard Marko ESET
David Harley ESET
As the number of security suites increases, so does the need for accurate tests to assess detection capability and footprint, yet accuracy and appropriate methodology become harder to achieve. Good tests help consumers make better-informed choices and help vendors improve their software. But who really benefits when vendors tune products to look good in tests instead of maximizing their efficiency on the desktop?
Conducting detection testing may seem as simple as grabbing a set of (presumed) malware and scanning it, but achieving that simplicity can be complicated. Aspiring detection testers typically have limited testing experience, technical skills and resources. Constantly recurring errors and mistaken assumptions weaken the validity of test results, especially when inappropriate conclusions are drawn: for instance, when likely error margins on the order of whole percentage points are ignored, translating into exaggerated or even reversed rankings.
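As a rough illustration of why whole-percentage-point margins matter, the sketch below (not part of the original abstract; it assumes a simple binomial model in which each sample is an independent trial, which real sample sets are not) estimates the statistical uncertainty around a measured detection rate:

```python
import math

def detection_ci(detected, total, z=1.96):
    """Approximate 95% Wilson confidence interval for a measured detection
    rate, treating each sample as an independent Bernoulli trial
    (a simplifying assumption for illustration only)."""
    p = detected / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half

# Hypothetical figures: 940 of 1,000 samples detected.
lo, hi = detection_ci(940, 1000)
print(f"measured 94.0%, 95% CI roughly {lo:.1%} to {hi:.1%}")
# The uncertainty is well over a percentage point in each direction, which is
# enough to swap the ranking of two products whose scores differ by "one percent".
```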
We examine, in much more detail than previous analyses, typical problems such as inadequate and unrepresentative sample set sizes, limited sample diversity, and the inclusion of garbage and non-malicious files (a source of false positives), set in the context of the 2010 malware scene; a worked example of the last problem follows.
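A hypothetical worked example (the figures are invented for illustration, not taken from the paper) of how garbage and non-malicious files distort a reported detection score:

```python
# Hypothetical illustration: how non-functional "garbage" files mixed into a
# test set distort the reported detection rate.
real_malware = 9_000        # working, genuinely malicious samples
garbage = 1_000             # corrupted or non-malicious files in the set
total = real_malware + garbage

detected_real = 8_900       # product detects ~98.9% of the real threats

# Naive scoring counts undetected garbage as "missed malware"...
honest_rate = detected_real / total
# ...while a product that blindly flags the junk as well appears "better".
flag_everything_rate = (detected_real + garbage) / total

print(f"honest product:      {honest_rate:.1%}")          # ~89.0%
print(f"flags junk as well:  {flag_everything_rate:.1%}")  # ~99.0%
```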
Performance and resource consumption metrics (e.g. memory usage, CPU overhead) can also be dramatically skewed by incorrect methodology, such as the separation of kernel and user data, or a poor choice of 'common' file accesses.
We show how numerous methodological errors and inaccuracies can be amplified by misinterpretation of the results. We analyse historical data from different testing sources to determine their statistical relevance and significance, and demonstrate how easily results can drastically favour one tested product over the others.
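One way to see how easily rankings can flip is a small Monte Carlo sketch; the "true" detection rates and sample set size below are hypothetical, not drawn from the paper's historical data:

```python
import random

def reversal_probability(true_rate_a=0.95, true_rate_b=0.94,
                         set_size=500, runs=10_000):
    """Estimate how often a product with a *lower* true detection rate
    outscores a better one when both are measured on a small random set."""
    reversals = 0
    for _ in range(runs):
        score_a = sum(random.random() < true_rate_a for _ in range(set_size))
        score_b = sum(random.random() < true_rate_b for _ in range(set_size))
        if score_b > score_a:
            reversals += 1
    return reversals / runs

# With these assumed rates and a 500-sample set, the ranking comes out
# reversed in roughly a quarter of the simulated tests.
print(f"ranking reversed in ~{reversal_probability():.0%} of simulated tests")
```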