Joe Telafici, Dmitry Gryaznov, McAfee
AV vendors attempting to keep up with the current flood of malware face some daunting problems. The volume of samples arriving through customer submissions, honeypot and crawler collections, collection swaps and other sources has grown exponentially over the last several years. By our calculations, more than half of the malware ever created has come into being since 2005. Managing all this data puts a strain on storage, bandwidth, processes and personnel.
We have analysed the source, timing and frequency of incoming submissions by hashing and correlating every sample we have ever received. This data shows some interesting trends in where samples come from, both initially and over time, and in how they travel around the research community. The change from parasitic to static malware, the increasing use of packers and other obfuscation methods, and the use of open source malware development models have shifted our emphasis from classification and deep analysis to managing huge numbers of largely uninteresting variants of existing threats. As an industry, we are effectively DoS-ing ourselves by sending the same samples to each other over and over again, where they clog up networks and file servers and add overhead to the already difficult task of keeping our customers protected.
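As a rough illustration of what hashing and correlating submissions can look like, the following minimal Python sketch keys each sample by a content hash and records where and when it was first seen and how often it was resubmitted. The choice of SHA-256, the source labels, the `SampleRecord` structure and the `duplicate_ratio` metric are all assumptions made for this example; the abstract does not state which hash or data model was actually used.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SampleRecord:
    """Submission history for one unique sample (hypothetical structure)."""
    first_source: str
    first_seen: int
    submissions: int = 0
    sources: set = field(default_factory=set)


def sample_hash(content: bytes) -> str:
    # SHA-256 is an assumption; any collision-resistant hash would serve.
    return hashlib.sha256(content).hexdigest()


def correlate(submissions):
    """Group submissions by content hash.

    submissions: iterable of (timestamp, source, content) tuples.
    Returns a dict mapping hash -> SampleRecord.
    """
    records = {}
    # Process in time order so the first record reflects the true first sighting.
    for timestamp, source, content in sorted(submissions, key=lambda s: s[0]):
        digest = sample_hash(content)
        record = records.get(digest)
        if record is None:
            record = SampleRecord(first_source=source, first_seen=timestamp)
            records[digest] = record
        record.submissions += 1
        record.sources.add(source)
    return records


def duplicate_ratio(records) -> float:
    """Fraction of all submissions that repeated an already known sample."""
    total = sum(r.submissions for r in records.values())
    if total == 0:
        return 0.0
    return (total - len(records)) / total


# Illustrative data only: timestamps and source names are made up.
example = [
    (1, "customer", b"sample-A"),
    (2, "honeypot", b"sample-B"),
    (3, "vendor-swap", b"sample-A"),
    (4, "vendor-swap", b"sample-A"),
]
records = correlate(example)
ratio = duplicate_ratio(records)
```

In this made-up data set, two of the four submissions repeat a sample that was already known, which is the kind of redundancy the analysis is meant to surface.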
Finally, we propose some solutions to this problem that should help to cut down on the amount of work we have to do, both collectively and individually.