Graphs, entropy and grid computing: automatic comparison of malware

Ismael Briones Vilar, PandaLabs

AV laboratories are now saturated with the huge volume of malware samples they receive every day. The industry needs better methods to automatically identify, analyse and classify these samples; laboratories cannot continue working as they did years, or even months, ago.

In this paper we describe an automated classification system that identifies files with similar internal structures. We use graph theory to identify similar functions across malware samples. The system helps to minimize human error and false-positive detections.

Previous research has shown that graph theory is useful for finding similarities between malware variants; however, those systems perform poorly. To address the performance problem we discuss two methods: an algorithm based on entropy and a custom checksum, used to group similar files, and a grid computing system.
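
As a rough illustration of the grouping idea, the sketch below buckets files by a quantized Shannon entropy value combined with a checksum. It is a minimal sketch under stated assumptions, not the algorithm used in the paper: the checksum here is a plain CRC32 over the first 64 bytes of the file, standing in for the custom checksum, and the names shannon_entropy, bucket_key, group_files, the 0.25-bit entropy step and the 64-byte header length are all hypothetical choices made for the example.

```python
import math
import zlib
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def bucket_key(data: bytes, step: float = 0.25, header_len: int = 64) -> Tuple[int, int]:
    """Coarse grouping key: (quantized entropy, checksum of the leading bytes).

    Assumption: CRC32 over the first `header_len` bytes is only a stand-in
    for a custom checksum; it is not the checksum described in the paper.
    """
    entropy_bucket = int(shannon_entropy(data) / step)
    checksum = zlib.crc32(data[:header_len])
    return entropy_bucket, checksum


def group_files(files: Dict[str, bytes]) -> Dict[Tuple[int, int], List[str]]:
    """Group file names whose contents share the same bucket key."""
    groups: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for name, data in files.items():
        groups[bucket_key(data)].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Synthetic stand-ins for real samples: two files share a header and
    # have similar byte distributions; the third differs in both respects.
    header = b"MZ" + bytes(62)
    samples = {
        "a": header + bytes(range(256)) * 4,
        "b": header + bytes(range(256)) * 5,
        "c": b"\x7fELF" + bytes(60) + b"A" * 1000,
    }
    for key, names in group_files(samples).items():
        print(key, names)
```

A cheap key of this kind can only pre-sort samples into candidate groups, so that a more expensive comparison such as graph matching is run within a group rather than across the whole collection. Fixed-width entropy buckets also have an obvious weakness: two files whose entropy falls just either side of a bucket boundary end up in different groups.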