Tim Ebringer, University of Melbourne
Li Sun, RMIT University
Entropy (randomness) calculations can be a fast way to estimate whether a file is packed. However, most algorithms we have seen give only a single entropy value for the entire file. We present an algorithm that is not only fast but also preserves local detail, so that an entire file can be plotted to show its regions of high and low entropy. Regions of low entropy (code and headers, in the case of a PE file) appear as low points on the plot, whereas regions of high entropy (encrypted or compressed data) appear as peaks.
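The abstract does not spell out the authors' algorithm. As a minimal sketch of one common way to obtain a detail-preserving plot, assuming fixed-size windows of Shannon entropy measured in bits per byte, the code below computes one entropy value per window instead of one value per file. The function names `shannon_entropy` and `entropy_profile` and the 256-byte default window are our own illustrative choices, not taken from the paper.

```python
import math
from collections import Counter
from typing import List, Optional


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    n = len(block)
    counts = Counter(block)  # iterating bytes yields ints 0..255
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_profile(data: bytes, block_size: int = 256,
                    step: Optional[int] = None) -> List[float]:
    """Entropy of each window of `block_size` bytes, advancing by `step` bytes.

    Hypothetical helper: with step == block_size (the default) the windows
    do not overlap; a smaller step gives a smoother sliding-window plot.
    A trailing partial window is dropped; data shorter than one window
    yields a single value for the whole input.
    """
    if step is None:
        step = block_size
    last_start = max(len(data) - block_size, 0)
    return [shannon_entropy(data[i:i + block_size])
            for i in range(0, last_start + 1, step)]


if __name__ == "__main__":
    import os
    # Zero padding (low entropy) followed by random bytes (high entropy),
    # a crude stand-in for a header region followed by compressed data.
    sample = bytes(1024) + os.urandom(1024)
    print(["%.2f" % e for e in entropy_profile(sample)])
```

Plotting the returned list against window offset gives the kind of low/high profile described above. The window size trades resolution against noise: very small windows cannot reach high entropy values simply because they contain too few bytes.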
We present local-detail-preserving entropy plots for a variety of packers, showing that many of them appear to pack files in a distinctive way. This seems to be because each packer tends to place the compressed data, and the code that unpacks it, at the same relative locations in the packed file, producing a kind of signature based on an 'entropy signal'. We also give algorithms that have shown early promise in comparing the entropy signals of different packed files.
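The abstract does not describe the comparison algorithms themselves. Purely as a hypothetical illustration of the idea, the sketch below assumes that two entropy signals (lists of per-window entropy values) are first resampled to a common length by linear interpolation, since packed files differ in size, and then compared by mean absolute difference. Both the resampling step and the distance measure are our own swapped-in choices, not the authors' method; `resample` and `signal_distance` are hypothetical names.

```python
import math
from typing import List, Sequence


def resample(signal: Sequence[float], n_points: int) -> List[float]:
    """Linearly interpolate `signal` onto `n_points` evenly spaced positions."""
    if not signal:
        raise ValueError("empty signal")
    if len(signal) == 1 or n_points == 1:
        return [float(signal[0])] * n_points
    last = len(signal) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)  # fractional index into the original
        lo = int(math.floor(pos))
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out


def signal_distance(a: Sequence[float], b: Sequence[float],
                    n_points: int = 64) -> float:
    """Mean absolute difference between two entropy signals after resampling.

    With entropy in bits per byte (0..8) the result also lies in 0..8;
    0 means the resampled signals are identical.
    """
    ra = resample(a, n_points)
    rb = resample(b, n_points)
    return sum(abs(x - y) for x, y in zip(ra, rb)) / n_points


if __name__ == "__main__":
    small = [1.0, 1.0, 7.9, 7.9]            # low region, then packed data
    large = [1.0, 1.0, 1.0, 1.0, 7.9, 7.9, 7.9, 7.9]
    different = [7.9] * 8                    # high entropy throughout
    print(signal_distance(small, large), signal_distance(small, different))
```

Because the signals are compared after normalising their length, two files packed the same way but of different sizes can still score as similar, which is the property a packer "signature" would need.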
Finally, in what we believe to be a unique presentation, we visualize the work of a packed program as it unpacks itself by placing breakpoints on its decompression/decryption loops, dumping memory, and performing our detail-preserving entropy analysis on the dump. A YouTube video (link below) shows a UPX-packed file unpacking itself. The video opens with the entropy of the UPX-packed file as it is initially loaded into memory. As the video progresses, the uncompressed code is written into the empty section created by the loader. Note the interesting behaviour whereby UPX actually overwrites its compressed data with code during the unpacking process. In the video, the y-axis shows entropy and the x-axis shows memory address across the main UPX sections, from low to high.
Unpacking video: https://www.youtube.com/watch?v=pcZpSyZuA-Q