2014-01-07
Abstract
In the latest of his ‘Greetz from Academe’ series, highlighting some of the work going on in academic circles, John Aycock focuses on computer science surveys, looking in particular at one on binary code obfuscations in packer tools.
Copyright © 2014 Virus Bulletin
January can be a long, cold month in which any distraction from winter is welcome. Unfortunately, not all Canadian cities come equipped with a crack-smoking mayor whose buffoonish behaviour makes global headlines [1], so I’m forced to turn elsewhere for entertainment. Thus, to while away the wintry hours, I started reflecting on the fact that novelty is the crack of academic researchers.
That may seem like a rather flippant comment, but there is a lot of truth in it. Academic research papers have to make clear the researchers’ contributions to furthering knowledge, and indicate how their research is novel and never before seen. There is a sweet spot, and ironically too much novelty can be a bad thing (unless the research cures cancer or proves that P=NP). Evolutionary ideas often play better than revolutionary ones, especially given endemic problems in the peer review process that precedes publication – but that’s an entirely separate discussion. The point is that new is considered to be good, whether it’s a little new or a lot of new; not new is definitely bad.
In my opinion, this attitude is a shame, because there is a need in the research ecosystem for researchers to come along and clean up after their novelty-addled colleagues. In some fields, this takes the form of replication of results – something which is extremely rare in computer science. Instead, the ‘cleaning up’ in computer science can take the form of surveys.
A good survey of an area of research is an invaluable resource. It places research work in context, it classifies all the work, and it provides a ‘one-stop shop’ for anyone wanting to learn about the area. Even though writing a survey is not new research per se, I can attest that it is insanely difficult to do, involving tracking down work, making sense of it, and figuring out how to organize it. Sometimes the survey itself even leads to new discoveries – classifying things and building taxonomies is a great way to discover what’s missing.
In the anti-malware world, we have some good examples of useful surveys: the late Peter Ször’s The Art of Computer Virus Research and Defense [2] and Vesselin Bontchev’s Ph.D. dissertation [3] come to mind. More generally, the journal Software: Practice and Experience will publish the occasional paper ‘where apparently well known techniques do not appear in the readily available literature’ [4]. As an example, there was a good (although now outdated) survey on buffer overflows [5] that appeared in the journal.
Some workshops, such as USENIX WOOT [6], allow what they call ‘systematization of knowledge’ papers, i.e. surveys – although they are treated as somewhat second-class, non-refereed papers at the same time as being declared ‘highly valuable to our community’. (Unsurprisingly, with an academic disincentive like that, examples are not exactly plentiful.)
All of this is a long-winded way of arriving at another, and perhaps the most important, venue for computer science surveys. ACM Computing Surveys is a publication that excels at publishing surveys of areas of computer science. I would venture so far as to say that if a survey appears in ACM Computing Surveys, it’s probably worth reading.
While the surveys published in Computing Surveys don’t always focus on security, the most recent issue has one that does: Roundy and Miller’s ‘Binary-Code Obfuscations in Prevalent Packer Tools’ [7]. While there may not be any surprises in the paper for experienced malware analysts, it would make excellent background reading for new employees or less technical people in companies wanting to expand their knowledge.
The authors organize the obfuscations in terms of analysis tasks – a good approach, and one that provides additional information for an uninitiated reader beyond the obfuscations themselves. I am even unable to bemoan any ignorance of related work from the anti-malware community: Roundy’s affiliation is given as the University of Wisconsin and Symantec Research Labs, and among the paper’s 90-odd references are pointers to CARO, VB and AVAR.
However, the paper does suffer from a problem that is typical of journal publication in computer science: timeliness (or lack thereof). Journals are seen as archival in many areas of computer science, rather than a means to disseminate cutting-edge work – and for good reason. It can take literally years to publish a journal article. In Roundy and Miller’s case, Computing Surveys first received the paper in March 2012; after revisions, it was accepted in October 2012, a full year before it was published [7]. Obviously, the work reported in the paper would have been done some time before its submission, and indeed a 2008 article by Panda Security is used as the basis of what constitutes a ‘prevalent’ packer tool. The authors note this problem, saying up front on page one that their survey ‘will need to be periodically refreshed as obfuscation techniques continue to evolve’ [7]. Even with this limitation, however, the paper would be a good January distraction for anyone needing to bring themselves up to speed in the area.
[1] Wikipedia. Rob Ford. http://en.wikipedia.org/w/index.php?title=Rob_Ford&oldid=584393736.
[2] Ször, P. The Art of Computer Virus Research and Defense. Addison-Wesley, 2005.
[3] Bontchev, V.V. Methodology of Computer Anti-Virus Research. Ph.D. thesis, University of Hamburg, 1998.
[4] Wiley. Software: Practice and Experience Overview. http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1097-024X/homepage/ProductInformation.html.
[5] Lhee, K.-S.; Chapin, S.J. Buffer overflow and format string overflow vulnerabilities. Software: Practice and Experience 33(5), 2003, pp.423–460. http://dx.doi.org/10.1002/spe.515.
[6] USENIX WOOT 2013 call for papers. https://www.usenix.org/conference/woot13/call-for-papers.
[7] Roundy, K.A.; Miller, B.P. Binary-Code Obfuscations in Prevalent Packer Tools. ACM Computing Surveys 46(1), 2013, Article 4. http://dx.doi.org/10.1145/2522968.2522972.