2013-11-04
Abstract
Python obfuscation is relatively rare. In the latest of his ‘Greetz from academe’ series, highlighting some of the work going on in academic circles, John Aycock takes a look at a research paper in which the authors reverse engineered a 'hardened' Python application from Dropbox.
Copyright © 2013 Virus Bulletin
Some programming languages have an embarrassment of riches when it comes to code obfuscation. For JavaScript, of course, every few months sees a fresh analysis of malicious code, such as Peter Ferrie’s recent breakdown of JS/Proslikefan [1]. For C, code obfuscation is sport, with the International Obfuscated C Code Contest [2]. And Perl is… Perl is another programming language.
Python, however, has largely eluded the obfuscation craze. There are a few examples, including a beautiful Mandelbrot set generator whose code is shaped like a Mandelbrot set [3]; another post by the same author [4] contains links to some other scattered Python obfuscation examples, and there was a 2011 PyCon talk on the subject [5]. In the unlikely event that the bad guys ever decided to forsake JavaScript for Python, these few examples could turn out to be Useful Information.
All this means that when I see anything relating to Python obfuscation, it quickly gets my attention. That was the case with a paper from the 2013 USENIX Workshop on Offensive Technologies, called ‘Looking inside the (Drop) box’ [6], in which the authors detail their techniques for reverse engineering a ‘hardened’ Python application from Dropbox. It’s a paper that wouldn’t be out of place in the pages of VB and, much to my surprise, it turns out that I (very indirectly) helped with the work.
In the Dropbox case, the Python obfuscation was not at source level, but in the ‘frozen’ version that was shipped out. A frozen Python application is one where all the pieces of compiled Python bytecode are bundled together to allow a single file to be distributed. It’s essentially a form of (non-malicious) packing, and a number of legitimate tools/scripts exist for this purpose – one even in the Python source distribution itself.
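For readers who have not looked inside a frozen application, here is a minimal sketch of the underlying idea, using only the standard library: source is compiled to a code object, serialized with marshal, stored in a bundle, and later loaded and executed without the source being present. The bundle dictionary and the module name 'greeter' are invented for illustration; real freezing tools use their own container formats, and none of this is taken from Dropbox's build.

```python
import marshal

# Source for a tiny module; a real freezing tool would read this from a .py file.
source = "def greet(name):\n    return 'hello, ' + name\n"

# Compile to a code object, then serialize it to bytes.
code_obj = compile(source, "<bundled module>", "exec")
blob = marshal.dumps(code_obj)

# Hypothetical bundle: module name -> marshalled bytecode.
bundle = {"greeter": blob}

# At run time, the bytecode is loaded and executed; no source is needed.
namespace = {}
exec(marshal.loads(bundle["greeter"]), namespace)
result = namespace["greet"]("world")
print(result)
```

Note that marshalled bytecode is specific to the Python version that produced it, which is one reason frozen applications ship with their own interpreter.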
Dropbox’s frozen executable, though, had been modified to make reversing it more challenging [6]. The opcode values were altered, the code was encrypted, and the normal means to query bytecode were removed, amongst other things. The researchers ended up injecting a DLL into the Dropbox process to gain control, allowing them eventually to inject their own Python code into the hardened interpreter. A few steps later (all of which are detailed in the paper), they had acquired the Python bytecode.
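To give a feel for what altering opcode values means, the following is a rough sketch, not the mapping Dropbox used. It assumes CPython 3.6 or later, where every instruction occupies two bytes (opcode, then argument); the Python 2 interpreter examined in the paper used a different, variable-length layout. The seed value and the function names are hypothetical. The sketch builds a secret permutation of the 256 possible opcode values, applies it to a function's bytecode, and shows that anyone who recovers the permutation can undo it.

```python
import random


def make_opcode_map(seed):
    """Return (scramble, unscramble) tables; the seed is a hypothetical secret."""
    values = list(range(256))
    random.Random(seed).shuffle(values)
    scramble = bytes(values)
    unscramble = bytearray(256)
    for original, altered in enumerate(values):
        unscramble[altered] = original
    return scramble, bytes(unscramble)


def remap_opcodes(code_bytes, table):
    """Apply table to each opcode byte (even offsets); argument bytes are untouched."""
    out = bytearray(code_bytes)
    for i in range(0, len(out), 2):
        out[i] = table[out[i]]
    return bytes(out)


def sample(x):
    return x * 2 + 1


scramble, unscramble = make_opcode_map(seed=2013)
original = sample.__code__.co_code
altered = remap_opcodes(original, scramble)     # what a stock disassembler would see
restored = remap_opcodes(altered, unscramble)   # what a reverser wants back
```

A stock disassembler fed the altered bytes would decode the wrong instructions, which is the point of the exercise; recovering the table is what makes tools built for standard bytecode usable again.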
Once the bytecode had been extracted, the authors used a tool called uncompyle2 to reconstitute the Python source code. Upon further examination [7], I discovered that the tool is based on a Python compilation framework I created, and a Python decompiler that I cobbled together in around 1999. It’s a small world, and it’s reassuring, as an academic, to know that occasionally something useful comes of my work.
Back to the reverse engineering: after the Python source code had been extracted, the researchers worked around Dropbox’s authentication and gathered up SSL data, using a technique they called ‘monkey patching’.
I must confess that I had never heard that term before, and it brought to mind either a roomful of monkeys with typewriters working on Shakespeare v2.0, or animals prone to flinging their own faeces. In neither case did it cast the reverse engineering in a terribly flattering light. Naturally, I turned to the arbiter of all that is true, Wikipedia, which helpfully informed me [8] – and I am not making this up – that the technique ‘has also been termed duck punching and shaking the bag’. The emphasis is theirs, believe me. So while monkey patching sounds bad, the alternatives are even more ghastly. But I digress.
Apparently, monkey patching is simply poking into a program written in a dynamic language at run time and replacing its functions, methods or attributes on the fly. This allowed the paper’s authors to hook all the SSL objects in the Python code and dump out their data unencrypted. And thus Dropbox fell.
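A toy illustration of the technique follows. It is not the code from the paper: FakeSecureSocket is a made-up stand-in for an SSL object, and its 'encryption' merely reverses the bytes. The sketch replaces the class's write method at run time with a wrapper that records the plaintext before handing it to the original method.

```python
class FakeSecureSocket:
    """Hypothetical stand-in for an SSL socket; not Dropbox's or the ssl module's API."""

    def __init__(self):
        self.wire = []  # what would go out over the network

    def write(self, data):
        # Pretend to encrypt: reversing the bytes is for illustration only.
        self.wire.append(data[::-1])
        return len(data)


captured = []  # plaintext seen by the hook


def install_hook(cls):
    """Monkey patch cls.write so plaintext is logged before it is 'encrypted'."""
    original_write = cls.write

    def logging_write(self, data):
        captured.append(bytes(data))
        return original_write(self, data)

    cls.write = logging_write
    return original_write


sock = FakeSecureSocket()        # created before the patch is applied
install_hook(FakeSecureSocket)   # the patch happens at run time
sock.write(b"secret")
```

Because Python looks methods up on the class at call time, even the object created before the patch picks up the hooked method: captured now holds the plaintext, while sock.wire holds only the 'encrypted' form.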
No monkeys, pythons, or ducks were harmed in the creation of this article.
[1] Ferrie, P. Fans like pro, too. Virus Bulletin, September 2013. http://www.virusbtn.com/vba/2013/09/vb201309-Proslikefan.
[2] The International Obfuscated C Code Contest. http://www.ioccc.org/.
[3] Preshing, J. High-resolution Mandelbrot in obfuscated Python. http://preshing.com/20110926/high-resolution-mandelbrot-in-obfuscated-python/.
[4] Preshing, J. Penrose tiling in obfuscated Python. http://preshing.com/20110822/penrose-tiling-in-obfuscated-python/.
[5] Healey, J. How to write obfuscated Python. PyCon 2011. http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2011-how-to-write-obfuscated-python-4899191.
[6] Kholia, D.; Wegrzyn, P. Looking inside the (Drop) box. 7th USENIX Workshop on Offensive Technologies, 2013.
[8] Monkey patch. http://en.wikipedia.org/w/index.php?title=Monkey_patch&oldid=575983320.