2015-03-05
Abstract
Both Android and Java malware, delivered via ZIP-based packages, have reached high volumes in the wild, and continue to grow at a rapid rate. In his VB2014 paper, Gregory Panakkal explores the ZIP file format, focusing specifically on APK files as handled by the Android OS. He also explores new malformations that can be applied to APK files to break typical AV engine unarchiving, thus bypassing content scanning, while keeping the APK valid for the Android OS.
Copyright © 2015 Virus Bulletin
2013 saw multiple high-profile vulnerabilities for Android, with the ‘Master Key’ Cryptographic Signature Verification Bypass vulnerability topping the charts. Several specially crafted malicious APKs exploiting this vulnerability appeared after proofs of concept (PoCs) were created by its initial discoverers. The vulnerability arose from the difference between the two ZIP archive-handling implementations used by Android – one to validate the APK (written in Java), and the other to extract the contents of the APK (written in C++).
ZIP is the de facto standard packaging format for delivering applications such as Android Package (APK) files, Java Archive (JAR) files and Metro App (APPX) files, as well as documents such as Office Open XML (DOCX, XLSX etc.) files. Both Android and Java malware, delivered via ZIP-based packages, have reached high volumes in the wild, and continue to grow at a rapid rate. It is therefore critical for anti-virus engines to scan the contents of these files correctly, matching the behaviour observed in the target environment.
This paper explores the ZIP file format, focusing specifically on APK files as handled by the Android OS. It covers the existing design and the technical aspects of publicly disclosed Android vulnerabilities. The paper also explores new malformations that can be applied to APK files to break typical AV engine unarchiving, thus bypassing content scanning, while keeping the APK valid for the Android OS. It briefly covers the concept of an amalgamated package (‘Chameleon ZIP’) that could be treated as an APK, JAR or DOCX file depending on the application that processes it, and the challenges this poses to AV engine components that attempt to scan content based on the recognized package type.
Android Package (APK) is a ZIP-based file containing Android-specific metadata, with a directory structure and package verification metadata similar to that of JAR (Java Archive). This design makes it easier for the Android Package Manager to locate, extract and validate the contents within the APK. (The following section may be skipped if the reader is familiar with the ZIP file format.)
The ZIP file format includes a central directory located at the end of the file. The central directory, containing names of files or directories and related information, acts as a reference to easily locate a file’s compressed data.
The three primary headers that are used by the ZIP format are as follows:
End of Central Directory (EOCD) header
An unarchiving application scans the last 64KB of the file in reverse order to locate the EOCD header’s magic value (0x06054b50). Once located, this header provides critical information: the offset at which the central directory file headers start, the total size of the central directory, and the number of entries to expect.
Central Directory File Header (CDFH)
There is one central directory entry per archived file or directory. It contains critical information such as the filename, the offset to the local header, the compressed and uncompressed data sizes, and the compression method used. The CDFH starts with the magic value 0x02014B50.
Local File Header (LFH)
The Local File Header (magic value 0x04034b50) precedes the file’s compressed data and repeats a subset of the CDFH information, such as the compression method, CRC32, data sizes, filename and extra field. If bit 3 of the GeneralPurposeBitFlag is set, a DataDescriptor structure immediately follows the compressed data to supply the CRC32 and the compressed and uncompressed data sizes. This is typically done by applications that do not know these values at the time of writing the LFH, in which case the corresponding LFH fields are set to zero.
The overall layout of the various ZIP headers and data is shown in Figure 5.
Figure 5. ZIP file layout [1].
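The EOCD lookup described above can be sketched in Java as follows. This is a minimal illustration, assuming the whole archive is held in a byte array and is not ZIP64; the class and method names are hypothetical and not taken from any Android or AV code base. The field offsets follow the published ZIP layout (total entry count at +10, central directory size at +12, central directory offset at +16).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: locates and reads the EOCD record of an in-memory, non-ZIP64 archive.
final class EocdLocator {
    static final int EOCD_SIGNATURE = 0x06054b50;
    static final int EOCD_FIXED_SIZE = 22;        // EOCD record without its trailing comment
    static final int MAX_COMMENT_LENGTH = 0xFFFF; // comment length is a 16-bit field

    // Scans backwards from the last possible EOCD position; returns its offset or -1.
    static int findEocd(byte[] zip) {
        ByteBuffer buf = le(zip);
        int start = zip.length - EOCD_FIXED_SIZE;
        int stop = Math.max(0, start - MAX_COMMENT_LENGTH);
        for (int i = start; i >= stop; i--) {
            if (buf.getInt(i) == EOCD_SIGNATURE) {
                return i;
            }
        }
        return -1;
    }

    // Total number of central directory entries (unsigned 16-bit field at +10).
    static int entryCount(byte[] zip, int eocd) {
        return Short.toUnsignedInt(le(zip).getShort(eocd + 10));
    }

    // Size in bytes of the central directory (unsigned 32-bit field at +12).
    static long centralDirectorySize(byte[] zip, int eocd) {
        return Integer.toUnsignedLong(le(zip).getInt(eocd + 12));
    }

    // Offset of the first CDFH from the start of the archive (unsigned 32-bit field at +16).
    static long centralDirectoryOffset(byte[] zip, int eocd) {
        return Integer.toUnsignedLong(le(zip).getInt(eocd + 16));
    }

    // All multi-byte ZIP fields are little-endian.
    private static ByteBuffer le(byte[] zip) {
        return ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
    }
}
```

A stricter parser would also check that the comment length stored at +20 reaches exactly to the end of the file, since the magic value can legitimately appear inside an archive comment.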
The Android Package file follows a predefined directory structure (shown in Figure 6) that enables the Android Package Manager to extract metadata and validate contents before installation.
Files under the META-INF directory found in the root of the archive enable the Android Package Manager to validate files found outside of this directory.
MANIFEST.MF: contains Base64-encoded SHA1 hashes of files with their relative location in the archive.
<CERT>.SF: contains a Base64-encoded SHA1 hash of MANIFEST.MF, and separate Base64-encoded SHA1 hashes of the file entries in MANIFEST.MF.
<CERT>.RSA: a PKCS#7 signature block containing the developer’s X.509 certificate (with its public key) and the signature over <CERT>.SF.
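The digest entries described above can be sketched with the standard java.security.MessageDigest and java.util.Base64 APIs. This is a simplified illustration, assuming short entry names (the JAR manifest format wraps lines longer than 72 bytes, which is ignored here); the class name is hypothetical. It computes the SHA1-Digest value that MANIFEST.MF records for an entry, and the digest of that entry’s manifest section, which is what <CERT>.SF records for the same entry.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper illustrating JAR-style (v1) digests; ignores the 72-byte line-wrapping rule.
final class ManifestDigest {

    // Base64-encoded SHA-1 of the given bytes, the form used in "SHA1-Digest:" lines.
    static String sha1Base64(byte[] data) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            return Base64.getEncoder().encodeToString(sha1.digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    // MANIFEST.MF section for one archive entry, including the blank line that ends it.
    static String manifestSection(String entryName, byte[] entryData) {
        return "Name: " + entryName + "\r\n"
                + "SHA1-Digest: " + sha1Base64(entryData) + "\r\n"
                + "\r\n";
    }

    // <CERT>.SF digest for the same entry: SHA-1 over the bytes of its manifest section.
    static String signatureFileDigest(String entryName, byte[] entryData) {
        return sha1Base64(manifestSection(entryName, entryData).getBytes(StandardCharsets.UTF_8));
    }
}
```

Because the .SF digest covers the manifest section rather than the file itself, changing any archived file invalidates both its MANIFEST.MF line and the corresponding .SF line.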
Prior to app installation, the Android Package Manager looks for specific files in the package that provide critical information and are required for the functioning of the app.
AndroidManifest.xml: A binary XML file specifying essential information about the app to the Android system, i.e. information the system must have before it can run any of the app’s code. This includes information such as the name of the app, the permissions required, libraries required, etc.
classes.dex: A heavily optimized binary in DEX file format, encompassing the compiled Java classes, strings and data in one single executable file.
The Android installation process initiated by the user involves three main components of the Android system.
Installd: This is a daemon process that runs as root and listens on the UNIX domain socket /dev/socket/installd for commands to install APKs with the appropriate permissions. It is a native program written in C/C++, configured to start automatically on OS startup.
Package Manager Service: This runs on startup as part of the system_server process. This service, written in Java, listens for app installation intents. Once it receives a request, it cryptographically verifies the APK before invoking the Installd process to perform the on-disk installation.
The Android system uses two different implementations when processing an APK file. During verification, it uses the Java implementation (Package Manager Service), and for final extraction it uses a C++ based implementation (Installd/libdex). This has led to breaches in the APK’s cryptographic verification process, allowing installation of crafted trojanized APKs without breaking trust.
The first vulnerability abusing the use of multiple ZIP-archive-handling implementations was discovered by Bluebox Security and publicized as the ‘Master Key’ vulnerability [2]. A number of similar vulnerabilities have been discovered since then.
The bugs and their exploitation have been documented in detail in various technical blogs (mentioned in the References section). The key details of the three vulnerabilities that led to breaches of trust are briefly described below.
The ‘Master Key’ vulnerability arises from the way in which the two ZIP-handling modules (Java and C++) handle multiple entries with the same name. It affects Android OS versions prior to 4.3 (Jelly Bean).
The Java implementation enumerates the ZIP file’s central directory file headers and adds each entry to a LinkedHashMap, using the filename as the key. In order to validate the ZIP entries, the Package Manager enumerates the LinkedHashMap entries. If more than one entry with the same filename is added to the LinkedHashMap, each later entry replaces the previous value. This means that only the last entry for a particular filename in the ZIP is validated.
The C++ implementation, used by libdex (Installd), simply appends each entry with a duplicate name to an in-memory data structure. The lookup uses an open-addressing hash table (no chaining) with linear probing. So, when classes.dex needs to be extracted for the VM, the first matching entry is returned, and this entry has never been validated by the JarVerifier.
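The divergence can be modelled in a few lines of Java. This is a hypothetical sketch, not Android source code: javaLayerLookup mimics the last-wins LinkedHashMap behaviour, while nativeLayerLookup mimics a first-wins open-addressing table with linear probing. The table sizing and the use of String.hashCode() are arbitrary assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the two lookup behaviours; not Android source code.
// Each entry is {filename, payloadLabel}, listed in central directory order.
final class DuplicateEntryDemo {

    // Java layer: LinkedHashMap keyed by filename, so a later duplicate overwrites an earlier one.
    static String javaLayerLookup(List<String[]> entries, String name) {
        Map<String, String> byName = new LinkedHashMap<>();
        for (String[] entry : entries) {
            byName.put(entry[0], entry[1]);
        }
        return byName.get(name);
    }

    // C++ layer: every entry is appended to an open-addressing table with linear probing,
    // so a lookup stops at whichever duplicate was inserted first.
    static String nativeLayerLookup(List<String[]> entries, String name) {
        int size = 1;
        while (size < entries.size() * 2) {
            size <<= 1; // power of two, at most half full, so probing always reaches an empty slot
        }
        int mask = size - 1;
        String[][] slots = new String[size][];
        for (String[] entry : entries) {
            int idx = entry[0].hashCode() & mask;
            while (slots[idx] != null) {
                idx = (idx + 1) & mask;
            }
            slots[idx] = entry;
        }
        int idx = name.hashCode() & mask;
        while (slots[idx] != null) {
            if (slots[idx][0].equals(name)) {
                return slots[idx][1];
            }
            idx = (idx + 1) & mask;
        }
        return null;
    }
}
```

Placing the attacker’s classes.dex before the original one therefore gets the original verified by the Java layer and the attacker’s copy extracted by the C++ layer.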
When processing a Local File Header, ZIP implementations skip the FileName and ExtraField lengths specified in the LFH to locate the start of the compressed data. The vulnerability in Android OS versions prior to 4.4 (KitKat) is that the Java implementation treated the 16-bit ExtraField length as a signed integer, while the C++ implementation correctly treated it as unsigned. If a value above 0x7FFF is specified as the ExtraField size, the Java layer interprets it as a negative number: for example, an ExtraField size of 65,533 (0xFFFD) is interpreted as -3. So, while attempting to locate the start of the data, the Java read pointer moves backwards, whereas the C++ layer jumps forwards by almost 64KB. This discrepancy can be exploited if one locates an APK whose classes.dex file is smaller than 64KB. That original, signed classes.dex (padded as necessary) can be placed in the region between the LFH and the real file data, which the C++ layer skips as extra field. The Java layer ends up verifying this benign data, while the attacker-controlled classes.dex that follows it is extracted during installation.
This vulnerability is very similar to the one described above, with signed/unsigned treatment of the FileNameLen field in the Local File Header (see Figure 10).
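Both length-field bugs come down to how a 16-bit little-endian value is widened. The sketch below is a hypothetical illustration (not Android source code) that computes an entry’s data offset from the FileNameLen field at +26 and the ExtraFieldLen field at +28 of the LFH, once using Java’s signed short directly and once zero-extending it as the ZIP format intends.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration of the signed/unsigned mismatch; not Android source code.
final class LfhOffsetDemo {
    static final int LFH_FIXED_SIZE = 30;      // bytes before the filename in a Local File Header
    static final int FILENAME_LEN_OFFSET = 26; // 16-bit little-endian FileNameLen
    static final int EXTRA_LEN_OFFSET = 28;    // 16-bit little-endian ExtraFieldLen

    // Computes where an entry's data starts. With treatAsSigned = true the two length
    // fields are used as Java shorts (the buggy behaviour described above); otherwise
    // they are zero-extended to the unsigned values the format specifies.
    static long dataOffset(byte[] lfh, long lfhStart, boolean treatAsSigned) {
        ByteBuffer buf = ByteBuffer.wrap(lfh).order(ByteOrder.LITTLE_ENDIAN);
        short nameLen = buf.getShort(FILENAME_LEN_OFFSET);
        short extraLen = buf.getShort(EXTRA_LEN_OFFSET);
        int name = treatAsSigned ? nameLen : Short.toUnsignedInt(nameLen);
        int extra = treatAsSigned ? extraLen : Short.toUnsignedInt(extraLen);
        return lfhStart + LFH_FIXED_SIZE + name + extra;
    }
}
```

For an LFH at offset 0 with the 11-byte name classes.dex and an ExtraFieldLen of 0xFFFD, the signed reading lands at offset 38 (three bytes back into the filename), while the unsigned reading lands at offset 65,574.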
The ZIP file format specifies at per-file level the compression algorithm (method) that the archiver has used to compress the file. This is a 16-bit field, i.e. 65,536 methods can be defined. Android’s application package file (APK), which uses the ZIP format, supports only two compression methods. One is without any compression, i.e. the STORED method (0x0000), and the other is the DEFLATE (0x0008) compression algorithm.
The Android OS makes certain assumptions when handling this field, and processes the ‘compressed’ content based on these assumptions. The pseudocode is given in Table 1.
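Android version | C++ ZIP handling | Java ZIP handling |
---|---|---|
Android 4.4 and above | if Method=Stored RawDataCopy(...) else InflateAndCopy(...) | if Method=Stored RawDataCopy(...) else InflateAndCopy(...) |
Android 4.3 and below | if Method=Stored RawDataCopy(...) else InflateAndCopy(...) | if Method=Deflate InflateAndCopy(...) else RawDataCopy(...) |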
Table 1. Pseudocode on processing ‘compressed’ content. (Refer to the Appendix for relevant Android ZIP-handling code snippets.)
In most cases, Android ZIP handling assumes the compression method to be DEFLATE if the specified method is not STORED. The exception is the Java ZIP-handling code in earlier versions of Android (4.3 and below), which checks whether the method is DEFLATE and assumes that STORED has been used if it is not.
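Table 1 and the appendix snippets reduce to the decision functions below. This is a hypothetical model (the class, enum and method names are illustrative, not Android APIs) that makes the mismatch explicit: for an unsupported method such as SHRUNK (0x0001), the Android 4.3 Java layer copies the raw bytes while the C++ layer inflates them.

```java
// Hypothetical model of Table 1; names are illustrative, not Android APIs.
final class MethodDispatchDemo {
    static final int STORED = 0x0000;
    static final int DEFLATED = 0x0008;

    enum Action { RAW_COPY, INFLATE }

    // C++ layer (libdex) in both 4.3 and 4.4: anything that is not STORED is inflated.
    static Action nativeLayer(int method) {
        return method == STORED ? Action.RAW_COPY : Action.INFLATE;
    }

    // Java layer, Android 4.4 and above: same rule as the C++ layer.
    static Action javaLayer44(int method) {
        return method == STORED ? Action.RAW_COPY : Action.INFLATE;
    }

    // Java layer, Android 4.3 and below: anything that is not DEFLATED is copied raw.
    static Action javaLayer43(int method) {
        return method == DEFLATED ? Action.INFLATE : Action.RAW_COPY;
    }
}
```

The two layers agree for STORED and DEFLATE; they only diverge for out-of-spec values, which is precisely the gap the crafted APKs below rely on.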
Anti-virus software typically handles archive file formats with more stringent checks than any unarchiver utility, or in this case the Android OS ZIP-handling layer. This behaviour can be abused by malicious files in order to circumvent the unarchiving process performed by the anti-virus scanning module.
A crafted APK can be constructed primarily by changing the compression method to a value other than STORED or DEFLATE. The Android OS will continue to treat the entry as either STORED or DEFLATE, but the AV scanner’s unarchiving module will fail when it attempts to process the compressed data using the unknown method, and so will not detect the contents of the APK.
For the purpose of observing the effects of this malformation, repackaged APKs of DroidSheep [3] (a potentially unwanted Android app) were used to prepare PoCs. A significant number of anti-virus vendors detect versions of this app (typically on classes.dex). Table 2 shows the behaviour observed.
File name | Android OS <= 4.3 | Android OS >= 4.4 |
---|---|---|
Droidsheep_v15_DetectCheck.apk | Install – SUCCESS AV detection – SUCCESS | Install – SUCCESS AV detection – SUCCESS |
Droidsheep_v15_Crafted_43.apk | Install – SUCCESS AV detection – FAILED | Install – FAILED AV detection – FAILED |
Droidsheep_v15_Crafted_44.apk | Install – FAILED AV detection – FAILED | Install – SUCCESS AV detection – FAILED |
Table 2. Behaviour observed with the original and repackaged versions of DroidSheep.
Figure 11 and Figure 12 show Droidsheep_v15_DetectCheck.apk (the original APK without any modifications). The ZIP’s central directory header for classes.dex is highlighted, along with the compression method (Figure 12).
Figure 13 and Figure 14 show Droidsheep_v15_Crafted_44.apk (the APK crafted to work with Android 4.4 and above). Here, after COMPRESSION_METHOD was changed to SHRUNK (0x0001), the data was still treated as DEFLATE-compressed, so installation proceeded on the Android OS. The method was changed to SHRUNK in both the ZIP’s local file header and the central directory header.
Finally, we look at Droidsheep_v15_Crafted_43.apk (the APK crafted to work with Android 4.3 and below). Constructing an installable APK with an unhandled compression method is trickier here, because the checks in the C++ and Java ZIP-handling layers are mirror images of each other.
Crafting an installable APK involved the following steps:
Step 1: The directory structure required to construct the APK was prepared (see Figure 15).
Step 2: The classes.dex file to be included in the APK was compressed using an in-house-developed zlib deflate tool. The tool skips writing the zlib headers in the output compressed file, making it compatible with unzip’s inflate() function. The uncompressed version of classes.dex was replaced in the directory with the compressed version (Figure 16).
Step 3: The contents of the directory structure were then zipped in STORED mode, so that all the files were packed without any compression (Figure 17).
Step 4: The APK was signed using the jarsigner tool. As a result, MANIFEST.MF and the related files required for validation contained the SHA1 of the compressed classes.dex (rather than of its uncompressed version, as is typically the case).
Step 5: The APK’s Central Directory Header and Local File Header then needed to be suitably modified, with the compression method changed to a value other than STORED or DEFLATE. In this case, we used the SHRUNK (0x0001) method. The uncompressed size was also changed to reflect the original uncompressed size of the classes.dex file (see Figure 18).
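The in-house tool used in Step 2 is not public. As an assumption-laden sketch, the same kind of headerless stream can be produced with java.util.zip.Deflater by passing nowrap = true, which suppresses the zlib header and Adler-32 trailer; Inflater(true), the same setting used by the Android Java code quoted in the appendix, reads it back. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: raw (headerless) DEFLATE, the form stored inside ZIP entries.
final class RawDeflateDemo {

    // nowrap = true: no zlib header and no Adler-32 trailer, only the DEFLATE blocks.
    static byte[] rawDeflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION, true);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Inflates a raw DEFLATE stream whose uncompressed size is known (as recorded in the ZIP headers).
    static byte[] rawInflate(byte[] compressed, int uncompressedSize) {
        Inflater inflater = new Inflater(true);
        try {
            inflater.setInput(compressed);
            byte[] out = new byte[uncompressedSize];
            int total = 0;
            while (total < uncompressedSize && !inflater.finished()) {
                int n = inflater.inflate(out, total, uncompressedSize - total);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unexpected stream
                }
                total += n;
            }
            if (total != uncompressedSize) {
                throw new IllegalArgumentException("inflated " + total + " bytes, expected " + uncompressedSize);
            }
            return out;
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not a valid raw DEFLATE stream", e);
        } finally {
            inflater.end();
        }
    }
}
```

Round-tripping a buffer through rawDeflate and rawInflate is a quick way to confirm that a stream carries no zlib wrapper, since Inflater(true) rejects data that starts with a zlib header.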
Installation of this crafted APK succeeds on the target Android OS because it satisfies the following criteria:
Java verification layer: This layer assumes that if the compression method specified is not DEFLATE, it is STORED. So when it encounters classes.dex with SHRUNK compression, it assumes the STORED method has been used, and matches the SHA1 hash (of the compressed data) against the one specified in the MANIFEST.MF file.
C++ ZIP layer: This layer assumes that if the compression method specified is not STORED, it is DEFLATE. So, when it encounters classes.dex with SHRUNK compression, it assumes the DEFLATE method has been used, extracting classes.dex to disk by uncompressing the pre-compressed data.
The Android OS should enforce stricter checks on the compression method fields in order to block the installation of crafted APKs. This has been logged as Issue #69184.
Having identified the file as an APK, anti-virus engines should either heuristically flag the file if any unsupported compression method is specified in the Local File Header or Central Directory File Header, or extract the files based on assumptions similar to those implemented by the Android OS.
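The first option, heuristically flagging unsupported methods, might look like the following. This is a minimal sketch under stated assumptions: the archive is held in memory and is not ZIP64, and the class name and the decision to also flag entries whose LFH and CDFH disagree on the method are illustrative choices rather than any vendor’s implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical heuristic: lists entries whose compression method is neither STORED nor DEFLATE,
// or whose Local File Header disagrees with its Central Directory File Header.
final class ApkMethodCheck {
    static final int EOCD_SIGNATURE = 0x06054b50;
    static final int CDFH_SIGNATURE = 0x02014b50;
    static final int LFH_SIGNATURE = 0x04034b50;
    static final int STORED = 0;
    static final int DEFLATED = 8;

    static List<String> suspiciousEntries(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        List<String> flagged = new ArrayList<>();
        int eocd = -1;
        for (int i = zip.length - 22; i >= Math.max(0, zip.length - 22 - 0xFFFF); i--) {
            if (buf.getInt(i) == EOCD_SIGNATURE) {
                eocd = i;
                break;
            }
        }
        if (eocd < 0) {
            flagged.add("<no EOCD record>");
            return flagged;
        }
        int count = Short.toUnsignedInt(buf.getShort(eocd + 10));
        long pos = Integer.toUnsignedLong(buf.getInt(eocd + 16));
        for (int n = 0; n < count; n++) {
            if (pos + 46 > zip.length || buf.getInt((int) pos) != CDFH_SIGNATURE) {
                flagged.add("<malformed central directory>");
                break;
            }
            int p = (int) pos;
            int cdMethod = Short.toUnsignedInt(buf.getShort(p + 10));
            int nameLen = Short.toUnsignedInt(buf.getShort(p + 28));
            int extraLen = Short.toUnsignedInt(buf.getShort(p + 30));
            int commentLen = Short.toUnsignedInt(buf.getShort(p + 32));
            long lfh = Integer.toUnsignedLong(buf.getInt(p + 42));
            if (pos + 46 + nameLen > zip.length) {
                flagged.add("<malformed central directory>");
                break;
            }
            String name = new String(zip, p + 46, nameLen, StandardCharsets.UTF_8);
            boolean suspicious = cdMethod != STORED && cdMethod != DEFLATED;
            if (lfh + 30 > zip.length || buf.getInt((int) lfh) != LFH_SIGNATURE) {
                suspicious = true; // LFH missing or out of bounds
            } else if (Short.toUnsignedInt(buf.getShort((int) lfh + 8)) != cdMethod) {
                suspicious = true; // LFH and CDFH disagree on the method
            }
            if (suspicious) {
                flagged.add(name);
            }
            pos += 46L + nameLen + extraLen + commentLen;
        }
        return flagged;
    }
}
```

Flagging rather than guessing keeps the scanner from having to mirror each Android version’s fallback rules, at the cost of needing the package to have been identified as an APK first.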
The ZIP file format forms the basis for various application packages, including Android Package (APK), Java Archive (JAR), Metro App (APPX) and Microsoft Office documents (DOCX) etc., which has created new challenges for the AV industry in recognizing the package type based on content. A package containing content from several package formats could be treated as an APK, JAR or DOCX depending on the application that processes it. Identifying the correct package type is critical for any automated analysis system that a security vendor might employ. However, an anti-virus scanner that by default simply extracts and scans all ZIP contents will not be affected by such a package.
ZIP packages are typically checked for files with specific names at specific locations to help identify the package type.
Format | File names |
---|---|
JAR | META-INF/MANIFEST.MF META-INF/*.SF META-INF/*.RSA *.class |
APK | META-INF/MANIFEST.MF META-INF/*.SF META-INF/*.RSA AndroidManifest.xml classes.dex |
DOCX | [Content_Types].xml Word docProps _rels |
APPX | AppxManifest.xml AppxBlockMap.xml |
Table 3. Packages are checked for specific filenames at specific locations.
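A name-based classifier along the lines of Table 3 can be sketched as below. The rules are a simplified, hypothetical subset of the table (signature files are ignored, and the DOCX check looks for a word/ directory); the point is that one entry list can satisfy several rules at once, which is exactly what a Chameleon ZIP exploits.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical, simplified package-type guesser based on entry names only.
final class PackageTypeGuesser {

    // Returns every package type whose (simplified) naming rules the entry list satisfies.
    static Set<String> candidateTypes(Collection<String> names) {
        boolean manifest = names.contains("META-INF/MANIFEST.MF");
        boolean hasClass = false;
        boolean hasWordDir = false;
        for (String n : names) {
            if (n.endsWith(".class")) {
                hasClass = true;
            }
            if (n.toLowerCase(Locale.ROOT).startsWith("word/")) {
                hasWordDir = true;
            }
        }
        Set<String> types = new LinkedHashSet<>();
        if (manifest && names.contains("AndroidManifest.xml") && names.contains("classes.dex")) {
            types.add("APK");
        }
        if (manifest && hasClass) {
            types.add("JAR");
        }
        if (names.contains("[Content_Types].xml") && hasWordDir) {
            types.add("DOCX");
        }
        if (names.contains("AppxManifest.xml") && names.contains("AppxBlockMap.xml")) {
            types.add("APPX");
        }
        return types;
    }
}
```

A component that stops at the first matching rule would pick one type for the amalgamated package and could therefore apply the wrong scanning logic; returning the full set at least exposes the ambiguity.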
Relevant file extensions, if available, are usually considered when making a decision on the package type. However, this may not always be the case.
An amalgamated package can easily be created that is valid (with appropriate extension) for the application that processes it (see Figure 19).
The same package can be installed as an Android app (with .apk extension), run as a regular Java app/applet (with .jar extension), opened by a document processor (with .docx extension), etc.
Figure 20 shows the directory tree structure from which we created the Chameleon ZIP. It contains a mix of files that are all part of the various file formats.
The APK and JAR formats follow the same signing process, which makes the package valid for the respective applications. The Android OS and the Java runtime treat the files that are irrelevant to them as ordinary data files.
We observed that Microsoft Word requires that it be able to identify and account for every file within a DOCX package. Therefore, when the amalgamated package is renamed to DOCX, attempting to open it in Word fails with a message declaring the file to be corrupt. However, OpenOffice was able to open the file and display its content without any errors or warnings.
Windows 8 Metro apps also require the integrity of each known file to be verified, so it may not be possible to create a package that is both a valid APPX and a valid APK/JAR, because the two formats each require their own, independently updated files of hashes for validation. However, having the APPX-related files within the APK/JAR might be enough to throw off a package-type analysis component.
Owing to its flexibility, versatility and popularity, the ZIP format has been the target of many types of manipulation aimed at bypassing anti-virus scanning and OS validation. Given the history of the format’s misuse, and the multitude of major implementations that process it, we expect this trend to continue. Anti-virus vendors would do well to stay on their toes.
Our internal analysis showed that a crafted APK with a non-conformant compression method affected the products of various anti-virus vendors, both mobile and Windows. The technical details have been shared with the respective vendors.
The effect of Chameleon ZIP on automated systems was not evaluated due to practical restrictions. We suggest that anti-virus vendors evaluate their own systems based on the information provided in this paper.
References
[1] Sourced from http://en.wikipedia.org/wiki/ZIP_%28file_format%29.
[2] https://bluebox.com/technical/uncovering-android-master-key-that-makes-99-of-devices-vulnerable/.
Appendix
C++ Source @ https://android.googlesource.com/platform/dalvik.git/+/android-4.4.2_r2/libdex/ZipArchive.cpp

```cpp
// Android 4.4.2, libdex: STORED data is copied as-is; any other method is inflated.
if (method == kCompressStored) {
    if (sysCopyFileToFile(fd, pArchive->mFd, uncompLen) != 0)
        goto bail;
} else {
    if (inflateToFile(fd, pArchive->mFd, uncompLen, compLen) != 0)
        goto bail;
}
```
Java Source @ https://android.googlesource.com/platform/libcore.git/+/android-4.4.2_r2/luni/src/main/java/java/util/zip/ZipFile.java

```java
// Android 4.4.2, java.util.zip.ZipFile: STORED data is returned raw; any other method is inflated.
if (entry.compressionMethod == ZipEntry.STORED) {
    rafStream.endOffset = rafStream.offset + entry.size;
    return rafStream;
} else {
    rafStream.endOffset = rafStream.offset + entry.compressedSize;
    int bufSize = Math.max(1024, (int) Math.min(entry.getSize(), 65535L));
    return new ZipInflaterInputStream(rafStream, new Inflater(true), bufSize, entry);
}
```
C++ Source @ https://android.googlesource.com/platform/dalvik.git/+/android-4.2.2_r1/libdex/ZipArchive.cpp

```cpp
// Android 4.2.2, libdex: identical logic to 4.4.2; anything that is not STORED is inflated.
if (method == kCompressStored) {
    if (sysCopyFileToFile(fd, pArchive->mFd, uncompLen) != 0)
        goto bail;
} else {
    if (inflateToFile(fd, pArchive->mFd, uncompLen, compLen) != 0)
        goto bail;
}
```
Java Source @ https://android.googlesource.com/platform/libcore.git/+/android-4.2.2_r1/luni/src/main/java/java/util/zip/ZipFile.java

```java
// Android 4.2.2, java.util.zip.ZipFile: DEFLATED data is inflated; any other method is returned raw.
if (entry.compressionMethod == ZipEntry.DEFLATED) {
    int bufSize = Math.max(1024, (int) Math.min(entry.getSize(), 65535L));
    return new ZipInflaterInputStream(rafstrm, new Inflater(true), bufSize, entry);
} else {
    return rafstrm;
}
```