Forcepoint Innovation Labs, UK
Excel Formula, or XLM – does it ever stop giving pain to researchers?
Last week I received a new sample using the xlsb file format that supposedly contained malicious code. I had a quick look, and wow – this was different. An initial check on VirusTotal (VT) showed that it hadn’t been uploaded to VT yet. So, with nothing to go on, I started looking into the sample.
Structurally, it’s a Microsoft Excel 2007+ document (ZIP) containing the following files:
Naturally, we look at the xl/macrosheets/sheet1.bin, right? First we need to enumerate these records. The xl/macrosheets/sheet1.bin looks like this:
How are the records stored? The answer is in Microsoft’s documentation. To establish the recordId, you read the first byte (0x81). Since the high bit (0x80) is set, there is another byte that contributes to the recordId. Mask this bit off for now and we get 0x01. The next byte is 0x01, and as its high bit (0x80) isn’t set, it is the last byte of the recordId and we use its value multiplied by 0x80. This means that the recordId is (1*128)+1 = 129, which is BrtBeginSheet. To get the length you do the same: read the next byte (0x00). Its high bit (0x80) isn’t set, so no further byte follows. The remaining seven bits are 0, so the length is 0 and the record has no data.
The next record is BrtWsProp with recordId 147 and length 23.
recordId: (0x93 & 0x7F) + (0x01*0x80) = 147 (0x93)
length: (0x17 & 0x7F) = 23 (0x17)
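The decoding steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a complete parser: the names read_varint and iter_records are my own, and the limits of two bytes for the recordId and four bytes for the length are assumptions taken from my reading of the MS-XLSB record layout.

```python
def read_varint(data, pos, max_bytes):
    """Decode a little-endian 7-bits-per-byte value starting at data[pos].

    Returns (value, new_pos). The high bit (0x80) of each byte says whether
    another byte follows; the low seven bits carry the data.
    """
    value = 0
    for i in range(max_bytes):
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            break
    return value, pos


def iter_records(data):
    """Yield (record_id, payload) tuples from a BIFF12 stream such as sheet1.bin."""
    pos = 0
    while pos < len(data):
        # Assumption: recordId uses at most 2 bytes, the length at most 4.
        record_id, pos = read_varint(data, pos, 2)
        length, pos = read_varint(data, pos, 4)
        payload = data[pos:pos + length]
        pos += length
        yield record_id, payload


if __name__ == "__main__":
    # The two record headers discussed above, with 23 dummy payload bytes.
    sample = bytes([0x81, 0x01, 0x00, 0x93, 0x01, 0x17]) + bytes(23)
    for record_id, payload in iter_records(sample):
        print(record_id, len(payload))
```

Run against the bytes from the example, this prints 129 with length 0 and then 147 with length 23. A truncated stream would raise an IndexError here; a real tool would want to handle that more gracefully.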
Now you can parse all the records and get a nice list. Unfortunately, when parsing the records of the xl/macrosheets/sheet1.bin I see nothing unusual.
So we move on to the other sheets. What can we find here? Quite a lot, actually. The records we are interested in, while we learn, are:
RecordId | Name | Description |
0 | BrtRowHdr | Tells you which row you are currently on |
8 | BrtFmlaString | Tells you about an embedded string and the pcode (parsed expression) to build this string |
11 | BrtFmlaError | Tells you the pcode (parsed expression) |
Let’s have a brief look at the data we need.
Microsoft has documented the BrtFmlaString record well in a PDF. The record starts with an eight-byte cell information structure, followed by a variable-length XLWideString (which looks like a Unicode string), two bytes of grbitFlags, and then the formula itself (a CellParsedFormula structure).
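To make that layout concrete, here is a small sketch that splits a BrtFmlaString payload into its fields. The function name and the returned dictionary keys are hypothetical. I am also assuming, from my reading of the MS-XLSB documentation, that the cell structure begins with a four-byte column, that XLWideString is a four-byte character count followed by UTF-16LE text, and that CellParsedFormula begins with a four-byte token length (cce).

```python
import struct


def parse_brt_fmla_string(payload):
    """Split a BrtFmlaString payload into column, string, flags and pcode bytes."""
    # Eight-byte cell structure: the column comes first (assumption); the
    # remaining four bytes (style reference and flags) are skipped here.
    (col,) = struct.unpack_from("<I", payload, 0)
    pos = 8

    # XLWideString: 4-byte character count, then UTF-16LE characters (assumption).
    (cch,) = struct.unpack_from("<I", payload, pos)
    pos += 4
    text = payload[pos:pos + 2 * cch].decode("utf-16-le")
    pos += 2 * cch

    # Two bytes of grbitFlags.
    (grbit_flags,) = struct.unpack_from("<H", payload, pos)
    pos += 2

    # CellParsedFormula: 4-byte token length (cce), then the tokens (rgce).
    (cce,) = struct.unpack_from("<I", payload, pos)
    pos += 4
    rgce = payload[pos:pos + cce]

    return {"col": col, "text": text, "grbit_flags": grbit_flags, "rgce": rgce}
```

With a one-character string and six bytes of pcode, these fields plus the trailing four-byte count in CellParsedFormula add up to 30 bytes.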
The first one you’ll find is this:
After decoding, we get this:
RECORD: BrtFmlaString (Id 8,offset 58d), LENGTH: 30
col: 26, row: 20 | strlen=1 : "/"
1E 2F 00 PtgInt: 47
41 6F 00 PtgFunc: CHAR (111)
The record has no information about the row, so you need to get this from the BrtRowHdr record. When you get to the CellParsedFormula structure you parse it (as mentioned in my previous article).
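The two tokens in the decoded record above are enough to show how the pcode evaluates: it is a stack program, so PtgInt pushes 47 and PtgFunc CHAR pops it and pushes the character. The sketch below handles only these two token types. The FUNCTIONS table and the name evaluate_rgce are my own, and a real evaluator needs many more tokens and functions.

```python
import struct

PTG_INT = 0x1E
# PtgFunc appears in three token classes; 0x41 is the one seen in this sample.
PTG_FUNC = (0x21, 0x41, 0x61)

# Hypothetical lookup table: function index -> (name, argument count, implementation).
FUNCTIONS = {
    111: ("CHAR", 1, lambda code: chr(code)),
}


def evaluate_rgce(rgce):
    """Evaluate a tiny subset of formula tokens and return the top of the stack."""
    stack = []
    pos = 0
    while pos < len(rgce):
        ptg = rgce[pos]
        if ptg == PTG_INT:
            # PtgInt: one token byte followed by a 16-bit unsigned integer.
            (value,) = struct.unpack_from("<H", rgce, pos + 1)
            stack.append(value)
            pos += 3
        elif ptg in PTG_FUNC:
            # PtgFunc: one token byte followed by a 16-bit function index.
            (iftab,) = struct.unpack_from("<H", rgce, pos + 1)
            if iftab not in FUNCTIONS:
                raise ValueError(f"unsupported function index {iftab}")
            _name, argc, impl = FUNCTIONS[iftab]
            # Pop the arguments and restore their original left-to-right order.
            args = [stack.pop() for _ in range(argc)][::-1]
            stack.append(impl(*args))
            pos += 3
        else:
            raise ValueError(f"unsupported token 0x{ptg:02X} at offset {pos}")
    return stack[-1]


if __name__ == "__main__":
    print(evaluate_rgce(bytes.fromhex("1E2F00416F00")))
```

For the bytes 1E 2F 00 41 6F 00 this yields "/", which matches the string stored in the record.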
The BrtFmlaError record also starts with an eight-byte cell structure, followed by a one-byte fErr and a two-byte grbitFlags, before you reach the formula itself (CellParsedFormula structure).
When you parse the first record of this stream you’ll get:
RECORD: BrtFmlaError (Id 11,offset 1e3), LENGTH: 62
49 27 00 PtgMemFunc: 27
19 40 00 01 PtgAttrSpace: 0100
23 04 00 00 00 PtgName: index 4
23 14 00 00 00 PtgName: index 20
0F PtgIsect:
23 5D 00 00 00 PtgName: index 93
0F PtgIsect:
23 46 00 00 00 PtgName: index 70
0F PtgIsect:
23 15 00 00 00 PtgName: index 21
0F PtgIsect:
23 2F 00 00 00 PtgName: index 47
0F PtgIsect:
13 PtgUminus:
BrtRowHdr is a simple structure, but for now we just want the row.
The first DWORD gives you the row number you need (in this case, 2).
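Putting the pieces together, the row from BrtRowHdr and the column from each cell record are all you need to place values on a grid. The following is a rough sketch under simplifying assumptions: it takes already-split (recordId, payload) pairs, only looks at BrtFmlaString cells, and uses the stored string rather than evaluating the pcode. The name build_virtual_sheet is hypothetical.

```python
import struct

BRT_ROW_HDR = 0
BRT_FMLA_STRING = 8


def build_virtual_sheet(records):
    """Map (row, col) to the cached string of each BrtFmlaString cell."""
    sheet = {}
    current_row = None
    for record_id, payload in records:
        if record_id == BRT_ROW_HDR:
            # The first DWORD of BrtRowHdr is the row for the cells that follow.
            (current_row,) = struct.unpack_from("<I", payload, 0)
        elif record_id == BRT_FMLA_STRING and current_row is not None:
            # Column is the first DWORD of the eight-byte cell structure.
            (col,) = struct.unpack_from("<I", payload, 0)
            # Assumption: XLWideString is a 4-byte count plus UTF-16LE text.
            (cch,) = struct.unpack_from("<I", payload, 8)
            text = payload[12:12 + 2 * cch].decode("utf-16-le")
            sheet[(current_row, col)] = text
    return sheet
```

Keying the dictionary on (row, col) keeps the sketch short; sorting the keys afterwards gives you the cells in reading order.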
When you have parsed all these records from all these binary worksheets, you’ll end up with a virtual sheet that looks like this:
This is more informative, but it was a bit of work to get there. At least it is context you can relate to.
As I am writing this article I see that VT has received a copy of the sample, and that when it was first checked (on entry) a single engine was detecting it:
Kudos to Ikarus!
I think my little project is over – when I have a problem like this I can’t let it go until it’s solved, but now I can finally relax!
Get in touch with me if you need help! I think tools should give this kind of context automagically.