Flibi: evolution

2011-05-01

Peter Ferrie

Microsoft, USA

Editor: Helen Martin

Abstract

The Flibi virus demonstrated that a virus can carry its own ‘genetic code’, and if the building blocks of that code are mutated in some way, then interesting behaviours can arise. In this article Peter Ferrie demonstrates how some of Flibi's instructions might be removed by replacing them with functionally equivalent code sequences.

The Flibi virus demonstrated that a virus can carry its own ‘genetic code’ (see VB, March 2011, p.4), and if the codons [1] (the p-code form of the virus), the tRNA [2] (the translator function), or the corresponding amino acids [3] (the native code) are mutated in some way, then interesting behaviours can arise.

Each codon is used as a relative offset into a table of amino acids. There is a single pointer to the table. Mutation of a codon might cause a new amino acid to be produced, since it might now point to a different entry in the table. Mutation of the pointer would almost certainly be fatal since many codons would not be translated into the correct amino acids. Mutation of the amino acid itself might produce new behaviour, depending on the change. For example, a shift could become a rotate.

The virus has the ability to move a sequence of codons to a later position in the stream, and then fill the gap with no-operation instructions. (There is a bug in this code, which can result in attempting to copy more bytes than exist in the source.) In most cases, this simply results in the replacement of the codons at the destination. (There is an additional case where the destination is the same as the source, in which case the codons are deleted.) Of course, if the selected sequence appears at the end of the defined stream (there is a lot of slack space after the last meaningful codon), then the size of the defined stream will increase slightly each time that condition occurs. However, the size of the buffer remains fixed. Therefore, new sequences can only appear when the translator code is modified to increase the number of codons that are translated, thus ‘translating’ garbage beyond the original end of the stream. That garbage could potentially be modified over time to eventually produce meaningful functionality. Its location in the virus body would change over time as a result of the codon deletion, allowing the new amino acids to ‘migrate’ to a final position where they become truly useful. The human eye did not spring fully formed from the dust of the earth but was the result of gradual refinements in image accuracy. Something similar can occur here, where the sequence of amino acids does not need to work completely (or even at all) in order to be useful (or just retained). As unlikely as these things are, millions of computer years from now, we might see some of the following transformations.

The aim of this article is to demonstrate how some instructions from the original set might be removed by replacing them with functionally equivalent code sequences using the remaining instructions. One advantage of a smaller instruction set is that it allows an increase in the number of codons that can map to a single amino acid, thus making the body more resilient to corruption. Further, a sequence of instructions has a smaller risk of lethal mutation than a single instruction, because the risk is spread over a wider area.

We begin with a brief overview of the language itself. There are 45 commands in the release version (there were only 43 in the preview version). There are three general-purpose registers (‘A’, ‘B’ and ‘D’, which correspond to the ‘eax’, ‘ebp’ and ‘edx’ CPU registers); one temporary register, upon which all operations are performed (which corresponds to the ‘ebx’ CPU register); one ‘operator’ register, which holds the value for any operation that requires a parameter (which corresponds to the ‘ecx’ CPU register); and two buffer registers, one of which holds the destination for branching instructions (which corresponds to the ‘esi’ CPU register), and the other holds the destination for write instructions (which corresponds to the ‘edi’ CPU register).

The language supports the following commands:

_nopsA, _nopsB, _nopsD, _nopdA, _nopdB, _nopdD
_saveWrtOff, _saveJmpOff
_writeByte, _writeDWord
_save, _addsaved, _subsaved
_getDO, _getdata, _getEIP
_push, _pop, _pushall, _popall
_zer0
_mul, _div, _shl, _shr, _and, _xor
_add0001, _add0004, _add0010, _add0040, _add0100, _add0400, _add1000, _add4000, _sub0001
_nopREAL
_JnzUp, _JzDown, _JnzDown
_CallAPILoadLibrary, _CallAPIMessageBox, _CallAPISleep (release version), _call
_null (release version, it has no actual name)

This set can be reduced in several ways. The most obvious candidates for removal are the three API calls (two in the preview version - the ‘_CallAPISleep’ command was added to the release version because the API resolver code could not resolve the Sleep() API on certain platforms. The reason is described in detail in the previous article (VB, March 2011, p.4). The APIs can be called using the ‘_call’ command if the API addresses are placed in the data section in this way:

_getDO   ;get data offset
_addnnnn ;adjust ebx as appropriate to reach the required offset
_call    ;call the API

This leaves 42 commands remaining (41 in the preview version).

The ‘_zer0’ command can be removed by using this code:

_save ;ecx = ebx
_xor  ;ebx = 0

41 (40) commands now remain.

The ‘_subsaved’ command (which performs the action ‘ebx = ebx – ecx’) can be removed, and the ‘_addsaved’ command (which performs the action ‘ebx = ebx + ecx’) can be used instead, with a slight change. Specifically, the new value of the ‘ecx’ register is ‘-ecx’ (such that ‘ebx = ebx + -ecx’). However, there is no negate command, so an equivalent result must be achieved using the combination of operations that perform a ‘not’ and an ‘add 1’. The problem is that a ‘not’ operation uses the value ‘0xffffffff’, which requires many steps to construct. Given the existing instruction set, it would be simplest to place the value ‘0xffffffff’ in the data section. (It would be even simpler to introduce an instruction which performs a ‘mov ebx, 0xffffffff’.) It must be placed at the start of the data section, because the ‘_addnnnn’ commands can be removed, leaving no way to select another offset. This algorithm can then be used:

xor  ebx, 0xffffffff
inc  ebx

which we translate into this code:

_push
_getDO     ;get data offset
_getdata   ;fetch 0xffffffff
_xor       ;logically ‘not’ ebx
_add0001   ;increment result to complete negate
_save      ;replace ecx
_pop
_addsaved  ;ebx = ebx + ecx

40 (39) commands remain.

In the same way, the ‘_sub0001’ command can be removed by using this code:

_push
_getDO     ;get data offset
_getdata   ;ebx = 0xffffffff
_save      ;ecx = 0xffffffff
_pop
_addsaved  ;ebx = ebx - 1

39 (38) commands remain.

The ‘_addnnnn’ commands exist for convenience, but all of the commands apart from ‘_add0001’ can be constructed using the ‘_add0001’ command. Thus, the ‘_add0004’, ‘_add0010’, ‘_add0040’, ‘_add0100’, ‘_add0400’, ‘_add1000’ and ‘_add4000’ commands can be removed.

32 (31) commands remain.

The ‘_add0001’ command can also be removed, because the number ‘1’ can be recovered from the value ‘0xffffffff’ by using this code:

_getDO     ;get data offset
_getdata   ;ebx = 0xffffffff
_save      ;ecx = 0xffffffff

 ;here is a horrible trick:
 ;modern CPUs limit the shift-count to 0x1f by taking
 ;the low five bits for the count and simply discarding
 ;the rest of the value internally, this performs a 
 ;cl & 0x1f and it’s exactly what we need

_shr       ;ebx = ebx >> cl
_save      ;ecx = 1

From then on, the ‘_addsaved’ command can be used to increment the ‘ebx’ register as needed.

31 (30) commands remain.

Of course, it would require very many uses of the ‘_addsaved’ command in order to construct large values, but value construction can be accelerated by using the ‘_shl’ and ‘_xor’ commands.

For example, constructing the value ‘2’ is a matter of the following:

_shl     ;ebx = ebx << cl (ebx and ecx are ‘1’ from above)

Constructing the value ‘3’, beginning with the ‘ebx’ and ‘ecx’ registers holding the value ‘1’, as above, is a matter of the following:

_shl     ;ebx = ebx << cl
_xor     ;ebx = ebx ^ ecx

Constructing the value ‘4’, beginning with the ‘ebx’ and ‘ecx’ registers holding the value ‘1’, as above, is a matter of the following:

_shl    ;ebx = ebx << cl
_shl    ;ebx = ebx << cl

And so on. Given this algorithm, we can see that the value ‘0xffffffff’ is not the only possible ‘base constant’. The value ‘1’ could be used instead, since the value ‘0xffffffff’ could be produced from it in the following way:

_getDO     ;get data offset
_getdata   ;ebx = 1
_save      ;ecx = 1
_shl       ;ebx = 2
_xor       ;ebx = 3
_shl       ;ebx = 6
_xor       ;ebx = 7
_shl       ;ebx = 0x0e
_xor       ;ebx = 0x0f

... [54 steps]

_shl      ;ebx = 0xfffffffe
_xor      ;ebx = 0xffffffff
_save     ;ecx = 0xffffffff

Clearly, it is far simpler to go from ‘0xffffffff’ to ‘1’ than the other way around. Note that values can also be constructed using the ‘reverse’ of this technique, to reduce the number of shifts required. For example, constructing the value ‘0x80000000’ is a matter of the following:

_getDO     ;get data offset
_getdata   ;ebx = 0xffffffff
_save      ;ecx = 0xffffffff
_shl       ;ebx = 0x80000000

Constructing the value ‘0x40000000’ is a matter of the following:

_push
_shr       ;ebx = ebx >> cl (ebx = 0x80000000,  
           ;ecx = 0xffffffff from above)
_save      ;ecx = 1
_pop
_shr       ;ebx = 0x40000000

However, setting additional bits in the upper region requires more than just the ‘_xor’ command. Here are two examples that set the same value, one using the ‘_shl’ command and one using the ‘_shr’ command. To construct a value such as ‘0xf0000000’, beginning with the ‘ebx’ register holding the value ‘0x80000000’ and the ‘ecx’ register holding the value ‘0xffffffff’, as above, the following can be used:

_push
_shr       ;ebx = ebx >> cl
_save      ;ecx = 1
_pop       ;ebx = 0x80000000 again
_push
_shr       ;ebx = 0x40000000
_push
_shr       ;ebx = 0x20000000
_push
_shr       ;ebx = 0x10000000
_save      ;ecx = 0x10000000
_pop
_xor       ;ebx = 0x30000000
_pop
_xor       ;ebx = 0x70000000
_pop
_xor       ;ebx = 0xf0000000

Whereas, to construct the value ‘0xf0000000’, beginning with the ‘ebx’ and ‘ecx’ registers holding the value ‘0xffffffff’, as above, the following can be used:

_push
_shr       ;ebx = ebx >> cl
_save      ;ecx = 1
_shl       ;ebx = 2
_xor       ;ebx = 3
_shl       ;ebx = 6
_xor       ;ebx = 7
_shl       ;ebx = 0x0e
_shl       ;ebx = 0x1c
_save      ;ecx = 0x1c
_pop
_shl       ;ebx = 0xf0000000

Thus, depending on the value, the ‘_shl’ method is the simplest.

Astute readers will have noticed that none of the value constructions above use the ‘_addsaved’ command. This shows that constants can be constructed without using any form of ‘add’. However, it is also possible to perform the addition of arbitrary values without using any form of ‘add’, resulting in the removal of the ‘_addsaved’ command by using this algorithm (edx and ebp holding the values to add together):

eax = edx ^ ebp
do
{
     ebp = (ebp & edx) << 1
     edx = eax
     eax = edx ^ ebp
}
while (edx & ebp)
ebx = eax

which we translate into this code:

[construct here the first value to add, not shown]
_nopdD     ;edx = ebx
[construct here the second value to add, not shown]
_nopdB     ;ebp = ebx
_save      ;ecx = ebx
_nopsD     ;ebx = edx
           ;optional, depending on the order of the constructions above
_xor       ;ebx = edx ^ ebp
_nopdA     ;eax = edx ^ ebp
_getEIP
_push      ;top of do-while loop
           ;ebx points to a hidden ‘pop ebx’ instruction as part of 
           ;_getEIP so there is no explicit ‘pop’ instruction inside
           ;the loop that corresponds to this ‘push’ instruction
_saveJmpOff  ;esi = ebx
_nopsD  ;ebx = edx
_save        ;ecx = edx
_nopsB       ;ebx = ebp
_and         ;ebx = ebp & edx
_push
_getDO       ;get data offset
_getdata     ;ebx = 0xffffffff
_save        ;ecx = 0xffffffff
_shr         ;ebx = 1
_save        ;ecx = 1
_pop
_shl         ;ebx = (edx & ebp) << 1
_nopdB       ;ebp = (edx & ebp) << 1
_save        ;ecx = ebx
_nopsA       ;ebx = eax
_nopdD       ;edx = eax
_push
_xor         ;ebx = edx ^ ebp
_nopdA       ;eax = edx ^ ebp
_pop
_and         ;ebx = edx & ebp
_JnzUp       ;loop while ((edx & ebp) != 0)
_pop         ;discard loop address
_nopsA       ;ebx = eax

30 (29) commands remain.

The replacement code for the ‘_addsaved’ command requires the use of the base constant from the data section (and here, the value ‘1’ would result in shorter code).

The value ‘1’ can be constructed dynamically instead, in the following way:

_getEIP
_getdata   ;ebx=0xxxxxxx5b
_save      ;ecx=0xxxxxxx5b
_shl
_shr       ;ebx=0x1b
_save      ;ecx=0x1b
_addsaved  ;ebx=0x36
_addsaved  ;ebx=0x51
_addsaved  ;ebx=0x6c
_addsaved  ;ebx=0x87
_save      ;ecx=0x87
_shr       ;ebx=1

However, that algorithm prevents the removal of the ‘_addsaved’ command. The two concepts seem to be mutually exclusive.

It is unclear whether the ‘_nopREAL’ command could be removed, since there is no other single-byte command that might take its place in the event that a true ‘no-operation’ command were required. Its current purposes are to pad the unused slots following codon deletion and to fill the unused slot(s) that follow the ‘_JnzDown’ command (since the ‘_JnzDown’ command skips three slots). Note that the current implementation of the ‘_JnzDown’ command contains a bug, which is that the destination of the branch is not the start of a slot. Instead, the command branches to two bytes past the start of the slot. The result is that the ‘_nopREAL’ command must be used to fill that destination slot, otherwise a crash could occur because the branch might land in the middle of a command. However, the ‘_JnzDown’ command can be removed by using alternative code, and any non-stack and non-memory instruction can be used for tail padding. Thus it appears that, given its current uses, the ‘_nopREAL’ command can be removed.

29 (28) commands remain.

In the release version a ‘_null’ command exists, which emits a single zero into the stream, followed by the ‘nop’ padding. Its existence is the result of a bug. The execution of such an instruction is likely to cause an exception. It is possible on Windows XP and later to register a vectored exception handle using the existing language, and that could intercept the exception, but this is quite outside the ‘style’ of the language. The command can be removed without any problem.

28 commands remain.

The ‘_JnzDown’ command could be removed by using a careful implementation of ‘_JnzUp’ (given that the meaning is reversed), but perhaps not without the loss of some functionality. It requires knowledge of the location of a forward branch destination. This interferes with command reordering if the buffer size is fixed, because there might not be enough slots available to construct the required ‘add’ value (unless the maximum number of slots was reserved each time in order to construct any possible number). It does, however, extend the functionality in a different way, since the ‘_JnzDown’ command can skip only three commands at a time, requiring its use multiple times in order to execute larger conditional blocks. The ‘_JnzDown’ command also places severe restrictions on what can appear in those conditional blocks, since an arithmetic operation might clear the Z flag, causing the branch to be taken instead of skipped. In contrast, the use of the ‘_JnzUp’ command can skip an arbitrary number of commands without restriction. The difference can be demonstrated easily. We begin with some code that calls the GetTickCount() API to fetch a ‘random’ number (for ease of demonstration, the offset of the GetTickCount() API is set arbitrarily to the value ‘0x0c’), using the ‘_JnzDown’ command:

;construct pointer to GetTickCount()
;construct the value “0x0c”
_getDO     ;get data offset
_getdata   ;ebx = 0xffffffff
_save      ;ecx = 0xffffffff
_shr       ;ebx = 1
_save     ;ecx = 1
_shl      ;ebx = 2
_xor      ;ebx = 3
_shl      ;ebx = 6
_shl      ;ebx = 0x0c
_nopdB    ;ebp = 0x0c
_save     ;ecx = 0x0c
;add to data offset
_getDO    ;get data offset
_nopdD    ;edx = data offset
_xor      ;ebx = edx ^ ebp
_nopdA    ;eax = edx ^ ebp
_getEIP
_push    ;top of do-while loop
         ;ebx points to a hidden ‘pop ebx’ instruction as part 
         ;of _getEIP so there is no explicit ‘pop’ instruction 
         ;inside the loop that corresponds to this ‘push’ instruction
_saveJmpOff      ;esi = ebx
_nopsD     ;ebx = edx
_save     ;ecx = edx
_nopsB    ;ebx = ebp
_and      ;ebx = ebp & edx
_push
_getDO   ;get data offset
_getdata ;ebx = 0xffffffff
_save    ;ecx = 0xffffffff
_shr     ;ebx = 1
_save    ;ecx = 1
_pop
_shl     ;ebx = (edx & ebp) << 1
_nopdB   ;ebp = (edx & ebp) << 1
_save    ;ecx = ebx
_nopsA   ;ebx = eax
_nopdD   ;edx = eax
_push
_xor     ;ebx = edx ^ ebp
_nopdA   ;eax = edx ^ ebp
_pop
_and     ;ebx = edx & ebp
_JnzUp   ;loop while ((edx & ebp) != 0)
_pop     ;discard loop address
_nopsA   ;ebx = eax
;call GetTickCount()
_call

Then the choice is made, and the branch might be taken (seven in eight chances to take it):

;construct the value ‘7’
_getDO     ;get data offset
_getdata   ;ebx = 0xffffffff
_save      ;ecx = 0xffffffff
_shr       ;ebx = 1
_save      ;ecx = 1
_shl       ;ebx = 2
_xor       ;ebx = 3
_shl       ;ebx = 6
_xor       ;ebx = 7
_save      ;ecx = 7
;’and’ with result from GetTickCount()
_nopsA
_and       ;ebx = ebx & 7
_JnzDown
[conditional command 1]
[conditional command 2]
[conditional command 3]
_nopREAL   ;work around ‘_JnzDown’ bug

The replacement code might look something like this, beginning immediately after the call to the GetTickCount() API:

;save result from GetTickCount()
  _nopsA
  _push
  ;construct pointer to l2
  _getDO   ;get data offset
  _getdata ;ebx = 0xffffffff
  _save    ;ecx = 0xffffffff
  _shr     ;ebx = 1
  _save    ;ecx = 1
  ... [‘_shl’ and ‘_xor’ as needed to produce the value ((lines(l1...l2) * 8) + 3)]
  _nopdB   ;ebp = offset of l2
  _save    ;ecx = offset of l2
  _getEIP
l1:  _nopdD  ;edx = eip
  _xor     ;ebx = edx ^ ebp
  _nopdA   ;eax = edx ^ ebp
  _getEIP
  _push    ;top of do-while loop ebx points to a hidden ‘pop ebx’                          ;instruction as part of _getEIP                         ;so there is no explicit ‘pop’                           ;instruction inside the loop                   ;that corresponds to this ‘push’                            ;instruction
  _saveJmpOff    ;esi = ebx
  _nopsD   ;ebx = edx
  _save    ;ecx = edx
  _nopsB   ;ebx = ebp
  _and     ;ebx = ebp & edx
  _push
  _getDO   ;get data offset
  _getdata ;ebx = 0xffffffff
  _save    ;ecx = 0xffffffff
  _shr     ;ebx = 1
  _save    ;ecx = 1
  _pop
  _shl     ;ebx = (edx & ebp) << 1
  _nopdB   ;ebp = (edx & ebp) << 1
  _save    ;ecx = ebx
  _nopsA   ;ebx = eax
  _nopdD   ;edx = eax
  _push
  _xor     ;ebx = edx ^ ebp
  _nopdA   ;eax = edx ^ ebp
  _pop
  _and     ;ebx = edx & ebp
  _JnzUp   ;loop while ((edx & ebp) != 0)
  _pop     ;discard loop address
  _nopsA   ;ebx = eax
  _saveJmpOff
  ;restore result from GetTickCount()
  _pop
  _nopdA   ;eax = GetTickCount()
  ;construct the value ‘7’
  _getDO   ;get data offset
  _getdata ;ebx = 0xffffffff
  _save    ;ecx = 0xffffffff
  _shr     ;ebx = 1
  _save    ;ecx = 1
  _shl     ;ebx = 2
  _xor     ;ebx = 3
  _shl     ;ebx = 6
  _xor     ;ebx = 7
  _save    ;ecx = 7
  ;’and’ with result from GetTickCount()
  _nopsA   ;ebx = GetTickCount()
  _and     ;ebx = ebx & 7
  _JnzUp
  [conditional command 1]
  [conditional command 2]
  [conditional command 3]
  ...
  [conditional command n]
l2:      ;branch destination is here

27 commands remain. I

n the same way as for the ‘_JnzDown’ command, the ‘_JzDown’ command can be removed.

26 commands remain.

Normally, the ‘ecx’, ‘esi’ and ‘edi’ registers are write-only (technically, the ‘ecx’ register only becomes write-only after the ‘_addsaved’ command is removed), leaving the ‘eax’, ‘ebx’, ‘edx’ and ‘ebp’ registers as general-purpose registers. However, there is a way to read these registers again after they have been written. The ‘_pushall’ command pushes the registers onto the stack in this order: eax, ecx, edx, ebx, esp, ebp, esi, edi. The registers can then be popped individually from the stack, by using the ‘_pop’ command, in the following way:

_pushall   ;save all registers
_pop   ;edi
_pop   ;esi
_pop   ;ebp
_pop   ;esp (useful for reading stack parameters, using the 
       ;‘_getdata’ command, see below)
_pop   ;ebx
_pop   ;edx
_pop   ;ecx
_pop   ;eax

A smaller set of ‘_pop’ commands can be used to access particular registers, leaving the others for removal later, if necessary. The popped registers can also be modified and pushed back onto the stack, allowing the ‘_popall’ command to be used to pop all of them. This allows multiple values to be assigned simultaneously. By combining several of these tricks, it becomes possible to remove the ‘_mul’ command (edx:eax = eax * ebx). A working solution can be downloaded from http://pferrie.tripod.com/misc/flibi_mul.zip.

25 commands remain.

Interestingly, by reordering the register initialization code for the first addition block to remove one instruction, the code actually increases in size because the branch to l4 requires more instructions to construct it as a result. This brings us to a special-case problem of dynamic pointer construction. There is a particular problem when the code at l2 branches to l4 and the code at l3 branches to l1, but where l1 < l2 and l4 > l3, as shown here:

l1:  [code]
l2:  jz    l4
l3:  jnz   l1
l4:  [code]

First, construct the branch from l2 to l4:

l1:  [code]
     ;construct relative to l2 (two instructions)
     mov   reg, 1
     shl   reg, 1
l2:  jz    l2+reg
l3:  jnz   l1
l4:
[code]

Then construct the branch from l3 to l1:

l1:  [code]
     mov   reg, 1
     shl   reg, 1
l2:  jz    l2+reg
     ;construct relative to l3 (four lines)
     ;[code] at l1 is a single instruction to keep         ;the example simple
l3:  mov   reg, 1
     shl   reg, 1
     shl   reg, 1
     jnz   l3-reg
l4:  [code]

Now the branch at l2 is affected, and no longer points to l4, so reconstruct it:

l1:  [code]
     ;construct relative to l2 (five instructions)
     mov   reg, 1
     shl   reg, 1
     shl   reg, 1
     add   reg, 1
l2:  jz    l2+reg
l3:  mov   reg, 1
     shl   reg, 1
     shl   reg, 1
     jnz   l3-reg
l4:  [code]

But now the branch at l3 is affected, and no longer points to l1, so reconstruct it:

l1:  [code]
     mov   reg, 1
     shl   reg, 1
     shl   reg, 1
     add   reg, 1
l2:  jz    l2+reg
     ;construct relative to l3 (six instructions)
l3:  mov   reg, 1
     shl   reg, 1
     add   reg, 1
     shl   reg, 1
     jnz   l3-reg
l4:  [code]

Again, the branch at l2 is affected and no longer points to l4, so reconstruct it:

l1:  [code]
     ;construct relative to l3 (six instructions)
     mov   reg, 1
     shl   reg, 1
     add   reg, 1
     shl   reg, 1
l2:  jz    l2+reg
l3:  mov   reg, 1
     shl   reg, 1
     add   reg, 1
     shl   reg, 1
     jnz   l3-reg
l4:  [code]

Finally, the instructions are reordered but not inserted, and the combination works. The limitation is that the lines in the construction must converge on a multiple of each other. Such a value might not exist without the explicit insertion of ‘alignment’ lines. The ‘_nop’ command could be used for this purpose, but any ‘harmless’ instruction can be used, such as moving to/from the same register from/to which a value was just moved (more specifically, if the previous move instruction was from ebx to eax, then it is harmless to move from eax back into ebx). By combining several of these tricks, it becomes possible to remove the ‘_div’ command (eax, edx = edx:eax / ebx) as well. A working solution can be downloaded from http://pferrie.tripod.com/misc/flibi_div.zip.

24 commands remain.

The ‘_writeDWord’ command can be removed by using this algorithm:

mov  [edi], bl
inc  edi
shr  ebx, 8
mov  [edi], bl
inc  edi
shr  ebx, 8
mov  [edi], bl
inc  edi
shr  ebx, 8
mov  [edi], bl

which we translate into this code:

;construct the value ‘8’
_push
_getDO     ;get data offset
_getdata   ;ebx = 0xffffffff
_save      ;ecx = 0xffffffff
_shr       ;ebx = 1
_save      ;ecx = 1
_shl       ;ebx = 2
_shl       ;ebx = 4
_shl       ;ebx = 8
;save in ecx for later
_save      ;ecx = 8
_pop
;write byte 0
_writeByte
;increment edi
_pushall
_getDO     ;get data offset
_getdata   ;ebx = 0xffffffff
_save      ;ecx = 0xffffffff
_shr       ;ebx = 1
_nopdB     ;ebp = 1
_save      ;ecx = 1
_pop       ;ebx = edi
_nopdD     ;edx = edi
_xor       ;ebx = edx ^ ebp
_nopdA     ;eax = edx ^ ebp
_getEIP
_push     ;top of do-while loop
          ;ebx points to a hidden ‘pop ebx’ instruction as part 
          ;of _getEIP so there is no explicit ‘pop’ instruction 
          ;inside the loop that corresponds to this ’push’ instruction
_saveJmpOff  ;esi = ebx
_nopsD       ;ebx = edx
_save        ;ecx = edx
_nopsB       ;ebx = ebp
_and         ;ebx = ebp & edx
_push
_getDO       ;get data offset
_getdata     ;ebx = 0xffffffff
_save        ;ecx = 0xffffffff
_shr         ;ebx = 1
_save        ;ecx = 1
_pop
_shl         ;ebx = (edx & ebp) << 1
_nopdB       ;ebp = (edx & ebp) << 1
_save        ;ecx = ebx
_nopsA       ;ebx = eax
_nopdD       ;edx = eax
_push
_xor         ;ebx = edx ^ ebp
_nopdA       ;eax = edx ^ ebp
_pop
_and         ;ebx = edx & ebp
_JnzUp       ;loop while ((edx & ebp) != 0)
_pop         ;discard loop address
_nopsA       ;ebx = eax
;update edi
_push
_popall      ;edi = eax and rebalance stack
;shift ebx right by 8
_shr         ;ebx = ebx >> 8
;write byte 1
_writeByte
[repeat twice more, beginning with ‘increment edi’ from above, to write the remaining bytes]

Of course, if there were a command to write a new value for the stack pointer, then the stack could be moved to the destination address, and a ‘_push’ command could be used to write the value. However, there would need to be a corresponding command to read the previous value for the stack pointer in order to restore it afterwards. This is quite outside the ‘style’ of the language.

23 commands remain.

Another instruction that can be removed is the ‘_call’ command. A subroutine call is equivalent to pushing the return address onto the stack, and then jumping to the location of the subroutine. It can be replaced by the ‘_JnzUp’ command in the following way (again, calling the GetTickCount() API, as above):

;construct pointer to l2
  _getDO   ;get data offset
  _getdata ;ebx = 0xffffffff
  _save    ;ecx = 0xffffffff
  _shr     ;ebx = 1
  _save    ;ecx = 1
  ... [‘_shl’ and ‘_xor’ as needed to produce the value ((lines(l1...l2) * 8) + 3)]
  _nopdB   ;ebp = offset of l2
  _save    ;ecx = offset of l2
  _getEIP
l1:  _nopdD  ;edx = eip
  _xor       ;ebx = edx ^ ebp
  _nopdA     ;eax = edx ^ ebp
  _getEIP
  _push   ;top of do-while loop
          ;ebx points to a hidden ‘pop ebx’ instruction as part 
          ;of _getEIP so there is no explicit ’pop’ instruction inside 
          ;the loop that corresponds to this ‘push’ instruction
  _saveJmpOff    ;esi = ebx
  _nopsD   ;ebx = edx
  _save    ;ecx = edx
  _nopsB   ;ebx = ebp
  _and     ;ebx = ebp & edx
  _push
  _getDO   ;get data offset
  _getdata ;ebx = 0xffffffff
  _save    ;ecx = 0xffffffff
  _shr     ;ebx = 1
  _save    ;ecx = 1
  _pop
  _shl     ;ebx = (edx & ebp) << 1
  _nopdB   ;ebp = (edx & ebp) << 1
  _save    ;ecx = ebx
  _nopsA   ;ebx = eax
  _nopdD   ;edx = eax
  _push
  _xor     ;ebx = edx ^ ebp
  _nopdA   ;eax = edx ^ ebp
  _pop
  _and     ;ebx = edx & ebp
  _JnzUp   ;loop while ((edx & ebp) != 0)
  _pop     ;discard loop address
  _nopsA   ;ebx = eax
  ;save return address on stack
  _push
  ;construct pointer to GetTickCount()
  _getDO   ;get data offset
  _getdata ;ebx = 0xffffffff
  _save    ;ecx = 0xffffffff
  _shr     ;ebx = 1
  _save    ;ecx = 1
  _shl     ;ebx = 2
  _xor     ;ebx = 3
  _shl     ;ebx = 6
  _shl     ;ebx = 0x0c
  _nopdB   ;ebp = 0x0c
  _save    ;ecx = 0x0c
  _getDO   ;get data offset
  _nopdD   ;edx = data offset
  _xor     ;ebx = edx ^ ebp
  _nopdA   ;eax = edx ^ ebp
  _getEIP
  _push   ;top of do-while loop
          ;ebx points to a hidden ‘pop ebx' instruction as part 
          ;of _getEIP so there is no explicit ‘pop’ instruction inside 
          ;the loop that corresponds to this ‘push’ instruction
  _saveJmpOff    ;esi = ebx
  _nopsD   ;ebx = edx
  _save    ;ecx = edx
  _nopsB   ;ebx = ebp
  _and     ;ebx = ebp & edx
  _push
  _getDO   ;get data offset
  _getdata ;ebx = 0xffffffff
  _save    ;ecx = 0xffffffff
  _shr     ;ebx = 1
  _save    ;ecx = 1
  _pop
  _shl     ;ebx = (edx & ebp) << 1
  _nopdB   ;ebp = (edx & ebp) << 1
  _save    ;ecx = ebx
  _nopsA   ;ebx = eax
  _nopdD   ;edx = eax
  _push
  _xor     ;ebx = edx ^ ebp
  _nopdA   ;eax = edx ^ ebp
  _pop
  _and     ;ebx = edx & ebp
  _JnzUp   ;loop while ((edx & ebp) != 0)
  _pop     ;discard loop address
  _nopsA   ;ebx = eax
  _getdata ;ebx = offset of GetTickCount()
  _saveJmpOff    ;esi = offset of GetTickCount()
  ;clear Z flag
  _save    ;ecx = ebx
  _and     ;ebx = ebx & ebx (known non-zero from above)
  ;jump to GetTickCount()
  _JnzUp
l2:  ;execution resumes here

Local subroutines can be called in the same way; however there is no ‘return’ command. The equivalent for a ‘return’ command is the following:

;retrieve return address from stack
_pop
_saveJmpOff      ;esi = return address
;clear Z flag, if required
_save      ;ecx = ebx
_and       ;ebx = ebx & ebx (known non-zero from above)
;return to caller
_JnzUp

22 commands remain.

The following are two useful tricks just for the sake of interest. The first one demonstrates how to read parameters directly from the stack:

[push parameters here, not shown]
_pushall
_pop       ;edi
[_nopdA    ;eax = edi, if needed]
_pop       ;esi
[_nopdD    ;edx = esi, if needed]
_pop       ;ebp (discard)
_pop       ;esp
_push      ;esp
_push      ;ebp
[_nopsD    ;ebx = original esi, if needed]
_push
[_nopsA     ;ebx = original edi, if needed]
_push
_popall     ;ebp = esp
[add to ebp as needed to reach required variable]
_nopsB      ;ebx = ebp
_getdata    ;read from stack

Then, simply by replacing the ‘_getdata’ command with the ‘_call’ command, function pointers on the stack can be called.

The ‘_push’ command can be removed, but the replacement code is ugly. It would look like this:

_nopdA     ;place into eax in order to appear at the top of the stack
_pushall
_pop       ;discard edi
_pop       ;discard esi
_pop       ;discard ebp
_pop       ;discard esp
_pop       ;discard ebx
_pop       ;discard edx
_pop       ;discard ecx
;eax remains as the only register on the stack

21 commands remain.

The ‘_popall’ command can be removed. The ‘_popall’ command pops the registers from the stack in the following order: edi, esi, ebp, esp, ebx, edx, ecx, eax. The command can be replaced by popping and assigning the registers individually, in the following way:

_pop       ;edi
_saveWrtOff
_pop       ;esi
_saveJmpOff
_pop       ;ebp
_nopdB
_pop       ;esp (discard)
_pop       ;ebx
_pop       ;edx
_nopdD
_pop       ;ecx
_save
_pop       ;eax
_nopdA

20 commands remain.

The ‘_nopdB’, ‘_saveWrtOff’ and ‘_saveJmpOff’ commands can be removed if the ‘_push’ and ‘_popall’ commands are retained. Replacement of the ‘_saveWrtOff’ command would look like this:

_pushall
_pop       ;discard existing edi
[construct value to place into edi, not shown]
_push
_popall

Replacement of the ‘_saveJmpOff’ command would look like this:

_pushall
_pop       ;edi
[_nopdD    ;preserve edi if needed]
_pop       ;discard existing esi
[construct value to place into esi, not shown]
_push
[_nopsD     ;restore edi if needed]
_push
_popall

Replacement of the ‘_nopdB’ command would look like this:

_pushall
_pop       ;edi
[_nopdD    ;preserve edi if needed]
_pop       ;esi
[_nopdA    ;preserve esi if needed]
_pop       ;discard existing ebp
[construct value to place into ebp, not shown]
_push
[_nopsA    ;restore esi if needed]
_push
[_nopsD    ;restore edi if needed]
_push
_popall

19 commands remain.

Two other commands can be removed, but they cannot be replaced using existing instructions. Instead, the replacement code requires the introduction of another instruction. The two commands are ‘_shl’ and ‘_shr’. The replacement instruction is ‘_rot’ (‘rotate’). The direction of the rotate is not important, as long as it is known, since all values can be constructed by using it in conjunction with the ‘_and’ instruction. However, it requires the use of the value ‘1’ as the ‘base constant’. The value ‘1’ would be used to construct the values ‘0x7fffffff’ (if rotating shifts to the right) or ‘0xfffffffe’ (if rotating shifts to the left). This is the mask value that is used by the ‘_and’ command to zero the appropriate bit in order to simulate a shift. This is the simplest implementation that would rotate a value only once per use without reference to the value in the ‘ecx’ register. Multi-bit rotates could be supported, too, but then the ‘and’ mask would no longer be a constant. Instead, it would be specific to the number of bits that are being rotated. So, shifting the value in the ‘eax’ register left by ‘3’ times, using the single-bit rotate command, would look like this:

;construct the value ‘0xfffffffe’
_getDO     ;get data offset
_getdata   ;ebx = 1
_save      ;ecx = 1
_rot       ;ebx = 2
_xor       ;ebx = 3
_rot       ;ebx = 6
_xor       ;ebx = 7
_rot       ;ebx = 0x0e
_xor       ;ebx = 0x0f
[repeat seven more times, but omit the final xor]
_save      ;ecx = 0xfffffffe
;rotate left and zero the overflow bits 
_nopsA     ;ebx = eax
_rot       ;ebx = rol(ebx, 1)
_and       ;ebx = ebx & 0xfffffffe
_rot       ;ebx = rol(ebx, 1)
_and       ;ebx = ebx & 0xfffffffe
_rot       ;ebx = rol(ebx, 1)
_and       ;ebx = ebx & 0xfffffffe
_nopdA     ;eax = shl(eax, 3)

18 commands remain:

_nopsA, _nopsB, _nopsD, _nopdA, _nopdD
_writeByte
_save
_getDO, _getdata, _getEIP
_push _pop, _pushall, _popall
_rot, _and, _xor
_JnzUp

Many years from now, our distant descendants might stumble upon a codon stream that describes only 18 amino acids – and we might be looking at its origin. Imagine that.

Bibliography

[1] A codon is a trinucleotide sequence of DNA or RNA (the nucleic acids that contain the genetic instructions used in the development and functioning of living organisms) that corresponds to a specific amino acid. See http://www.genome.gov/Glossary/index.cfm?id=36.

[2] Transfer RNA, or tRNA, is a small RNA molecule that is involved in protein synthesis. See http://www.wiley.com/college/ boyer/0470003790/structure/tRNA/trna_intro.htm.

[3] Amino acids are the building blocks of proteins. See http://en.wikipedia.org/w/index.php?title=Amino_acid&oldid=412676887.

Latest articles:

Nexus Android banking botnet – compromising C&C panels and dissecting mobile AppInjects

Aditya Sood & Rohit Bansal provide details of a security vulnerability in the Nexus Android botnet C&C panel that was exploited to compromise the C&C panel in order to gather threat intelligence, and present a model of mobile AppInjects.

Cryptojacking on the fly: TeamTNT using NVIDIA drivers to mine cryptocurrency

TeamTNT is known for attacking insecure and vulnerable Kubernetes deployments in order to infiltrate organizations’ dedicated environments and transform them into attack launchpads. In this article Aditya Sood presents a new module introduced by…

Collector-stealer: a Russian origin credential and information extractor

Collector-stealer, a piece of malware of Russian origin, is heavily used on the Internet to exfiltrate sensitive data from end-user systems and store it in its C&C panels. In this article, researchers Aditya K Sood and Rohit Chaturvedi present a 360…

Fighting Fire with Fire

In 1989, Joe Wells encountered his first virus: Jerusalem. He disassembled the virus, and from that moment onward, was intrigued by the properties of these small pieces of self-replicating code. Joe Wells was an expert on computer viruses, was partly…

Run your malicious VBA macros anywhere!

Kurt Natvig wanted to understand whether it’s possible to recompile VBA macros to another language, which could then easily be ‘run’ on any gateway, thus revealing a sample’s true nature in a safe manner. In this article he explains how he recompiled…

Bulletin Archive