Flibi: Evolution

Peter Ferrie
Virus Bulletin, May 2011, page 6-15
ISSN 0956-9979
May 2011

Download PDF (92.83Kb) (You need to be registered on forum)
[Back to index] [Comments (0)]

The Flibi virus demonstrated that a virus can carry its own ‘genetic code’ (see VB, March 2011, p.4), and if the codons¹ (the p-code form of the virus), the tRNA² (the translator function), or the corresponding amino acids³ (the native code) are mutated in some way, then interesting behaviours can arise.

Each codon is used as a relative offset into a table of amino acids. There is a single pointer to the table. Mutation of a codon might cause a new amino acid to be produced, since it might now point to a different entry in the table. Mutation of the pointer would almost certainly be fatal since many codons would not be translated into the correct amino acids. Mutation of the amino acid itself might produce new behaviour, depending on the change. For example, a shift could become a rotate.

The virus has the ability to move a sequence of codons to a later position in the stream⁴, and then fill the gap with no-operation instructions. In most cases, this simply results in the replacement of the codons at the destination⁵. Of course, if the selected sequence appears at the end of the defined stream (there is a lot of slack space after the last meaningful codon), then the size of the defined stream will increase slightly each time that condition occurs. However, the size of the buffer remains fixed. Therefore, new sequences can only appear when the translator code is modified to increase the number of codons that are translated, thus ‘translating’ garbage beyond the original end of the stream. That garbage could potentially be modified over time to eventually produce meaningful functionality. Its location in the virus body would change over time as a result of the codon deletion, allowing the new amino acids to ‘migrate’ to a final position where they become truly useful. The human eye did not spring fully formed from the dust of the earth but was the result of gradual refinements in image accuracy. Something similar can occur here, where the sequence of amino acids does not need to work completely (or even at all) in order to be useful (or just retained). As unlikely as these things are, millions of computer years from now, we might see some of the following transformations.

The aim of this article is to demonstrate how some instructions from the original set might be removed by replacing them with functionally equivalent code sequences using the remaining instructions. One advantage of a smaller instruction set is that it allows an increase in the number of codons that can map to a single amino acid, thus making the body more resilient to corruption. Further, a sequence of instructions has a smaller risk of lethal mutation than a single instruction, because the risk is spread over a wider area.

We begin with a brief overview of the language itself. There are 45 commands in the release version (there were only 43 in the preview version). There are three general-purpose registers (‘A’, ‘B’ and ‘D’, which correspond to the ‘eax’, ‘ebp’ and ‘edx’ CPU registers); one temporary register, upon which all operations are performed (which corresponds to the ‘ebx’ CPU register); one ‘operator’ register, which holds the value for any operation that requires a parameter (which corresponds to the ‘ecx’ CPU register); and two buffer registers, one of which holds the destination for branching instructions (which corresponds to the ‘esi’ CPU register), and the other holds the destination for write instructions (which corresponds to the ‘edi’ CPU register).

The language supports the following commands:

_nopsA, _nopsB, _nopsD, _nopdA, _nopdB, _nopdD
_saveWrtOff, _saveJmpOff
_writeByte, _writeDWord
_save, _addsaved, _subsaved
_getDO, _getdata, _getEIP
_push, _pop, _pushall, _popall
_zer0
_mul, _div, _shl, _shr, _and, _xor
_add0001, _add0004, _add0010, _add0040, _add0100, _add0400, _add1000, _add4000, _sub0001
_nopREAL
_JnzUp, _JzDown, _JnzDown
_CallAPILoadLibrary, _CallAPIMessageBox, _CallAPISleep (release version), _call
_null (release version, it has no actual name)

This set can be reduced in several ways. The most obvious candidates for removal are the three API calls (two in the preview version⁶). The APIs can be called using the ‘_call’ command if the API addresses are placed in the data section in this way:

_getDO    ;get data offset

_addnnnn  ;adjust ebx as appropriate to reach the

          ;required offset

_call     ;call the API

This leaves 42 commands remaining (41 in the preview version).

The ‘_zer0’ command can be removed by using this code:

_save     ;ecx = ebx

_xor      ;ebx = 0

41 (40) commands now remain.

The ‘_subsaved’ command (which performs the action ‘ebx = ebx – ecx’) can be removed, and the ‘_addsaved’ command (which performs the action ‘ebx = ebx + ecx’) can be used instead, with a slight change. Specifically, the new value of the ‘ecx’ register is ‘-ecx’ (such that ‘ebx = ebx + -ecx’). However, there is no negate command, so an equivalent result must be achieved using the combination of operations that perform a ‘not’ and an ‘add 1’. The problem is that a ‘not’ operation uses the value ‘0xffffffff’, which requires many steps to construct. Given the existing instruction set, it would be simplest to place the value ‘0xffffffff’ in the data section⁷. It must be placed at the start of the data section, because the ‘_addnnnn’ commands can be removed, leaving no way to select another offset. This algorithm can then be used:

xor ebx, 0xffffffff

inc ebx

which we translate into this code:

_push

_getDO    ;get data offset

_getdata  ;ebx = 0xffffffff

_save     ;ecx = 0xffffffff

_pop

_addsaved ;ebx = ebx - 1

39 (38) commands remain.

The ‘_addnnnn’ commands exist for convenience, but all of the commands apart from ‘_add0001’ can be constructed using the ‘_add0001’ command. Thus, the ‘_add0004’, ‘_ add0010’, ‘_add0040’, ‘_add0100’, ‘_add0400’, ‘_add1000’ and ‘_add4000’ commands can be removed.

32 (31) commands remain.

The ‘_add0001’ command can also be removed, because the number ‘1’ can be recovered from the value ‘0xffffffff’ by using this code:

_getDO    ;get data offset

_getdata  ;ebx = 0xffffffff

_save     ;ecx = 0xffffffff

;here is a horrible trick:

;modern CPUs limit the shift-count to 0x1f by taking

;the low five bits for the count and simply discarding

;the rest of the value internally, this performs a

;cl & 0x1f and it’s exactly what we need

_shr      ;ebx = ebx >> cl

_save     ;ecx = 1

From then on, the ‘_addsaved’ command can be used to increment the ‘ebx’ register as needed.

31 (30) commands remain.

Of course, it would require very many uses of the ‘_addsaved’ command in order to construct large values, but value construction can be accelerated by using the ‘_shl’ and ‘_xor’ commands.

For example, constructing the value ‘2’ is a matter of the following:

_shl      ;ebx = ebx << cl (ebx and ecx are ‘1’

          ;from above)

Constructing the value ‘3’, beginning with the ‘ebx’ and ‘ecx’ registers holding the value ‘1’, as above, is a matter of the following:

_shl      ;ebx = ebx << cl

_xor      ;ebx = ebx ^ ecx

Constructing the value ‘4’, beginning with the ‘ebx’ and ‘ecx’ registers holding the value ‘1’, as above, is a matter of the following:

_shl      ;ebx = ebx << cl

_shl      ;ebx = ebx << cl

And so on. Given this algorithm, we can see that the value ‘0xffffffff’ is not the only possible ‘base constant’. The value ‘1’ could be used instead, since the value ‘0xffffffff’ could be produced from it in the following way:

_getDO    ;get data offset

_getdata  ;ebx = 1

_save     ;ecx = 1

_shl      ;ebx = 2

_xor      ;ebx = 3

_shl      ;ebx = 6

_xor      ;ebx = 7

_shl      ;ebx = 0x0e

_xor      ;ebx = 0x0f

... [54 steps]

_shl      ;ebx = 0xfffffffe

_xor      ;ebx = 0xffffffff

_save     ;ecx = 0xffffffff

Clearly, it is far simpler to go from ‘0xffffffff’ to ‘1’ than the other way around. Note that values can also be constructed using the ‘reverse’ of this technique, to reduce the number of shifts required. For example, constructing the value ‘0x80000000’ is a matter of the following:

_getDO    ;get data offset

_getdata  ;ebx = 0xffffffff

_save     ;ecx = 0xffffffff

_shl      ;ebx = 0x80000000

Constructing the value ‘0x40000000’ is a matter of the following:

_push

_shr      ;ebx = ebx >> cl (ebx = 0x80000000,

          ;ecx = 0xffffffff from above)

_save     ;ecx = 1

_pop

_shr      ;ebx = 0x40000000

However, setting additional bits in the upper region requires more than just the ‘_xor’ command. Here are two examples that set the same value, one using the ‘_shl’ command and one using the ‘_shr’ command. To construct a value such as ‘0xf0000000’, beginning with the ‘ebx’ register holding the value ‘0x80000000’ and the ‘ecx’ register holding the value ‘0xffffffff’, as above, the following can be used:

_push

_shr      ;ebx = ebx >> cl

_save     ;ecx = 1

_pop      ;ebx = 0x80000000 again

_push

_shr      ;ebx = 0x40000000

_push

_shr      ;ebx = 0x20000000

_push

_shr      ;ebx = 0x10000000

_save     ;ecx = 0x10000000

_pop

_xor      ;ebx = 0x30000000

_pop

_xor      ;ebx = 0x70000000

_pop

_xor      ;ebx = 0xf0000000

Whereas, to construct the value ‘0xf0000000’, beginning with the ‘ebx’ and ‘ecx’ registers holding the value ‘0xffffffff’, as above, the following can be used:

_push

_shr      ;ebx = ebx >> cl

_save     ;ecx = 1

_shl      ;ebx = 2

_xor      ;ebx = 3

_shl      ;ebx = 6

_xor      ;ebx = 7

_shl      ;ebx = 0x0e

_shl      ;ebx = 0x1c

_save     ;ecx = 0x1c

_pop

_shl      ;ebx = 0xf0000000

Thus, depending on the value, the ‘_shl’ method is the simplest.

Astute readers will have noticed that none of the value constructions above use the ‘_addsaved’ command. This shows that constants can be constructed without using any form of ‘add’. However, it is also possible to perform the addition of arbitrary values without using any form of ‘add’, resulting in the removal of the ‘_addsaved’ command by using this algorithm (edx and ebp holding the values to add together):

eax = edx ^ ebp

do

{

    ebp = (ebp & edx) << 1

    edx = eax

    eax = edx ^ ebp

}

while (edx & ebp)

ebx = eax

which we translate into this code:

[construct here the first value to add, not shown]

_nopdD    ;edx = ebx

[construct here the second value to add, not shown]

_nopdB    ;ebp = ebx

_save     ;ecx = ebx

_nopsD    ;ebx = edx

          ;optional, depending on the order of the

          ;constructions above

_xor      ;ebx = edx ^ ebp

_nopdA    ;eax = edx ^ ebp

_getEIP

_push     ;top of do-while loop

          ;ebx points to a hidden ‘pop ebx’

          ;instruction as part of _getEIP so there

          ;is no explicit ‘pop’ instruction inside

          ;the loop that corresponds to this ‘push’

          ;instruction

_saveJmpOff ;esi = ebx

_nopsD    ;ebx = edx

_save     ;ecx = edx

_nopsB    ;ebx = ebp

_and      ;ebx = ebp & edx

_push

_getDO    ;get data offset

_getdata  ;ebx = 0xffffffff

_save     ;ecx = 0xffffffff

_shr      ;ebx = 1

_save     ;ecx = 1

_pop

_shl      ;ebx = (edx & ebp) << 1

_nopdB    ;ebp = (edx & ebp) << 1

_save     ;ecx = ebx

_nopsA    ;ebx = eax

_nopdD    ;edx = eax

_push

_xor      ;ebx = edx ^ ebp

_nopdA    ;eax = edx ^ ebp

_pop

_and      ;ebx = edx & ebp

_JnzUp    ;loop while ((edx & ebp) != 0)

_pop      ;discard loop address

_nopsA    ;ebx = eax

30 (29) commands remain.

The replacement code for the ‘_addsaved’ command requires the use of the base constant from the data section (and here, the value ‘1’ would result in shorter code).

The value ‘1’ can be constructed dynamically instead, in the following way:

_getEIP

_getdata  ;ebx=0xxxxxxx5b

_save     ;ecx=0xxxxxxx5b

_shl

_shr      ;ebx=0x1b

_save     ;ecx=0x1b

_addsaved ;ebx=0x36

_addsaved ;ebx=0x51

_addsaved ;ebx=0x6c

_addsaved ;ebx=0x87

_save     ;ecx=0x87

_shr      ;ebx=1

However, that algorithm prevents the removal of the ‘_addsaved’ command. The two concepts seem to be mutually exclusive.

It is unclear whether the ‘_nopREAL’ command could be removed, since there is no other single-byte command that might take its place in the event that a true ‘no-operation’ command were required. Its current purposes are to pad the unused slots following codon deletion and to fill the unused slot(s) that follow the ‘_JnzDown’ command (since the ‘_JnzDown’ command skips three slots). Note that the current implementation of the ‘_JnzDown’ command contains a bug, which is that the destination of the branch is not the start of a slot. Instead, the command branches to two bytes past the start of the slot. The result is that the ‘_nopREAL’ command must be used to fill that destination slot, otherwise a crash could occur because the branch might land in the middle of a command.However, the ‘_JnzDown’ command can be removed by using alternative code, and any non-stack and non-memory instruction can be used for tail padding. Thus it appears that, given its current uses, the ‘_nopREAL’ command can be removed.

29 (28) commands remain.

In the release version a ‘_null’ command exists, which emits a single zero into the stream, followed by the ‘nop’ padding. Its existence is the result of a bug. The execution of such an instruction is likely to cause an exception. It is possible on Windows XP and later to register a vectored exception handle using the existing language, and that could intercept the exception, but this is quite outside the ‘style’ of the language. The command can be removed without any problem.

28 commands remain.

The ‘_JnzDown’ command could be removed by using a careful implementation of ‘_JnzUp’ (given that the meaning is reversed), but perhaps not without the loss of some functionality. It requires knowledge of the location of a forward branch destination. This interferes with command reordering if the buffer size is fixed, because there might not be enough slots available to construct the required ‘add’ value (unless the maximum number of slots was reserved each time in order to construct any possible number). It does, however, extend the functionality in a different way, since the ‘_JnzDown’ command can skip only three commands at a time, requiring its use multiple times in order to execute larger conditional blocks. The ‘_JnzDown’ command also places severe restrictions on what can appear in those conditional blocks, since an arithmetic operation might clear the Z flag, causing the branch to be taken instead of skipped. In contrast, the use of the ‘_JnzUp’ command can skip an arbitrary number of commands without restriction. The difference can be demonstrated easily. We begin with some code that calls the GetTickCount() API to fetch a ‘random’ number (for ease of demonstration, the offset of the GetTickCount() API is set arbitrarily to the value ‘0x0c’), using the ‘_JnzDown’ command:

;construct pointer to GetTickCount()

;construct the value “0x0c”

_getDO    ;get data offset

_getdata  ;ebx = 0xffffffff

_save     ;ecx = 0xffffffff

_shr      ;ebx = 1

_save     ;ecx = 1

_shl      ;ebx = 2

_xor      ;ebx = 3

_shl      ;ebx = 6

_shl      ;ebx = 0x0c

_nopdB    ;ebp = 0x0c

_save     ;ecx = 0x0c

;add to data offset

_getDO    ;get data offset

_nopdD    ;edx = data offset

_xor      ;ebx = edx ^ ebp

_nopdA    ;eax = edx ^ ebp

_getEIP

_push     ;top of do-while loop

          ;ebx points to a hidden ‘pop ebx’

          ;instruction as part of _getEIP

          ;so there is no explicit ‘pop’

          ;instruction inside the loop

          ;that corresponds to this ‘push’

          ;instruction

_saveJmpOff ;esi = ebx

_nopsD    ;ebx = edx

_save     ;ecx = edx

_nopsB    ;ebx = ebp

_and      ;ebx = ebp & edx

_push

_getDO    ;get data offset

_getdata  ;ebx = 0xffffffff

_save     ;ecx = 0xffffffff

_shr      ;ebx = 1

_save     ;ecx = 1

_pop

_shl      ;ebx = (edx & ebp) << 1

_nopdB    ;ebp = (edx & ebp) << 1

_save     ;ecx = ebx

_nopsA    ;ebx = eax

_nopdD    ;edx = eax

_push

_xor      ;ebx = edx ^ ebp

_nopdA    ;eax = edx ^ ebp

_pop

_and      ;ebx = edx & ebp

_JnzUp    ;loop while ((edx & ebp) != 0)

_pop      ;discard loop address

_nopsA    ;ebx = eax

;call GetTickCount()

_call

Then the choice is made, and the branch might be taken (seven in eight chances to take it):

;construct the value ‘7’

_getDO    ;get data offset

_getdata  ;ebx = 0xffffffff

_save     ;ecx = 0xffffffff

_shr      ;ebx = 1

_save     ;ecx = 1

_shl      ;ebx = 2

_xor      ;ebx = 3

_shl      ;ebx = 6

_xor      ;ebx = 7

_save     ;ecx = 7

;’and’ with result from GetTickCount()

_nopsA

_and      ;ebx = ebx & 7

_JnzDown

[conditional command 1]

[conditional command 2]

[conditional command 3]

_nopREAL  ;work around ‘_JnzDown’ bug

The replacement code might look something like this, beginning immediately after the call to the GetTickCount() API:

;save result from GetTickCount()

_nopsA

_push

;construct pointer to l2

_getDO     ;get data offset

_getdata   ;ebx = 0xffffffff

_save      ;ecx = 0xffffffff

_shr       ;ebx = 1

_save      ;ecx = 1

... [‘_shl’ and ‘_xor’ as needed to produce the value ((lines(l1...l2) * 8) + 3)]

_nopdB     ;ebp = offset of l2

_save      ;ecx = offset of l2

_getEIP

l1: 

_nopdD    ;edx = eip

_xor      ;ebx = edx ^ ebp

_nopdA    ;eax = edx ^ ebp

_getEIP

_push     ;top of do-while loop

          ;ebx points to a hidden ‘pop ebx’

          ;instruction as part of _getEIP

          ;so there is no explicit ‘pop’

          ;instruction inside the loop

          ;that corresponds to this ‘push’

          ;instruction

_saveJmpOff ;esi = ebx

_nopsD    ;ebx = edx

_save     ;ecx = edx

_nopsB    ;ebx = ebp

_and      ;ebx = ebp & edx

_push

_getDO    ;get data offset

_getdata  ;ebx = 0xffffffff

_save     ;ecx = 0xffffffff

_shr      ;ebx = 1

_save     ;ecx = 1

_pop

_shl      ;ebx = (edx & ebp) << 1

_nopdB    ;ebp = (edx & ebp) << 1

_save     ;ecx = ebx

_nopsA    ;ebx = eax

_nopdD    ;edx = eax

_push

_xor      ;ebx = edx ^ ebp

_nopdA    ;eax = edx ^ ebp

_pop

_and      ;ebx = edx & ebp

_JnzUp    ;loop while ((edx & ebp) != 0)

_pop      ;discard loop address

_nopsA    ;ebx = eax

_saveJmpOff

;restore result from GetTickCount()

_pop

_nopdA    ;eax = GetTickCount()

;construct the value ‘7’

_getDO ;get data offset

_getdata  ;ebx = 0xffffffff

_save     ;ecx = 0xffffffff

_shr      ;ebx = 1

_save     ;ecx = 1

_shl      ;ebx = 2

_xor      ;ebx = 3

_shl      ;ebx = 6

_xor      ;ebx = 7

_save     ;ecx = 7

;’and’ with result from GetTickCount()

_nopsA    ;ebx = GetTickCount()

_and      ;ebx = ebx & 7

_JnzUp

[conditional command 1]

[conditional command 2]

[conditional command 3]

...

[conditional command n]

l2:       ;branch destination is here

27 commands remain.

In the same way as for the ‘_JnzDown’ command, the ‘_JzDown’ command can be removed.

26 commands remain.

Normally, the ‘ecx’, ‘esi’ and ‘edi’ registers are write-only (technically, the ‘ecx’ register only becomes write-only after the ‘_addsaved’ command is removed), leaving the ‘eax’, ‘ebx’, ‘edx’ and ‘ebp’ registers as general-purpose registers. However, there is a way to read these registers again after they have been written. The ‘_pushall’ command pushes the registers onto the stack in this order: eax, ecx, edx, ebx, esp, ebp, esi, edi. The registers can then be popped individually from the stack, by using the ‘_pop’ command, in the following way:

_pushall ;save all registers

_pop      ;edi

_pop      ;esi

_pop      ;ebp

_pop      ;esp (useful for reading stack parameters,

          ;using the ‘_getdata’ command, see below)

_pop      ;ebx

_pop      ;edx

_pop      ;ecx

_pop      ;eax

A smaller set of ‘_pop’ commands can be used to access particular registers, leaving the others for removal later, if necessary. The popped registers can also be modified and pushed back onto the stack, allowing the ‘_popall’ command to be used to pop all of them. This allows multiple values to be assigned simultaneously. By combining several of these tricks, it becomes possible to remove the ‘_mul’ command (edx:eax = eax * ebx). A working solution can be downloaded from http://pferrie.tripod.com/misc/flibi_mul.zip.

25 commands remain.

Interestingly, by reordering the register initialization code for the first addition block to remove one instruction, the code actually increases in size because the branch to l4 requires more instructions to construct it as a result. This brings us to a special-case problem of dynamic pointer construction. There is a particular problem when the code at l2 branches to l4 and the code at l3 branches to l1, but where l1 < l2 and l4 > l3, as shown here:

l1: [code]

l2: jz l4

l3: jnz l1

l4: [code]

First, construct the branch from l2 to l4:

l1: [code]

    ;construct relative to l2 (two instructions)

    mov reg, 1

    shl reg, 1

l2: jz l2+reg

l3: jnz l1

l4:

[code]

Then construct the branch from l3 to l1:

l1:  [code]

     mov reg, 1

     shl reg, 1

l2:  jz l2+reg

     ;construct relative to l3 (four lines)

     ;[code] at l1 is a single instruction to keep

     ;the example simple

l3:  mov reg, 1

     shl reg, 1

     shl reg, 1

     jnz l3-reg

l4:  [code]

Now the branch at l2 is affected, and no longer points to l4, so reconstruct it:

l1:  [code]

     ;construct relative to l2 (five instructions)

     mov reg, 1

     shl reg, 1

     shl reg, 1

     add reg, 1

l2:  jz l2+reg

l3:  mov reg, 1

     shl reg, 1

     shl reg, 1

     jnz l3-reg

l4:  [code]

But now the branch at l3 is affected, and no longer points to l1, so reconstruct it:

l1:  [code]

     mov reg, 1

     shl reg, 1

     shl reg, 1

     add reg, 1

l2:  jz l2+reg

     ;construct relative to l3 (six instructions)

l3:  mov reg, 1

     shl reg, 1

     add reg, 1

     shl reg, 1

     jnz l3-reg

l4:  [code]

Again, the branch at l2 is affected and no longer points to l4, so reconstruct it:

l1:  [code]

     ;construct relative to l3 (six instructions)

     mov reg, 1

     shl reg, 1

     add reg, 1

     add reg, 1

l2:  jz l2+reg

l3:  mov reg, 1

     shl reg, 1

     add reg, 1

     shl reg, 1

     jnz l3-reg

l4:  [code]

Finally, the instructions are reordered but not inserted, and the combination works. The limitation is that the lines in the construction must converge on a multiple of each other. Such a value might not exist without the explicit insertion of ‘alignment’ lines. The ‘_nop’ command could be used for this purpose, but any ‘harmless’ instruction can be used, such as moving to/from the same register from/to which a value was just moved (more specifically, if the previous move instruction was from ebx to eax, then it is harmless to move from eax back into ebx). By combining several of these tricks, it becomes possible to remove the ‘_div’ command (eax, edx = edx:eax / ebx) as well. A working solution can be downloaded from http://pferrie.tripod.com/misc/flibi_div.zip.

24 commands remain.

The ‘_writeDWord’ command can be removed by using this algorithm:

mov [edi], bl

inc edi

shr ebx, 8

mov [edi], bl

inc edi

shr ebx, 8

mov [edi], bl

inc edi

shr ebx, 8

mov [edi], bl

which we translate into this code:

;construct the value ‘8’

_push

_getDO    ;get data offset

_getdata  ;ebx = 0xffffffff

_save     ;ecx = 0xffffffff

_shr      ;ebx = 1

_save     ;ecx = 1

_shl      ;ebx = 2

_shl      ;ebx = 4

_shl      ;ebx = 8

;save in ecx for later

_save     ;ecx = 8

_pop

;write byte 0

_writeByte

;increment edi

_pushall

_getDO    ;get data offset

_getdata  ;ebx = 0xffffffff

_save     ;ecx = 0xffffffff

_shr      ;ebx = 1

_nopdB    ;ebp = 1

_save     ;ecx = 1

_pop      ;ebx = edi

_nopdD    ;edx = edi

_xor      ;ebx = edx ^ ebp

_nopdA    ;eax = edx ^ ebp

_getEIP

_push     ;top of do-while loop

          ;ebx points to a hidden ‘pop ebx’ instruction

          ;as part of _getEIP so there is no explicit

          ;‘pop’ instruction inside the loop that

          ;corresponds to this ’push’ instruction

_saveJmpOff ;esi = ebx

_nopsD    ;ebx = edx

_save     ;ecx = edx

_nopsB    ;ebx = ebp

_and      ;ebx = ebp & edx

_push

_getDO    ;get data offset

_getdata  ;ebx = 0xffffffff

_save     ;ecx = 0xffffffff

_shr      ;ebx = 1

_save     ;ecx = 1

_pop

_shl      ;ebx = (edx & ebp) << 1

_nopdB    ;ebp = (edx & ebp) << 1

_save     ;ecx = ebx

_nopsA    ;ebx = eax

_nopdD    ;edx = eax

_push

_xor      ;ebx = edx ^ ebp

_nopdA    ;eax = edx ^ ebp

_pop

_and      ;ebx = edx & ebp

_JnzUp    ;loop while ((edx & ebp) != 0)

_pop      ;discard loop address

_nopsA    ;ebx = eax

;update edi

_push

_popall ;edi = eax and rebalance stack

;shift ebx right by 8

_shr ;ebx = ebx >> 8

;write byte 1

_writeByte

[repeat twice more, beginning with ‘increment edi’

from above, to write the remaining bytes]

Of course, if there were a command to write a new value for the stack pointer, then the stack could be moved to the destination address, and a ‘_push’ command could be used to write the value. However, there would need to be a corresponding command to read the previous value for the stack pointer in order to restore it afterwards. This is quite outside the ‘style’ of the language.

23 commands remain.

Another instruction that can be removed is the ‘_call’ command. A subroutine call is equivalent to pushing the return address onto the stack, and then jumping to the location of the subroutine. It can be replaced by the ‘_JnzUp’ command in the following way (again, calling the GetTickCount() API, as above):

;construct pointer to l2

_getDO    ;get data offset

_getdata  ;ebx = 0xffffffff

_save     ;ecx = 0xffffffff

_shr      ;ebx = 1

_save     ;ecx = 1

... [‘_shl’ and ‘_xor’ as needed to produce the value ((lines(l1...l2) * 8) + 3)]

_nopdB    ;ebp = offset of l2

_save     ;ecx = offset of l2

_getEIP

l1: 

_nopdD    ;edx = eip

_xor      ;ebx = edx ^ ebp

_nopdA    ;eax = edx ^ ebp

_getEIP

_push     ;top of do-while loop

          ;ebx points to a hidden ‘pop ebx’ instruction

          ;as part of _getEIP so there is no explicit

          ;’pop’ instruction inside the loop that

          ;corresponds to this ‘push’ instruction

_saveJmpOff ;esi = ebx

_nopsD    ;ebx = edx

_save     ;ecx = edx

_nopsB    ;ebx = ebp

_and      ;ebx = ebp & edx

_push

_getDO    ;get data offset

_getdata  ;ebx = 0xffffffff

_save     ;ecx = 0xffffffff

_shr      ;ebx = 1

_save     ;ecx = 1

_pop

_shl      ;ebx = (edx & ebp) << 1

_nopdB    ;ebp = (edx & ebp) << 1

_save     ;ecx = ebx

_nopsA    ;ebx = eax

_nopdD    ;edx = eax

_push

_xor      ;ebx = edx ^ ebp

_nopdA    ;eax = edx ^ ebp

_pop

_and      ;ebx = edx & ebp

_JnzUp    ;loop while ((edx & ebp) != 0)

_pop      ;discard loop address

_nopsA    ;ebx = eax

;save return address on stack

_push

;construct pointer to GetTickCount()

_getDO    ;get data offset

_getdata  ;ebx = 0xffffffff

_save     ;ecx = 0xffffffff

_shr      ;ebx = 1

_save     ;ecx = 1

_shl      ;ebx = 2

_xor      ;ebx = 3

_shl      ;ebx = 6

_shl      ;ebx = 0x0c

_nopdB    ;ebp = 0x0c

_save     ;ecx = 0x0c

_getDO    ;get data offset

_nopdD    ;edx = data offset

_xor      ;ebx = edx ^ ebp

_nopdA    ;eax = edx ^ ebp

_getEIP

_push     ;top of do-while loop

          ;ebx points to a hidden ‘pop ebx’

          ;instruction as part of _getEIP so

          ;there is no explicit ‘pop’

          ;instruction inside the loop that

          ;corresponds to this ‘push’ instruction

_saveJmpOff ;esi = ebx

_nopsD    ;ebx = edx

_save     ;ecx = edx

_nopsB    ;ebx = ebp

_and      ;ebx = ebp & edx

_push

_getDO    ;get data offset

_getdata  ;ebx = 0xffffffff

_save     ;ecx = 0xffffffff

_shr      ;ebx = 1

_save     ;ecx = 1

_pop

_shl      ;ebx = (edx & ebp) << 1

_nopdB    ;ebp = (edx & ebp) << 1

_save     ;ecx = ebx

_nopsA    ;ebx = eax

_nopdD    ;edx = eax

_push

_xor      ;ebx = edx ^ ebp

_nopdA    ;eax = edx ^ ebp

_pop

_and      ;ebx = edx & ebp

_JnzUp    ;loop while ((edx & ebp) != 0)

_pop      ;discard loop address

_nopsA    ;ebx = eax

_getdata  ;ebx = offset of GetTickCount()

_saveJmpOff ;esi = offset of GetTickCount()

;clear Z flag

_save     ;ecx = ebx

_and      ;ebx = ebx & ebx (known non-zero from

          ;above)

;jump to GetTickCount()

_JnzUp

l2: ;execution resumes here

Local subroutines can be called in the same way; however there is no ‘return’ command. The equivalent for a ‘return’ command is the following:

;retrieve return address from stack

_pop

_saveJmpOff ;esi = return address

;clear Z flag, if required

_save     ;ecx = ebx

_and      ;ebx = ebx & ebx (known non-zero from above)

;return to caller

_JnzUp

22 commands remain.

The following are two useful tricks just for the sake of interest. The first one demonstrates how to read parameters directly from the stack:

[push parameters here, not shown]

_pushall

_pop      ;edi

[_nopdA   ;eax = edi, if needed]

_pop      ;esi

[_nopdD   ;edx = esi, if needed]

_pop      ;ebp (discard)

_pop      ;esp

_push     ;esp

_push     ;ebp

[_nopsD   ;ebx = original esi, if needed]

_push

[_nopsA   ;ebx = original edi, if needed]

_push

_popall   ;ebp = esp

[add to ebp as needed to reach required variable]

_nopsB    ;ebx = ebp

_getdata  ;read from stack

Then, simply by replacing the ‘_getdata’ command with the ‘_call’ command, function pointers on the stack can be called.

The ‘_push’ command can be removed, but the replacement code is ugly. It would look like this:

_nopdA    ;place into eax in order to appear at the

          ;top of the stack

_pushall

_pop      ;discard edi

_pop      ;discard esi

_pop      ;discard ebp

_pop      ;discard esp

_pop      ;discard ebx

_pop      ;discard edx

_pop      ;discard ecx

;eax remains as the only register on the stack

21 commands remain.

The ‘_popall’ command can be removed. The ‘_popall’ command pops the registers from the stack in the following order: edi, esi, ebp, esp, ebx, edx, ecx, eax. The command can be replaced by popping and assigning the registers individually, in the following way:

_pop      ;edi

_saveWrtOff

_pop      ;esi

_saveJmpOff

_pop      ;ebp

_nopdB

_pop      ;esp (discard)

_pop      ;ebx

_pop      ;edx

_nopdD

_pop      ;ecx

_save

_pop      ;eax

_nopdA

20 commands remain.

The ‘_nopdB’, ‘_saveWrtOff’ and ‘_saveJmpOff’ commands can be removed if the ‘_push’ and ‘_popall’ commands are retained. Replacement of the ‘_saveWrtOff’ command would look like this:

_pushall

_pop      ;discard existing edi

[construct value to place into edi, not shown]

_push

_popall

Replacement of the ‘_saveJmpOff’ command would look like this:

_pushall

_pop      ;edi

[_nopdD   ;preserve edi if needed]

_pop      ;discard existing esi

[construct value to place into esi, not shown]

_push

[_nopsD   ;restore edi if needed]

_push

_popall

Replacement of the ‘_nopdB’ command would look like this:

_pushall

_pop      ;edi

[_nopdD   ;preserve edi if needed]

_pop      ;esi

[_nopdA   ;preserve esi if needed]

_pop      ;discard existing ebp

[construct value to place into ebp, not shown]

_push

[_nopsA   ;restore esi if needed]

_push

[_nopsD   ;restore edi if needed]

_push

_popall

19 commands remain.

Two other commands can be removed, but they cannot be replaced using existing instructions. Instead, the replacement code requires the introduction of another instruction. The two commands are ‘_shl’ and ‘_shr’. The replacement instruction is ‘_rot’ (‘rotate’). The direction of the rotate is not important, as long as it is known, since all values can be constructed by using it in conjunction with the ‘_and’ instruction. However, it requires the use of the value ‘1’ as the ‘base constant’. The value ‘1’ would be used to construct the values ‘0x7fffffff’ (if rotating shifts to the right) or ‘0xfffffffe’ (if rotating shifts to the left). This is the mask value that is used by the ‘_and’ command to zero the appropriate bit in order to simulate a shift. This is the simplest implementation that would rotate a value only once per use without reference to the value in the ‘ecx’ register. Multi-bit rotates could be supported, too, but then the ‘and’ mask would no longer be a constant. Instead, it would be specific to the number of bits that are being rotated. So, shifting the value in the ‘eax’ register left by ‘3’ times, using the single-bit rotate command, would look like this:

;construct the value ‘0xfffffffe’

_getDO    ;get data offset

_getdata  ;ebx = 1

_save     ;ecx = 1

_rot      ;ebx = 2

_xor      ;ebx = 3

_rot      ;ebx = 6

_xor      ;ebx = 7

_rot      ;ebx = 0x0e

_xor      ;ebx = 0x0f

[repeat seven more times, but omit the final xor]

_save     ;ecx = 0xfffffffe

          ;rotate left and zero the overflow bits

_nopsA    ;ebx = eax

_rot      ;ebx = rol(ebx, 1)

_and      ;ebx = ebx & 0xfffffffe

_rot      ;ebx = rol(ebx, 1)

_and      ;ebx = ebx & 0xfffffffe

_rot      ;ebx = rol(ebx, 1)

_and      ;ebx = ebx & 0xfffffffe

_nopdA    ;eax = shl(eax, 3)

18 commands remain:

_nopsA, _nopsB, _nopsD, _nopdA, _nopdD
_writeByte
_save
_getDO, _getdata, _getEIP
_push _pop, _pushall, _popall
_rot, _and, _xor
JnzUp

Many years from now, our distant descendants might stumble upon a codon stream that describes only 18 amino acids – and we might be looking at its origin. Imagine that.

1 A codon is a trinucleotide sequence of DNA or RNA (the nucleic acids that contain the genetic instructions used in the development and functioning of living organisms) that corresponds to a specific amino acid. See http://www.genome.gov/Glossary/index.cfm?id=36.

2 Transfer RNA, or tRNA, is a small RNA molecule that is involved in protein synthesis. See http://www.wiley.com/college/boyer/0470003790/structure/tRNA/trna_intro.htm.

3 Amino acids are the building blocks of proteins. See http://en.wikipedia.org/w/index.php?title=Amino_acid&oldid=412676887.

4 There is a bug in this code, which can result in attempting to copy more bytes than exist in the source.

5 There is an additional case where the destination is the same as the source, in which case the codons are deleted.

6 The ‘_CallAPISleep’ command was added to the release version because the API resolver code could not resolve the Sleep() API on certain platforms. The reason is described in detail in the previous article (VB, March 2011, p.4).

7 It would be even simpler to introduce an instruction which performs a ‘mov ebx, 0xffffffff’.

[Back to index] [Comments (0)]

VX Heavens

Flibi: Evolution