Peter Ferrie
Virus Bulletin, May 2011, page 6-15
ISSN 0956-9979
May 2011
The Flibi virus demonstrated that a virus can carry its own ‘genetic code’ (see VB, March 2011, p.4), and if the codons1 (the p-code form of the virus), the tRNA2 (the translator function), or the corresponding amino acids3 (the native code) are mutated in some way, then interesting behaviours can arise.
Each codon is used as a relative offset into a table of amino acids. There is a single pointer to the table. Mutation of a codon might cause a new amino acid to be produced, since it might now point to a different entry in the table. Mutation of the pointer would almost certainly be fatal since many codons would not be translated into the correct amino acids. Mutation of the amino acid itself might produce new behaviour, depending on the change. For example, a shift could become a rotate.
The virus has the ability to move a sequence of codons to a later position in the stream4, and then fill the gap with no-operation instructions. In most cases, this simply results in the replacement of the codons at the destination5. Of course, if the selected sequence appears at the end of the defined stream (there is a lot of slack space after the last meaningful codon), then the size of the defined stream will increase slightly each time that condition occurs. However, the size of the buffer remains fixed. Therefore, new sequences can only appear when the translator code is modified to increase the number of codons that are translated, thus ‘translating’ garbage beyond the original end of the stream. That garbage could potentially be modified over time to eventually produce meaningful functionality. Its location in the virus body would change over time as a result of the codon deletion, allowing the new amino acids to ‘migrate’ to a final position where they become truly useful. The human eye did not spring fully formed from the dust of the earth but was the result of gradual refinements in image accuracy. Something similar can occur here, where the sequence of amino acids does not need to work completely (or even at all) in order to be useful (or just retained). As unlikely as these things are, millions of computer years from now, we might see some of the following transformations.
The aim of this article is to demonstrate how some instructions from the original set might be removed by replacing them with functionally equivalent code sequences using the remaining instructions. One advantage of a smaller instruction set is that it allows an increase in the number of codons that can map to a single amino acid, thus making the body more resilient to corruption. Further, a sequence of instructions has a smaller risk of lethal mutation than a single instruction, because the risk is spread over a wider area.
We begin with a brief overview of the language itself. There are 45 commands in the release version (there were only 43 in the preview version). There are three general-purpose registers (‘A’, ‘B’ and ‘D’, which correspond to the ‘eax’, ‘ebp’ and ‘edx’ CPU registers); one temporary register, upon which all operations are performed (which corresponds to the ‘ebx’ CPU register); one ‘operator’ register, which holds the value for any operation that requires a parameter (which corresponds to the ‘ecx’ CPU register); and two buffer registers, one of which holds the destination for branching instructions (which corresponds to the ‘esi’ CPU register), and the other holds the destination for write instructions (which corresponds to the ‘edi’ CPU register).
The language supports the following commands:
This set can be reduced in several ways. The most obvious candidates for removal are the three API calls (two in the preview version6). The APIs can be called using the ‘_call’ command if the API addresses are placed in the data section in this way:
This leaves 42 commands remaining (41 in the preview version).
The ‘_zer0’ command can be removed by using this code:
41 (40) commands now remain.
The ‘_subsaved’ command (which performs the action ‘ebx = ebx – ecx’) can be removed, and the ‘_addsaved’ command (which performs the action ‘ebx = ebx + ecx’) can be used instead, with a slight change. Specifically, the new value of the ‘ecx’ register is ‘-ecx’ (such that ‘ebx = ebx + -ecx’). However, there is no negate command, so an equivalent result must be achieved using the combination of operations that perform a ‘not’ and an ‘add 1’. The problem is that a ‘not’ operation uses the value ‘0xffffffff’, which requires many steps to construct. Given the existing instruction set, it would be simplest to place the value ‘0xffffffff’ in the data section7. It must be placed at the start of the data section, because the ‘_addnnnn’ commands can be removed, leaving no way to select another offset. This algorithm can then be used:
which we translate into this code:
39 (38) commands remain.
The ‘_addnnnn’ commands exist for convenience, but all of the commands apart from ‘_add0001’ can be constructed using the ‘_add0001’ command. Thus, the ‘_add0004’, ‘_ add0010’, ‘_add0040’, ‘_add0100’, ‘_add0400’, ‘_add1000’ and ‘_add4000’ commands can be removed.
32 (31) commands remain.
The ‘_add0001’ command can also be removed, because the number ‘1’ can be recovered from the value ‘0xffffffff’ by using this code:
From then on, the ‘_addsaved’ command can be used to increment the ‘ebx’ register as needed.
31 (30) commands remain.
Of course, it would require very many uses of the ‘_addsaved’ command in order to construct large values, but value construction can be accelerated by using the ‘_shl’ and ‘_xor’ commands.
For example, constructing the value ‘2’ is a matter of the following:
Constructing the value ‘3’, beginning with the ‘ebx’ and ‘ecx’ registers holding the value ‘1’, as above, is a matter of the following:
Constructing the value ‘4’, beginning with the ‘ebx’ and ‘ecx’ registers holding the value ‘1’, as above, is a matter of the following:
And so on. Given this algorithm, we can see that the value ‘0xffffffff’ is not the only possible ‘base constant’. The value ‘1’ could be used instead, since the value ‘0xffffffff’ could be produced from it in the following way:
Clearly, it is far simpler to go from ‘0xffffffff’ to ‘1’ than the other way around. Note that values can also be constructed using the ‘reverse’ of this technique, to reduce the number of shifts required. For example, constructing the value ‘0x80000000’ is a matter of the following:
Constructing the value ‘0x40000000’ is a matter of the following:
However, setting additional bits in the upper region requires more than just the ‘_xor’ command. Here are two examples that set the same value, one using the ‘_shl’ command and one using the ‘_shr’ command. To construct a value such as ‘0xf0000000’, beginning with the ‘ebx’ register holding the value ‘0x80000000’ and the ‘ecx’ register holding the value ‘0xffffffff’, as above, the following can be used:
Whereas, to construct the value ‘0xf0000000’, beginning with the ‘ebx’ and ‘ecx’ registers holding the value ‘0xffffffff’, as above, the following can be used:
Thus, depending on the value, the ‘_shl’ method is the simplest.
Astute readers will have noticed that none of the value constructions above use the ‘_addsaved’ command. This shows that constants can be constructed without using any form of ‘add’. However, it is also possible to perform the addition of arbitrary values without using any form of ‘add’, resulting in the removal of the ‘_addsaved’ command by using this algorithm (edx and ebp holding the values to add together):
which we translate into this code:
30 (29) commands remain.
The replacement code for the ‘_addsaved’ command requires the use of the base constant from the data section (and here, the value ‘1’ would result in shorter code).
The value ‘1’ can be constructed dynamically instead, in the following way:
However, that algorithm prevents the removal of the ‘_addsaved’ command. The two concepts seem to be mutually exclusive.
It is unclear whether the ‘_nopREAL’ command could be removed, since there is no other single-byte command that might take its place in the event that a true ‘no-operation’ command were required. Its current purposes are to pad the unused slots following codon deletion and to fill the unused slot(s) that follow the ‘_JnzDown’ command (since the ‘_JnzDown’ command skips three slots). Note that the current implementation of the ‘_JnzDown’ command contains a bug, which is that the destination of the branch is not the start of a slot. Instead, the command branches to two bytes past the start of the slot. The result is that the ‘_nopREAL’ command must be used to fill that destination slot, otherwise a crash could occur because the branch might land in the middle of a command.However, the ‘_JnzDown’ command can be removed by using alternative code, and any non-stack and non-memory instruction can be used for tail padding. Thus it appears that, given its current uses, the ‘_nopREAL’ command can be removed.
29 (28) commands remain.
In the release version a ‘_null’ command exists, which emits a single zero into the stream, followed by the ‘nop’ padding. Its existence is the result of a bug. The execution of such an instruction is likely to cause an exception. It is possible on Windows XP and later to register a vectored exception handle using the existing language, and that could intercept the exception, but this is quite outside the ‘style’ of the language. The command can be removed without any problem.
28 commands remain.
The ‘_JnzDown’ command could be removed by using a careful implementation of ‘_JnzUp’ (given that the meaning is reversed), but perhaps not without the loss of some functionality. It requires knowledge of the location of a forward branch destination. This interferes with command reordering if the buffer size is fixed, because there might not be enough slots available to construct the required ‘add’ value (unless the maximum number of slots was reserved each time in order to construct any possible number). It does, however, extend the functionality in a different way, since the ‘_JnzDown’ command can skip only three commands at a time, requiring its use multiple times in order to execute larger conditional blocks. The ‘_JnzDown’ command also places severe restrictions on what can appear in those conditional blocks, since an arithmetic operation might clear the Z flag, causing the branch to be taken instead of skipped. In contrast, the use of the ‘_JnzUp’ command can skip an arbitrary number of commands without restriction. The difference can be demonstrated easily. We begin with some code that calls the GetTickCount() API to fetch a ‘random’ number (for ease of demonstration, the offset of the GetTickCount() API is set arbitrarily to the value ‘0x0c’), using the ‘_JnzDown’ command:
Then the choice is made, and the branch might be taken (seven in eight chances to take it):
The replacement code might look something like this, beginning immediately after the call to the GetTickCount() API:
27 commands remain.
In the same way as for the ‘_JnzDown’ command, the ‘_JzDown’ command can be removed.
26 commands remain.
Normally, the ‘ecx’, ‘esi’ and ‘edi’ registers are write-only (technically, the ‘ecx’ register only becomes write-only after the ‘_addsaved’ command is removed), leaving the ‘eax’, ‘ebx’, ‘edx’ and ‘ebp’ registers as general-purpose registers. However, there is a way to read these registers again after they have been written. The ‘_pushall’ command pushes the registers onto the stack in this order: eax, ecx, edx, ebx, esp, ebp, esi, edi. The registers can then be popped individually from the stack, by using the ‘_pop’ command, in the following way:
A smaller set of ‘_pop’ commands can be used to access particular registers, leaving the others for removal later, if necessary. The popped registers can also be modified and pushed back onto the stack, allowing the ‘_popall’ command to be used to pop all of them. This allows multiple values to be assigned simultaneously. By combining several of these tricks, it becomes possible to remove the ‘_mul’ command (edx:eax = eax * ebx). A working solution can be downloaded from http://pferrie.tripod.com/misc/flibi_mul.zip.
25 commands remain.
Interestingly, by reordering the register initialization code for the first addition block to remove one instruction, the code actually increases in size because the branch to l4 requires more instructions to construct it as a result. This brings us to a special-case problem of dynamic pointer construction. There is a particular problem when the code at l2 branches to l4 and the code at l3 branches to l1, but where l1 < l2 and l4 > l3, as shown here:
First, construct the branch from l2 to l4:
Then construct the branch from l3 to l1:
Now the branch at l2 is affected, and no longer points to l4, so reconstruct it:
But now the branch at l3 is affected, and no longer points to l1, so reconstruct it:
Again, the branch at l2 is affected and no longer points to l4, so reconstruct it:
Finally, the instructions are reordered but not inserted, and the combination works. The limitation is that the lines in the construction must converge on a multiple of each other. Such a value might not exist without the explicit insertion of ‘alignment’ lines. The ‘_nop’ command could be used for this purpose, but any ‘harmless’ instruction can be used, such as moving to/from the same register from/to which a value was just moved (more specifically, if the previous move instruction was from ebx to eax, then it is harmless to move from eax back into ebx). By combining several of these tricks, it becomes possible to remove the ‘_div’ command (eax, edx = edx:eax / ebx) as well. A working solution can be downloaded from http://pferrie.tripod.com/misc/flibi_div.zip.
24 commands remain.
The ‘_writeDWord’ command can be removed by using this algorithm:
which we translate into this code:
Of course, if there were a command to write a new value for the stack pointer, then the stack could be moved to the destination address, and a ‘_push’ command could be used to write the value. However, there would need to be a corresponding command to read the previous value for the stack pointer in order to restore it afterwards. This is quite outside the ‘style’ of the language.
23 commands remain.
Another instruction that can be removed is the ‘_call’ command. A subroutine call is equivalent to pushing the return address onto the stack, and then jumping to the location of the subroutine. It can be replaced by the ‘_JnzUp’ command in the following way (again, calling the GetTickCount() API, as above):
Local subroutines can be called in the same way; however there is no ‘return’ command. The equivalent for a ‘return’ command is the following:
22 commands remain.
The following are two useful tricks just for the sake of interest. The first one demonstrates how to read parameters directly from the stack:
Then, simply by replacing the ‘_getdata’ command with the ‘_call’ command, function pointers on the stack can be called.
The ‘_push’ command can be removed, but the replacement code is ugly. It would look like this:
21 commands remain.
The ‘_popall’ command can be removed. The ‘_popall’ command pops the registers from the stack in the following order: edi, esi, ebp, esp, ebx, edx, ecx, eax. The command can be replaced by popping and assigning the registers individually, in the following way:
20 commands remain.
The ‘_nopdB’, ‘_saveWrtOff’ and ‘_saveJmpOff’ commands can be removed if the ‘_push’ and ‘_popall’ commands are retained. Replacement of the ‘_saveWrtOff’ command would look like this:
Replacement of the ‘_saveJmpOff’ command would look like this:
Replacement of the ‘_nopdB’ command would look like this:
19 commands remain.
Two other commands can be removed, but they cannot be replaced using existing instructions. Instead, the replacement code requires the introduction of another instruction. The two commands are ‘_shl’ and ‘_shr’. The replacement instruction is ‘_rot’ (‘rotate’). The direction of the rotate is not important, as long as it is known, since all values can be constructed by using it in conjunction with the ‘_and’ instruction. However, it requires the use of the value ‘1’ as the ‘base constant’. The value ‘1’ would be used to construct the values ‘0x7fffffff’ (if rotating shifts to the right) or ‘0xfffffffe’ (if rotating shifts to the left). This is the mask value that is used by the ‘_and’ command to zero the appropriate bit in order to simulate a shift. This is the simplest implementation that would rotate a value only once per use without reference to the value in the ‘ecx’ register. Multi-bit rotates could be supported, too, but then the ‘and’ mask would no longer be a constant. Instead, it would be specific to the number of bits that are being rotated. So, shifting the value in the ‘eax’ register left by ‘3’ times, using the single-bit rotate command, would look like this:
18 commands remain:
Many years from now, our distant descendants might stumble upon a codon stream that describes only 18 amino acids – and we might be looking at its origin. Imagine that.
1 A codon is a trinucleotide sequence of DNA or RNA (the nucleic acids that contain the genetic instructions used in the development and functioning of living organisms) that corresponds to a specific amino acid. See http://www.genome.gov/Glossary/index.cfm?id=36.
2 Transfer RNA, or tRNA, is a small RNA molecule that is involved in protein synthesis. See http://www.wiley.com/college/boyer/0470003790/structure/tRNA/trna_intro.htm.
3 Amino acids are the building blocks of proteins. See http://en.wikipedia.org/w/index.php?title=Amino_acid&oldid=412676887.
4 There is a bug in this code, which can result in attempting to copy more bytes than exist in the source.
5 There is an additional case where the destination is the same as the source, in which case the codons are deleted.
6 The ‘_CallAPISleep’ command was added to the release version because the API resolver code could not resolve the Sleep() API on certain platforms. The reason is described in detail in the previous article (VB, March 2011, p.4).
7 It would be even simpler to introduce an instruction which performs a ‘mov ebx, 0xffffffff’.
[Back to index] [Comments (0)]