Mutation Engine Report

Tarkan Yetiser
June 1992

P.O. Box 9393, Baltimore, MD 21228, U.S.A., (410) 247-7117

This report is provided to satisfy the curiosity of the public. We were approached by some third parties to perform an analysis on MtE. We would like to share the results of our analysis with everyone. If you find an error or inaccuracy in this report, please feel free to contact us. All constructive criticism is welcome. We thank all those who took the time to read and bring inaccurate or ambiguous parts of this report to our attention.

Mutation Engine and Viruses
How to Catch Viruses and MtE-based Viruses
Mutation Types and Detection Algorithms
1. Sample Decryptor Disassembly
Live Tests and Results
1. Comments on Test Results
A Simple Message

I. Mutation Engine and Viruses

We have analyzed the so-called MtE (Mutation Engine by a "Dark Avenger" from Bulgaria), and sample viruses based on it; namely, Pogue and Dedicated. We have also conducted tests to examine what kind of a potential this miscreant has, and collected empirical data on how popular scanners deal with the MtE. We have also implemented a little program (CatchMtE) that can recognize MtE-based code using an algorithmic technique. The program in executable form is available free of charge as a service to the public. Due to possible misuse, the source code and a more detailed (at the opcode and bit-mask level) analysis with decryptor samples and algorithms necessary to detect MtE will be made available in a limited fashion. Under no circumstances, actual virus samples will be provided; except the missed samples can be sent to known anti-viral product developers who wish to enhance their programs.

For those who are not familiar with the MtE, some preliminary information will be presented first:

MtE is NOT a virus per se, but an object module that can be linked into a virus to give it polymorphic capabilities. MtE expects to be called as a routine that can encrypt a certain portion of code and can generate a suitable decryption routine. It uses a random number generator to vary each mutation so that it will not be possible to recognize the new variant by using simple scanning techniques. The random number generator is not part of the MtE object module. A sample pseudo-random number generator is included with the archive Dark Avenger distributes. A virus writer could also supply his own random number generator.

It's not trivial to take an existing virus and link it with MtE and turn it into a polymorphic virus. The concern is that having MtE available greatly reduces the effort needed to craft such a virus. It's more like an "off-the-shelf" component that other virus writers can use. Many of those virus writers would probably not be able to code a polymorphic virus from scratch. Analysis of many viruses indicate that their authors are not that skilled at all. It is this majority that could benefit from something like MtE.

Though all this may sound ordinary, MtE got so much attention not because it is just another encryptive virus but because it can provide even simple viruses with a feature that makes it difficult to scan for them. MtE is just like a library routine that you link into your virus and call when needed. It is a little over 2K in an object module named MTE.OBJ. A person who calls himself "Dark Avenger" claims to have developed MtE, and distributes it by uploading to BBSes in Bulgaria. The archive contains a fairly detailed documentation on how to use MtE, and even includes a demonstration virus, a non-resident COM infector known as "Dedicated". Shortly after MtE made its appearance, a modified copy of this virus called "Fear" is also seen. We are informed that a few other MtE-based viruses have been recently reported.

Why this person is engaged in such potentially harmful activity, or how he/she gets away with it is not something we know about. Curious individuals who would like to learn more about the history of virus production in Bulgaria and other social as well as technical issues are invited to read an excellent paper written by anti-virus researcher Mr. Vesselin Bontchev of Virus Testing Center, University of Hamburg. The paper is titled "Bulgarian Virus Factory", and it is available via anonymous FTP. It provides insight into some of the cultural aspects of the virus underground in Bulgaria. Mr. Bontchev's contribution to anti-virus research is much appreciated; otherwise, we probably would have never known what goes on inside the Bulgarian virus factories.

II. How to Catch Viruses and MtE-based Viruses

Scanning for many known viruses is usually a trivial task. You disassemble a sample, extract a sequence of bytes that would exist in each infected executable object, put it into a pattern matching engine, and then look for that pattern in executable objects that that virus is known to target. This method proved to be quite useful in fighting many viruses seen in the wild. Assuming a carefully chosen scan string, you can find the virus easily without too many false positives. Not so for polymorphic viruses.

These viruses try to defeat common scanning methods. They keep their body encrypted to defy analysis, and encrypt the new copy inserted into an executable object using a different key so that it will "look" as if a different virus infection has occurred. However, even these viruses require a plaintext code that will decrypt the rest of the virus. Scanners can use strings extracted from the plaintext portion of the virus to identify them. It is usually necessary to include wildcard bytes (don't-care bytes) to be able to deal with the varying parts of the decryption routine. Naturally, false alarms are more likely to occur. MtE-based viruses are more advanced than such viruses seen before.

We would like to emphasize that the contents of each mutation and the corresponding decryption routine MtE generates is far too variable to extract a simple (or even wildcard) scan string. It is necessary to analyze the MtE itself as well as many sample mutations. After that, certain characteristics of the code MtE generates can be used as telltale signs to detect its presence. Avoiding false positives while maintaining 100% detection ratio is quite difficult.

Armed with an 80x86 instruction set guide (we used Turbo Assembler 3.0 Quick Reference Guide), and a good disassembler (we used Mr. Zandt's DIS86 available via anonymous FTP), and a few known viruses based on MtE (Pogue and Dedicated with payload removed), we analyzed the MtE code, and the mutations generated. Tests were conducted on a 40Mhz 386 with a 100 meg HD and MS-DOS 5.0, and a 4.77Mhz IBM/XT with a 30 meg HD and PC-DOS 3.3 installed. A simple program that generated decoys (small, fully functional programs) was used to create a large number of samples.

In the case of Pogue, the virus was allowed to remain resident and infect each decoy program as it is created. Since the Dedicated virus is not resident, it was necessary to create decoys first and then infect them by running the virus (infects in the current directory). After the tests, we archived the samples and stored them on floppy diskettes, and removed them from the hard drives of the test machines.

In the Intel 80x86 architecture, it is possible to express a computation in very dissimilar ways. This is possible because certain registers can be substituted in place of another one and still achieve the same result. For example, you can index an array by using SI, DI, BP or BX registers. Or you could XOR a certain value at a given memory location by loading that value in AX, BX, CX or DX first, and performing the XOR on that register, and then putting the result back into memory, etc. Even other possibilities exist. When stepping through elements in an array, you can increment the index register by ADDing to it, INCing it, or ADDing and then SUBtracting from it. It should be clear that such flexibility helps MtE significantly. Of course, variability is something string scanners do not handle too well, since there are many combinations to search for.

MtE goes even further than that. The size of the decryption routine is also variable, making it infeasible to assume certain things that would hold for many polymorphic viruses. It also sets up a lengthy sequence of redundant instructions before the decryptor enters the decryption loop.

For over 90% of the mutations, MtE generates a convoluted 16-bit XOR-type encryption; however, in many cases it uses indirect ways to apply the XOR mask to a memory value. For example, it computes the mask, and then gets the value to be decrypted into a register, applies the mask and puts the result back into that memory location. Besides, memory access is done using many different instructions such as MOV and XCHG. Sometimes, it uses a SUB and NEG instruction sequence instead of XOR. There are also many redundant instructions peppered freely throughout the decryptor.

In some cases (5.5%), MtE generates a decryptor with a null effect. The decryptor does not actually decrypt anything, and the virus code is in plaintext. The frequency of such cases seems to depend on the random number generator. It is funny to note that some popular scanners misidentify such extreme cases where the virus is not even encrypted. To handle these mutations, it is sufficient to extract a signature from the MtE itself. It is also possible to extract one from known MtE-based viruses and identify the virus directly. At any rate, a scan string from MtE itself should be used in case a future virus creates a plaintext variant.

We must also mention that even these plaintext mutations contained a fully working copy of MtE. They successfully propagated and generated encrypted mutations in future generations. MtE appears to generate correct code in all cases. The deviation between new generations started using plaintext parents and new generations started using encrypted parents was negligible.

III. Mutation Types and Detection Algorithms

MtE generates 3 "types" of mutations. They are as follows:

Double-reference (detectable using Method-1) ( ~ 93.0% )
Single-reference (detectable using Method-2) ( ~ 1.5% )
Plaintext or no-reference ( ~ 5.5% )

By implementing two algorithms and one scan string for the plain mutations, it is possible to recognize MtE-based viruses while keeping false positives to an acceptable level. We have one such program that achieved 100% hit rate during our tests. Some others also claim 100% hit rate; and we have tested them as well.

Double-reference mutations contain two memory accesses. The first one usually (but not always) fetches a word to be decrypted. The second one updates the same memory location after the word is decrypted. We must mention that neither fetching nor updating is always by simple MOVs; in fact, XCHG as well as SUB and NEG instructions are used sometimes.

Single-reference mutations contain only one memory reference. The instruction that decrypts using one memory access is XOR. It simply applies an XOR-mask, computed in a very convoluted way, to a memory location. There are some irregularities in this type of mutations. Popular scanners we tested had difficulty in dealing with this type of samples.

Our CatchMtE implements extra checks to reduce false alarms while maintaining 100% hit rate on this type of mutations. Many self-modifying programs (especially the compressed ones) trigger a false alarm, since they may contain similar memory references while they are modifying themselves. However, there is one characteristic of MtE that easily gives it away and simplifies positive identification. Once you verify the presence of that characteristic, you can achieve 100% hit rate even if MtE is enhanced! We will not go into further detail here for security reasons.

In the case of plaintext mutations, there are two subcategories:

Null-effect decryptor with no memory reference
Null-effect decryptor with memory reference but with a 0-key

The mutations in the first category contain nothing more than a series of MOVs, ROLs, ANDs and so on. Their overall effect is null since no reference to the code to be decrypted (in plaintext in this case) is present. The body of the virus as well as the MtE itself are both in plaintext and can be disassembled directly.

The mutations in the second category are a little more interesting in that the decryptor actually fetches from and stores words to memory. In binary arithmetic, if you XOR a value with 0, that value does not change; just like if you multiply a number by 1, you get the same number. Again, the overall effect of the decryptor is null since it uses a 0-key. The virus code as well as the MtE itself are both in plaintext.

A more detailed analysis of mutation types is not made public due to possible misuse of such information. We will only show here a sample decryptor and its commented disassembly.

A. Sample Decryptor Disassembly

Following is a sample decryption routine MtE generated during a live Pogue test. The rest of the disassembly is omitted for security reasons. This sample was missed by SCAN 91, and F-PROT 2.04; though F-PROT 2.04a detects it. Following code is NOT a virus.

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Virus      : Pogue mutation, encrypted, resident COM infector, 21h,1Ch
; Analyst    : -
; Date       : June 1992
; Type       : Single-reference
; File name  : PMUT1220.COM, in PDF
; File size  : 3608
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
                ***** START of DISASSEMBLY *****
COM_PROG_ENTRY:

; In determining the program entry point, it is necessary to use a
; "sliding-window" approach since the first instruction is not a JMP
; or CALL. Second byte is a JMP (e9) and the offset follows it.

0000:0100      4d                  dec            bp
0000:0101      e9 fc 01            jmp            DECRYPTOR_ENTRY (0300)

; junk deleted

; In this mutation, BX register is used to step thru the code word at
; a time. Note the way XOR-mask is computed on the first iteration,
; DI will be 7DCCh the first time, and it will change as BX is incremented.
; We are looking at a piece of code that uses BASED-addressing.
; Since the XOR instruction at 0000:030c includes a constant offset 0f18h,
; we must add that value to the initial value of BX, which is set to f420h
; before the loop. Now if you add 0f18h and f420h, you will get 0338h! Why?
; Because BX is a 16-bit register, and it will wrap around after 64K with
; the carry flag set during the process. CF is not important here.
; In other words, the block that is being decrypted is at 0000:0338h.
; The constant 0dee is XORed with the current value of index, and another
; constant 7c02 is subtracted to obtain the final XOR-mask to be applied
; to the word at 0f18+BX.

DECRYPTOR_ENTRY:

0000:0300      bb 20 f4            mov            bx,f420

DEC_LOOP_TOP:

0000:0303      bf ee 0d            mov            di,0dee
0000:0306      33 fb               xor            di,bx
0000:0308      81 ef 02 7c         sub            di,7c02
0000:030c      31 bf 18 0f         xor            [0f18+bx],di

; Following is an example of redundant instructions peppered throughout the
; decryptors. After these two instructions, BX will not change, and SI
; is not used anyway. In a non-virus program, you would not often see such a
; sequence.

0000:0310      8b f3               mov            si,bx
0000:0312      8b de               mov            bx,si

; Now BX will be incremented by 2 to step word at a time
; This is the simple case; in some mutations, the two INC instructions
; are not used at all.

0000:0314      43                  inc            bx
0000:0315      43                  inc            bx

0000:0316      75 eb               jnz            DEC_LOOP_TOP (0303)

; Now how do we calculate the number of iterations? Not too difficult.
; Above you see an implicit terminating condition for the loop. Again,
; it uses the fact that BX is a 16-bit register, and when it hits 64K
; it will wrap around to 0, in which case the loop will be done.
; Since the initial BX is F420h, and we are stepping 2 bytes at a time,
; the number of iterations is: (64K - F420h + 1) / 2 = 1520 in decimal
; In other words, the size of the block being decrypted is 3040 bytes.
; Let's check to see if this makes sense numerically.
; The file size on this sample was 3608 bytes. Adjusting for 100h ORG,
; we can see that 0218h bytes are above the loop end.
;
; 3608 - 536 (0218h) = 3072 bytes from loop end to end of file
; 3608 - 568 (0238h) = 3040 bytes in encrypted block
;
; There is a discrepancy of 32 bytes? Not really, they are padding bytes
; as documented by Dark Avenger in MtE! 32 (MAX_ADD_LEN) bytes, as it says.
; This is only one of the ways it achieves variable length.
; Following are the padding bytes, which are all valid instructions!

0000:0318      bd 22 b9            mov            bp,b922
0000:031b      b1 03               mov            cl,03
0000:031d      d3 c5               rol            bp,cl
0000:031f      8b fd               mov            di,bp
0000:0321      bd ae 4f            mov            bp,4fae
0000:0324      81 e5 25 e7         and            bp,e725
0000:0328      8b dd               mov            bx,bp
0000:032a      8b ee               mov            bp,si
0000:032c      81 f5 60 fb         xor            bp,fb60
0000:0330      8a cb               mov            cl,bl
0000:0332      d3 c5               rol            bp,cl
0000:0334      8b cf               mov            cx,di
0000:0336      d3 c5               rol            bp,cl

; from this point on, the code is encrypted
0000:0338
                ***** END of DISASSEMBLY *****
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

IV. Live Tests and Results

      Test #1     Base Virus Name: Dedicated

                             SCAN 91     F-PROT 2.04       CatchMTE 1.0

      by Name (1)             67                69          60
      as MtE  (2)             933               931         940
      misidentified           -0                -0          N/A
      missed                  -0                -0          -0
      Hit Rate                100%              100%        100%

(1)  SCAN91 --> [Mut],  F-PROT 2.04 --> Dedicated, CatchMTE --> Dedicated
(2)  SCAN91 --> [DAME], F-PROT 2.04 --> MtE,       CatchMTE --> MtE-based

      Test #2     Base Virus Name: Pogue

                             SCAN 91     F-PROT 2.04       CatchMTE 1.0

      by Name (1)             0               0                 56
      as MtE  (2)             935           936                944
      misidentified (3)       -65           -61                N/A
      missed                  -0             -3                 -0
      Hit Rate                93.5%          93.6%             100%

(1)  SCAN91 --> N/A,  F-PROT 2.04 --> N/A,    CatchMTE --> Pogue
(2)  SCAN91 --> [DAME], F-PROT 2.04 --> MtE,  CatchMTE --> MtE-based
(3)  SCAN91 --> [7S], F-PROT 2.04 --> Gotcha, CatchMTE --> N/A

      Test #3     Base Virus Name: Pogue

                             SCAN 91     F-PROT 2.04a       CatchMTE 1.0

      by Name (1)             0               0                141
      as MtE  (2)            2399          2398               2259
      misidentified (3)        ?              ?                N/A
      missed                  -1             -2                 -0
      Hit Rate              99.96%       99.92%               100%

(1)  SCAN91 --> N/A,  F-PROT 2.04 --> N/A,    CatchMTE --> Pogue
(2)  SCAN91 --> [DAME], F-PROT 2.04 --> MtE,  CatchMTE --> MtE-based
(3)  Not counted

A.Comments on Test Results

The amount of time it took each program to complete the tests are not published not to distract from the main purpose of these tests: to determine if they can achieve 100% hit rate.

It seems that both F-PROT 2.04 and SCAN 91 misidentify some Pogue mutations that are in plaintext. F-PROT "quickscan" missed ALL mutations. You are advised to use SECURE scan mode of this product. The extra speed comes with 0% hit rate on MtE-based viruses!

In test #2, F-PROT 2.04 missed three encrypted Pogue mutations. We examined these samples and found them to be of Single-reference type, and detectable using Method-2. The samples worked as expected. We can only speculate that F-PROT lacks Method-2 detection algorithm and uses a heuristic in such cases.

One of those three that were missed was called "suspicious" and guessed to be a variant of the Gotcha virus. Surprisingly, Virx 2.3 missed one of these same mutations. We did not include Virx 2.3 in our full test suite.

Further testing was done with F-PROT 2.04a. This new release caught all the ones that the previous one missed in test #2; but still some others, of single-reference type again, were missed in test #3. SCAN 91 also missed 1 mutation completely. This mutation was NOT one of the two that F-PROT 2.04a missed; they were identified as [DAME].

It should be noted that misidentification of 6% of Pogue mutations is a little alarming. All these misidentified mutations were found to be working and capable of generating new mutations. Some people may take an MtE-based infestation more seriously than an infection by "Seventh Son" virus. This leaves a chance for an unrecognized mutation to escape detection. We must remember that it takes only one virus to continue the infection unless other precautions such as integrity checking are in place.

Someone brought to our attention that TBSCAN 3.3 can use MtE AVR module in VIRSCAN.DAT (revision 920617) to detect MtE-based viruses. A copy was downloaded from an FTP-site, and it was tested against the sample set in test #3. It detected no viruses at all!

One of our researchers mentioned that this product is highly optimized, and that it may not be checking for MtE if the file size is less than a certain number of bytes. Samples in test #3 were of varying sizes: as small as 3500 bytes and never larger than 4000 bytes. When we tested this scanner with a few larger samples (11K), TBSCAN 3.3 seemed to be able to detect MtE. We did not conduct comprehensive tests using TBSCAN.

This reminded us legendary algorist and computer scientist Dr. Donald Knuth's famous saying: "Premature optimization is the root of much programming evil". Improving the performance of one's scanner is always desirable, as long as one does not try to "cut corners" and sacrifice reliability. F-PROT quickscan also missed all MtE mutations, but at least F-PROT has a secure mode that catches most of them. TBSCAN did not have such a mode, in fact, it has an even faster mode that missed the samples faster!

It is not acceptable for a scanner not to find viruses that it has scan strings for (or algorithms in this case). The primary function of a virus scanner is to find viruses known to that scanner. If it performs this task in a speedy manner, that's nice indeed; and it may even give it a competitive edge. Failing to perform its primary function, on the other hand, defeats the purpose of such a tool.

V. A Simple Message

It is dangerous to assume that scanning is adequate since there are some products that can detect MtE-based viruses 100% of the time. We identified at least two ways to make MtE less predictable. Of course, such information will not be disseminated. However, considering the availability of MtE to the hackers all around the world, and the "glory" Dark Avenger will enjoy due to media hype, it's only a matter of time such improvements will be discovered by irresponsible individuals. Besides, this may start a new trend among virus writers to create things like MtE. Keeping up with new virus signatures was hard enough (though manageable), but keeping up with many mutation engines is not going to be trivial. Unfortunately, locking up these "mutant engineers" is not a practical solution, and not even legally possible in many parts of the world.

The message is clear. The first line of defense against viruses is NOT using scanners. Although they proved to be very useful, you are highly encouraged to consider other approaches such as integrity checkers as a first line of defense. Even the less sophisticated integrity checkers have a better chance to catch mutating viruses, long before their developers get a chance to analyze the virus samples. The reason is that viruses have a tendency to modify existing code to propagate in most cases. Their spread can be controlled using a non-virus-specific solution that concentrates on the main characteristic of most viruses. Such an approach is not only more cost-effective but also more secure. If your company still relies only on a virus scanner to protect its PC-based computing resources against viruses, you are walking on thin ice.

[Back to index] [Comments (0)]

VX Heavens