Fasten your seatbelts and prepare for a long adventure through virus
history. I will lay out the basic principles of the war between viruses
and antiviruses to show you how the story went. Most probably I will not
be able to keep it in strict chronological order, but I will try to
follow a logical order, showing the main technologies and the
counteractions on both sides.
The story begins long, long ago (sounds like a fairytale, doesn't it?)
when the first viruses were written. It doesn't matter which one exactly
was first; what matters is that some of them appeared on users'
computers. At that moment this war began, and it has been going on and
growing ever since.
The Beginning
No matter how big an invention the first self-replicating algorithms
were, viruses are not the first programs able to do so. It started with
worms and other hardly-classifiable pieces of code some time before
virii. But viruses made the difference, and ordinary people with
computers started getting infected.
The very first viruses all followed one of two basic schemes - file
viruses and boot viruses - and some of them survive up to now. Old boot
viruses are quite simple: they spread in the boot sector of floppy
discs, and on booting from such a floppy the virus copies itself into
the partition table and goes resident (useful if there is no hard disc
in the computer). Once it is resident, it infects the boot sectors of
all floppies being written. That's all, folks - it all fits into one
sector. Michelangelo or Stoned fall into this class.
File viruses like Jerusalem use simple appending parasitic infection of
COM or EXE files (or both). When an infected file is executed, the virus
usually goes memory resident and infects every file executed from then
on. Some of them (like Jerusalem) don't even have a double-infection
check, so frequently run programs can grow quite long. I think you all
know the basic principles, so I'm not going to explain such trivial
things.
At that time the situation was quite easy. Maybe some of you have seen,
for example, SCAN v19 - yes, it detected 19 viruses! There really were
only a few viruses back then. How to deal with trivial viruses? Well,
the first antiviruses were really stupid and slow. Any program is a
unique sequence of instructions - something every programmer
understands. But what can one (an aver) do if he (usually) doesn't
understand file structures? The result was simple algorithms very
similar to searching for text in a text editor: the whole(!) file is
checked for a specified string. This is the origin of the name
"scan-string" - a fixed sequence of bytes chosen from the virus body.
Moreover, some of the first antiviruses scanned the file as many times
as they had strings. One may guess this is quite inefficient and slow.
Sure! But at that time disks were really small (and computers slow as
well). This technology was the biggest invention in the fight against
viruses ever, I would say. It survives up to today, though in modified
forms - as viruses still use fixed code (plain, encrypted or whatever),
they can easily be identified this way.
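To make the idea concrete, here is a minimal sketch of such a
first-generation scanner in C. The signature bytes are invented for
illustration; a real one would be picked from an actual virus body:

  #include <stdio.h>
  #include <string.h>

  /* One hypothetical scan-string: a fixed byte sequence that would
     be picked from a virus body. These bytes are made up. */
  static const unsigned char sig[] =
      { 0xB4, 0x40, 0xCD, 0x21, 0xE8, 0x32, 0x01 };

  /* The "text search" approach of the earliest scanners: check the
     buffer at every single offset for the fixed sequence. */
  static int scan_buffer(const unsigned char *buf, size_t len)
  {
      size_t i;
      for (i = 0; i + sizeof(sig) <= len; i++)
          if (memcmp(buf + i, sig, sizeof(sig)) == 0)
              return 1;              /* string found: flag as infected */
      return 0;
  }

  int main(int argc, char **argv)
  {
      static unsigned char buf[65536];   /* whole (small, DOS-era) file */
      FILE *f;
      size_t n;

      if (argc < 2 || (f = fopen(argv[1], "rb")) == NULL)
          return 1;
      n = fread(buf, 1, sizeof(buf), f);
      fclose(f);
      printf("%s: %s\n", argv[1], scan_buffer(buf, n) ? "infected!" : "clean");
      return 0;
  }

Now multiply this by a few hundred strings, each searched for in a
separate pass, and you see why the first scanners were so slow.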
Antiviruses are business. A big business, if one has a look at NAI. The
beginnings were quite different, as many independent (free) antiviruses
were available just to help people. But one can't withstand competition
against big money - look at Microsoft to see why. Today, to keep track
of the big number of new viruses, many people are needed to work on an
antivirus full-time, and everyone needs money. And people have to buy
(or support) antiviruses because they are afraid of viruses. Many
people around the world think that viruses necessarily destroy
something - that's why they don't like them. But no one cares that
Windows crashes have caused much more destruction than viruses. Because
that is normal. Weird, isn't it?
Well, this fear of viruses started with the biggest computer virus hoax
ever, initiated by McAfee - in order to make money, of course. It was
Michelangelo, a couple of years ago; maybe some of you remember it.
McAfee announced an upcoming computer disaster caused by the extremely
dangerous virus Michelangelo. They estimated 20 million destroyed
computers on the activation date. 20 million was too big a number even
in those days, as there weren't as many computers around the world as
today. This hoax went from publisher to publisher and grew bigger and
bigger - and news of this computer apocalypse appeared in many
countries. I remember the daddy of a schoolfellow of mine forbidding
him to turn on his computer (a Sinclair ZX Spectrum with an 8-bit Z80
CPU!) because a virus could come to it through the network (the 220V
power network!) and destroy it. Wow! Unbelievable, isn't it? Even more
so given that repairing a disc destroyed by Michelangelo takes a few
seconds with diskedit. But no one mentioned that in the hoax, of
course. As the activation day passed, everyone understood, I hope: far
too few computers were destroyed (compared to 20M), but the hoax
succeeded. People started to be really afraid of viruses, and
antiviruses were sold worldwide - they became a big business.
Old techniques of scanning (scan-strings)
I already mentioned the first scanning methods, based on scan-strings
(a sequence of bytes selected from the virus body). If such a string is
found in a file, the file is marked as infected. Some of the first
antiviruses scanned the whole file for the string, but later on they
scanned only specific areas usually used by viruses: the beginning of
the file, the end of the file, and/or around the EXE's entry-point or
the COM's first jump target. Usually approximately 6kB were (or are)
scanned - little enough to load quickly, and quite enough for most
viruses, as at least part of the body should be there. Scan-strings are
checked at every position in the loaded buffer, so scanning runs at a
reasonable speed.
Here I should put a little discussion about scan-strings and how to
choose them (other forms of scan-strings I will mention later on).
Choosing a scan-string is not as trivial as one may guess. First of
all, the string has to be inside the loaded buffer in every case. A
scan-string should be as short as possible (in order to save space and
scanning time), but as long as possible at the same time (in order to
detect only this virus, with no false positives). The sequence should
be typical for the virus (preferably this virus only) and not found in
any other regular file. If it is, that is called a false positive
identification. It is rather difficult to have no false positives with
many short strings - there are many programs out there, and one simply
can't have them all.
An example of a really bad scan-string is E8 00 00 5x: it is short and
really typical for viruses. I assume you all know the basic opcodes by
heart, but I'll translate it anyway: call $+3 / pop xx. It can be found
in nearly any virus, and in many regular programs written in assembler.
Hope you get the point.
Another question is whether the string should identify more viruses at
once, or one-and-only. If it identifies, for example, a huge part of
the Jerusalem family, the advantage is that it may identify new
mutations as well. But it is not suitable identification for cleaning,
as the variants partially differ from version to version. Today's
trend is to have as exact identification as possible. But even today it
is not always achieved. This leads to another extreme, typical for
Dr. Solomon's Toolkit: identifying versions even where there are none.
An example: a virus named Z (pure fiction) exists in only one version
in reality, but in the avers' collection it is separated into Z.A and
Z.B, and Solomon identifies them as two versions while most others do
not. But if you take a Z.B-infected file and replicate it, the new
samples are caught by Solomon as Z.A. What's going on? Well, they have
selected the scan-string partly from the host file. Usually no one
notices, since avers in many cases do not replicate files and only a
few selected samples travel around the world - so they may score a 100%
hit rate on some virus, but only in the collections avers have, not in
real life. Remember this: they have only a few samples (usually), and
those samples are not active (not executed). The Tremor story later on
tells you why this matters.
Early battles - fooling simple scanners
The situation stabilized: there were some viruses, but avers weren't
able to beat them completely. Moreover, once it is business, they don't
want to win this battle totally - there would be no war anymore and
thus no business anymore. Think with me: scanners were available and
finding viruses at a reasonable rate. Of course, there are still people
not using updated scanners all the time, so viruses can survive; but
new viruses, once found, are added to scanners and can be easily
identified. Too bad for the virus-writer who spends days or weeks
creating a nice piece of code only to see it broken in minutes. You
have to invent something. There were two answers: stealth and
encryption.
Stealth counter-attack
Now let's think how scanners worked at that time: a scanner run on a
computer infected with a virus opens each file and checks it for some
id string. How can you hide? You can become "invisible" once you have
total control over the computer (elementary under DOS) and hide in the
files being scanned. This is called stealth (after the U.S. B-2 bomber,
called "Stealth" because it is invisible to radar), and we may talk
about two implementations for files: disinfection on-the-fly (each
opened file is disinfected and infected again on closing) and true
stealth (all file operations are intercepted and their results
modified). For boot viruses, sector redirecting is used.
The computer is infected with a stealth virus. The virus is active in
memory; the user runs his favorite scanner, which searches for strings
in files - but as it opens infected files, it can't find anything,
because the virus hides itself. Nice, isn't it?
Memory scanning
Imagine you are an aver (you have to think for both sides, otherwise
you can't win this war) - what would you do with stealth viruses? The
simplest answer is to scan memory as well, and if a virus is found, ask
the user to boot from a clean floppy and run the scanner that way -
then there is no virus in memory and everything is as before with
regular viruses. Easy, easy.
Memory scanning in the old times was similar to file scanning. All
memory is checked for the same strings as files; if one is found, a
virus is reported in memory. To speed things up, some antiviruses don't
scan the whole memory but only the possible locations - they may skip
ROMs, the antivirus itself, etc. It differs from one implementation to
another. Memory scanning is not a big technical miracle.
Once a virus is found, some antiviruses were able to patch it to be
inactive and continue without the need to boot from a clean floppy.
Due to the flood of viruses appearing later, it is not usual to do so
today - there are too many viruses and you can't write such routines
for every one of them. AVP, for example, performs this even now, but
only for a few of the most common viruses. It is quite useful for lazy
users, though. Deactivation can be done easily by replacing the virus
handlers with a jump to the original entry-point of the hooked
interrupt. Usually the virus body is also erased (except the jumps of
hooked interrupts, in order to keep the interrupt chain functional) so
the virus is not reported again. It must be done with interrupts
disabled (cli), of course, to protect against asynchronous break-downs.
Another idea of how to partially deactivate a virus in memory,
presented by some antiviruses, is the known-entry-point method. There
are two basic interrupts under DOS: int 21h for files and int 13h for
sectors (boot viruses). If you know the original entry point (you know
this version of DOS, or you stored the entry-point during
installation), you can find out whether some virus is in memory, and
you can call the functions without the virus' interference. Of course,
for int 13h you must check not the real interrupt vector, as it points
into DOS, but the internal pointer in DOS that points to the ROM - boot
viruses are loaded (and hook int 13h) before DOS is. But this
technology has many weak points in general and is forgotten today,
because even legitimate programs may redirect those interrupts -
Microsoft designed its "OS" this way. For example caches, networks,
etc. redirect these interrupts. Novell NetWare, for example, redirects
int 21h instead of using MS's recommended network redirector facility
(which is only implemented in versions 4+). If you call the int 21h
entrypoint directly, you can't scan Novell's disks. This technology
caused many crashes and is unusable in the generic case; you may check
my other article about why not to use direct disk access, which deals
with these things.
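As a rough illustration, here is a sketch of the vector-comparison part
of the known-entry-point idea, written for a 16-bit DOS compiler such
as Turbo C (getvect() is its real dos.h call; storing the clean vector
on disk between runs is left out):

  #include <dos.h>
  #include <stdio.h>

  /* int 21h handler as recorded on a known-clean system
     ("installation time"); a real tool would keep it in a file. */
  void interrupt (*clean_int21)();

  int main(void)
  {
      void interrupt (*cur)();

      clean_int21 = getvect(0x21);   /* pretend this run is the clean one */

      /* ... later, on a suspect system: */
      cur = getvect(0x21);
      if (cur != clean_int21)
          printf("int 21h vector changed - a virus... or just a cache/TSR!\n");
      return 0;
  }

The comment in the code is the whole point: the check tells you the
vector changed, not who changed it - which is exactly why the method
died.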
Encryption
Once stealth viruses could be found in memory, the next attempts came
with encryption experiments. It started with the first encrypted
viruses, which had the main virus body encrypted - but there must
always be at least a short decryption routine. And this routine is
still a fixed sequence of bytes, so it can be identified with a
scan-string. One may guess there is no improvement. Indeed, not a big
one, but it started development in this direction.
Wild-card scan-strings
The situation gets a bit more complicated. Avers are forced to use one
scan-string covering, for example, only the fixed 16 bytes of the
decryptor. Btw: some stupid avers chose scan-strings from the encrypted
virus body - bytes xored with a different value each time - so they
were able to catch only the samples they had, but nothing else :-)
Well, let's think about a simple xor routine, quite fixed, but with
several variable bytes: the encryption constant (let's say one byte)
and the starting offset. As these are not all in the same place, the 16
bytes of the decryptor (a pure example) are broken into 3 chunks of
fixed bytes, the biggest of them, let's say, 6 bytes long. And avers
have a problem: 6 bytes is really not enough for a scan-string, as it
is not unique at all - a part of an ordinary loop like this can be
found in other programs (see the discussion on scan-strings above).
Oops, how to deal with it?
(Think once again as an aver.) What would you do? Once you have some
technology implemented, functional and tested, it is best for you to
use it to the maximum. Scan-strings... well, how about wildcards?
That's it: all you need is a one-byte substitution like '?' in shell
patterns. In this case you can still have a 16-byte scan-string, now
with 3 variable bytes. It fits the requirements and everything is as
before - you have a scan-string to identify the virus, all is okay. The
most important thing is that they were able to deal with it, but it
took some time - and that gave viruses a chance to spread. This was the
first appearance of wildcards in the history of scan-strings, but of
course not the last change in scan-string methodology...
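A sketch of how the '?' wildcard grafts onto the same matcher: the
pattern below is an invented xor-[si] decryptor fragment with the
variable key byte masked out. The mask array marks which positions must
match (1) and which are '?' positions (0):

  #include <stdio.h>

  /* invented pattern: xor byte [si],KEY / inc si / loop back -
     the key byte at index 2 is the '?' wildcard */
  static const unsigned char pat[]  = { 0x80, 0x34, 0x00, 0x46, 0xE2, 0xFA };
  static const unsigned char mask[] = { 1,    1,    0,    1,    1,    1    };

  static int match_at(const unsigned char *p)
  {
      size_t i;
      for (i = 0; i < sizeof(pat); i++)
          if (mask[i] && p[i] != pat[i])   /* fixed byte must match */
              return 0;
      return 1;                            /* '?' accepts anything */
  }

  static int scan_wild(const unsigned char *buf, size_t len)
  {
      size_t i;
      for (i = 0; i + sizeof(pat) <= len; i++)
          if (match_at(buf + i))
              return 1;
      return 0;
  }

  int main(void)
  {
      /* same decryptor, but with key 0x5A - still matches */
      static const unsigned char code[] =
          { 0x80, 0x34, 0x5A, 0x46, 0xE2, 0xFA };
      printf("%s\n", scan_wild(code, sizeof(code)) ? "hit" : "miss");
      return 0;
  }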
Another problem appearing here is the encryptor-versus-body dilemma.
Once the scanner identifies a virus by its encryptor only, it can't
distinguish between versions; moreover, it can't distinguish between
different viruses with the same (or roughly the same) decryptor. Well,
the cleaning problem can be solved by simple de-xoring in the cleaning
routine - you must do so anyway if you want to clean an encrypted
virus - and you can check the differences after decrypting. But this
is an important change in methodology: there is no exact identification
before cleaning, and identification must be done once again during the
cleaning process, under different conditions (a cleaning routine, or
the scanner executed once again, can do it). This problem still remains
and I will return to it later on with MtE.
Variabilizing encryption
Avers handled encryption with wildcards, so you have to think about
something new again, unless you want to be caught within days. Once a
virus has a simple encryptor, you can improve it a bit: you can
increase its variability beyond what '?' wildcards can handle by
inserting nops or other simple junk instructions. Then your decryption
instructions are not at fixed distances, and simple wildcards won't
cope with them. For example, if you have two fixed instructions
together, a scan-string can be chosen covering both of them. But if you
insert 1-5 nops between them, a scan-string with '?' will not deal with
it (unless there are 5 scan-strings ;-) Simple, and it couldn't be
handled by the then-current methods.
More wild-cards
How can avers find such encryptors? They implemented another type of
wildcard for it: '*', an equivalent for a variable number of random
bytes. It took some additional work, but it was handled. How many
random bytes are allowed depends on the implementation - whether it is
fixed, or limits are encoded in the scan-string, or whatever.
Scan-strings started to differ from avir to avir. They were still able
to handle all viruses with scan-strings, but there was now a big number
of strings in use, which slowed down scanning itself (today it looks
like a kids' game, but viruses were at a much lower level than now,
too).
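A bounded '*' can be added to the same scheme; here is one hedged way
to do it, with the pattern as a little token list and recursion trying
every allowed gap length (which also shows why such strings cost more
scanning time). The pattern bytes are again invented:

  #include <stdio.h>

  #define MAX_SKIP 5

  enum { FIXED, ANY, SKIP };          /* token kinds */
  struct tok { int kind; unsigned char b; };

  /* mov si,xxxx / * / xor byte [si],?? / * / inc si / loop */
  static const struct tok pat[] = {
      { FIXED, 0xBE }, { ANY, 0 }, { ANY, 0 },       /* mov si, imm16  */
      { SKIP, 0 },                                   /* 0-5 junk bytes */
      { FIXED, 0x80 }, { FIXED, 0x34 }, { ANY, 0 },  /* xor [si], key  */
      { SKIP, 0 },                                   /* 0-5 junk bytes */
      { FIXED, 0x46 },                               /* inc si         */
      { FIXED, 0xE2 }, { ANY, 0 }                    /* loop back      */
  };

  static int match(const unsigned char *p, size_t len, size_t t)
  {
      size_t g;
      if (t == sizeof(pat) / sizeof(pat[0]))
          return 1;                   /* whole pattern matched */
      if (len == 0)
          return 0;
      switch (pat[t].kind) {
      case FIXED: return p[0] == pat[t].b && match(p + 1, len - 1, t + 1);
      case ANY:   return match(p + 1, len - 1, t + 1);
      case SKIP:                      /* try every allowed gap length */
          for (g = 0; g <= MAX_SKIP && g <= len; g++)
              if (match(p + g, len - g, t + 1))
                  return 1;
          return 0;
      }
      return 0;
  }

  int main(void)
  {
      /* mov si,0100h / nop nop / xor byte [si],77h / nop / inc si / loop */
      static const unsigned char code[] = { 0xBE, 0x00, 0x01, 0x90, 0x90,
                                            0x80, 0x34, 0x77, 0x90, 0x46,
                                            0xE2, 0xF6 };
      printf("%s\n", match(code, sizeof(code), 0) ? "hit" : "miss");
      return 0;
  }

Note the loop displacement became an ANY byte too - the inserted junk
shifts it, so it can no longer be a fixed byte of the string.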
Some avirs started using hierarchies of strings, methods of strings and
substrings (a smaller set for generic identification and, if found, a
more detailed set), pre-sorting of strings into radix tables, etc. It
varies, but all of them follow the basic principles and fulfill the
requirements.
An interesting idea to speed up the scanning process is the
single-point scan-string, checked at a fixed relative offset from some
important file position (e.g. entrypoint, file start or file end). Such
a string can be shorter, as it is checked only once at a fixed offset
(compared to strings checked across the whole loaded part of the file),
which decreases the chance of false positives (and saves memory as
well). It is much faster to scan for such strings, and it is easier to
distinguish between versions. If a single-point string is well chosen,
it can be only 4-6 bytes long, compared to the 12-18 bytes of a regular
string.
The way from variable encryption to metamorphism
Once you are modifying a decryptor with some trivial junk instructions
(there is no reason to use harder ones: all that was needed was to beat
fixed strings, and nops do that as well as anything else), you can do
even more. A scan-string is a fixed sequence of bytes, but if you
change, say, the indexing register, it becomes a different sequence of
bytes. Decryptors started to change with every infection: changing the
indexing register, the decrypting instruction, the loop method, etc.
The encryption scheme is still pretty visible, but this slightly
increases the byte-level variability, up to the level where even
wildcard scan-strings can't be used at all, or can't be used with
suitable reliability.
This is something we can call variable encryptors, or metamorphism -
everyone calls it differently; avers classify it even as low
polymorphic engines in order to show how clever they are. Anyway, now
junk instructions don't matter anymore (whether they are there or not),
since the valuable bytes of the decryptor instructions themselves can't
be checked. It was presented in many forms in viruses, and it demanded
a new answer from avers.
Algorithmic scanners
is the name of the technology they presented. As attempts with masks
for scan-strings (to filter out the variable parts of bytes) didn't
show suitable results, something new had to be found. Scanners started
to use (in parallel) short routines to decide whether a piece of code
is a known decryptor or not. Such a routine checks for certain code
sequences or forms; if the code fits the hard-coded requirements, the
file is reported as infected by the virus. Usually they had as many
algorithmic routines as decryptors they wanted to recognize. As the
schemes of the encryptors being checked follow really easy rules,
positive infection can be tested for with satisfactory results.
Simple encrypted viruses were checked this way. But most of the top
avirs are not using this for trivial virii anymore (some of them do,
e.g. Avast!, which always (ok, usually) rates in VB's 100% award
group - but look at the tests: it can't identify even simply encrypted
viruses exactly). So most of the top avirs use some kind of tracing
(i.e. emulation), because that is required today to handle many of the
complicated viruses - in the form of a generic decryptor: a routine
which is able to decrypt simply encrypted viruses (or more complicated
ones; it again depends on the implementation).
Inoculating
This was another interesting thing antiviruses offered in the old
times - maybe some of you remember, for example, TNT Antivirus (it is
gone now), which did it. The functionality is simple: viruses usually
use some mark to tag a file as already infected, so as not to infect it
again (all this is nearly the same for boot sectors/MBRs). They use
some variable set in the file, or check for some bytes of the virus
body already present, or change the time/date of the file. By
inoculating, those attributes are set in advance, and the virus will
not infect the file.
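A toy sketch of the idea, against a single hypothetical virus that tags
infected files with a 4-byte marker at the very end (the marker bytes
are invented):

  #include <stdio.h>
  #include <string.h>

  static const unsigned char marker[4] = { 'V', 'X', '9', '8' };  /* invented */

  /* Append the infection mark of our imaginary virus to a clean file,
     so the virus' "am I already there?" check succeeds and skips it. */
  int inoculate(const char *path)
  {
      unsigned char tail[4];
      FILE *f = fopen(path, "r+b");
      if (!f) return -1;
      if (fseek(f, -4L, SEEK_END) == 0 && fread(tail, 1, 4, f) == 4 &&
          memcmp(tail, marker, 4) == 0) {
          fclose(f);
          return 0;                       /* already tagged */
      }
      fseek(f, 0L, SEEK_END);
      fwrite(marker, 1, 4, f);            /* plant the fake mark */
      fclose(f);
      return 1;
  }

  int main(int argc, char **argv)
  {
      return argc > 1 ? inoculate(argv[1]) < 0 : 1;
  }

One virus, one mark - and that is the whole problem: the next virus
checks a different mark, or one you can't fake without breaking the
file.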
Sounds nice, but it is unfunctional in general :) Even back then, with
fewer viruses around, it was becoming impossible - you simply can't
inoculate files against all viruses. If two different viruses check the
seconds field of the modification time, they may require two different
values, so you can't cover both of them. And yes, viruses don't test
files only for their flags, but for some limits too. Some marks you
simply can't fake - for example certain values in the EXE header, or
overlays (the program might become unfunctional).
These are the reasons why it can't be used against a large number of
viruses, let alone all of them - it can be done for one or a small
number of viruses. Moreover, nowadays no one spends a lot of time on
the analysis of a virus - most of them are analysed in a short time,
and you have to know a virus completely to do inoculation, otherwise it
may damage the inoculated files. In other words, I don't think there is
any reason to care about inoculation today.
"The Final solution"
This is what some AV companies called their antivirus systems some time
ago. They presented "the final antivirus that can deal with every virus
without knowing it". Sounds good, doesn't it? And it is more or less
true. Any idea what it is? Well, checksumming, that's it. The idea is
simple, and it works in many cases: all files that can be infected are
checksummed (some kind of CRC is calculated), plus the file size, some
bytes from the header, the entrypoint, etc. are backed up. Then, if a
virus infects some file (it must not be stealth), a change in length or
contents is detected. The first checksummers were really slow, as they
computed the CRC of the whole file, and it takes some time to load it.
But it can be sped up considerably by checksumming only the important
areas (header, entrypoint, file end) with the same success. Well, once
some change is detected, the file can be repaired by trying the
available repair schemas (typically there are only a few ways viruses
insert themselves into a file), and if the result of one of them
matches the original CRCs, the file is successfully repaired.
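A minimal checksummer sketch along these lines - only the length plus
the first and last 4 KB (where appending parasites live) go into a
CRC-32; a real tool would also snapshot header fields and the
entrypoint so repairs can be attempted:

  #include <stdio.h>

  static unsigned long crc32_update(unsigned long crc,
                                    const unsigned char *p, size_t n)
  {
      while (n--) {
          int k;
          crc ^= *p++;
          for (k = 0; k < 8; k++)       /* reflected CRC-32 polynomial */
              crc = (crc >> 1) ^ (0xEDB88320UL & (0UL - (crc & 1)));
      }
      return crc;
  }

  /* Record length + CRC over the head and tail of the file. */
  int snapshot(const char *path, unsigned long *len, unsigned long *crc)
  {
      unsigned char buf[4096];
      size_t n;
      FILE *f = fopen(path, "rb");
      if (!f) return -1;
      *crc = 0xFFFFFFFFUL;
      fseek(f, 0L, SEEK_END);
      *len = (unsigned long)ftell(f);
      rewind(f);
      n = fread(buf, 1, sizeof(buf), f);            /* head */
      *crc = crc32_update(*crc, buf, n);
      if (*len > sizeof(buf)) {                     /* tail */
          fseek(f, -(long)sizeof(buf), SEEK_END);
          n = fread(buf, 1, sizeof(buf), f);
          *crc = crc32_update(*crc, buf, n);
      }
      *crc ^= 0xFFFFFFFFUL;
      fclose(f);
      return 0;
  }

  int main(int argc, char **argv)
  {
      unsigned long len, crc;
      if (argc < 2 || snapshot(argv[1], &len, &crc) != 0)
          return 1;
      printf("%s: len=%lu crc=%08lX\n", argv[1], len, crc);
      return 0;
  }

Run it once on a clean system, store the results, and any later
mismatch means the file changed - by a virus or by anything else.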
Sounds nice, but it has several problems (lucky, lucky): first of all,
it can't detect stealth viruses at all while they are active in memory;
if they are not in memory, checksums are still valuable. Another big
problem is lazy users - most of them use (and download) an antivirus
only when they suspect something or when they are really infected - and
there is no sense in making a CRC snapshot of infected files ;) And
finally, there are still viruses (and this is how you can avoid the
checksummer's success) that do not infect files in a standard way, or
modify some bytes deep in the host code, or whatever else doesn't match
the implemented schemas.
Checksummers didn't achieve big success for these reasons, but they are
still usable in many cases; moreover, in combination with a heuristic
cleaner they can be quite efficient. But there are still the lazy users
who don't run an antivirus until they are infected, and because of that
this can't be a really big weapon against viruses globally. Still, some
antiviruses use it and reach high efficiency in detection and cleaning.
MtE breakthrough - polymorphism
Dark Avenger, the world's most famous virus writer, from Bulgaria,
became famous mostly because of a 3kB object file he released. It is
known as MtE 0.9 beta (short for Mutation Engine), and it kept many
avers from sleeping for many nights. This smashing breakthrough was
plagiarized many times by other virus coders, but I think it was never
(or nearly never) as good as MtE. What was it? Imagine the situation of
scanners before: all were based on scan-strings with wildcards, at most
with some easy checking routines (usually not even that). But Dark
Avenger informed the world in a FidoNet message group (I don't think
many of the current guys on the scene remember Fido) about his library
that could encrypt a virus in 4.2 billion different ways (4G, you
should understand), beating all scanners. Moreover, it was real.
MtE started a new era called polymorphism. It was able to generate a
decryptor consisting of many instructions with no visible schematics in
it. Random register mappings, several addressing modes, fake code-flow
alternatives - all of it was unusual at the time. The only schematic
thing was the end of the loop - usually a dec/dec/jnz sequence, and
avers decided this sequence is always there. Most of them think so even
now, because Bontchev said so, but it isn't :) I got this result on
thousands of generated samples I made - with some probability MtE
creates other loop instruction sequences.
MtE 0.9 was distributed with a sample virus - a non-resident COM/EXE
infector - which was patched many times by lamers unable to write their
own virus (with the MtE library or without), so many very similar MtE
viruses appeared. The usual name of the sample virus is MtE:Dedicated,
because it contains the string: "This virus is dedicated to Sarah
Gordon who wanted to have a virus named after her." (I hope I remember
it right). There we have her - the famous and hated (and foolish)
Sarah - even guys from the AV scene don't like her theoretical stuff,
but they can't say it as clearly as we can :) She became famous
(besides being a woman ;-) due to her investigations into virus writers
and their origins. Funny but useless, and she surely hopes it is
forgotten by now ;) Well, back to MtE:
Later the MtE 1.0 library also appeared in the world, but it was nearly
forgotten, as it didn't bring new features, and most viruses use the
original version 0.9.
Algorithmic scanners once again
MtE went beyond the limits of the old antiviruses and presented a
completely new idea they had to fight. Some unreliable detectors
appeared that checked for secondary flags, like entry-points, some code
sequences, or file tagging, but they weren't really functional. It took
a rather long time (months!) until a good detector was written and
built into antiviruses. Partly because at that time it took many days
for a virus to get from country to country - unlike today. But
regardless, there was a big competition between antiviruses to catch
all samples of MtE. Many independent tests were made (as never before
or since), testing antiviruses on thousands of samples to see whether
they could find them all. It took all the antiviruses a long time to
reach a 100% hit-rate.
Again the question of exact detection appeared. Recommendations of CARO
suggested the solution (and some antiviruses, like TBAV in that time,
followed it) that the polymorphic library should be part of the virus
name, separated by a colon - for example MtE:Pogue.A; the rest of the
hierarchical virus name remains dot-separated as before to denote
versions/revisions. However, it was quite difficult for avers even to
decide whether there is an MtE encryptor at all - they weren't able to
get underneath the generated encryptor.
How did they detect MtE? Well, algorithmic scanners were the solution
once again. But with no visible schema it wasn't so easy. Most
antiviruses used (and mostly still use) an acceptance-disassembler. The
idea is simple: MtE generates only certain instructions, and the loop
is always terminated by dec/dec/jnz (well, not always, but no matter
now). All you need to know is whether a given instruction can be
generated by MtE, and the size of the instruction (to know where the
next instruction starts). If a jnz is found, you need to check whether
it jumps backward and whether there are two decs before it. And to
solve conditional paths - just try to follow both of them, using
recursion. If the test is passed and a backward jnz is found, an MtE
virus is reported. Such a test is fast enough, hits all infected
samples and has no (or really, really few) false positives. And it can
be as small as a bit more than 300 bytes, as illustrated by TBAV.
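Here is a toy acceptance-disassembler in the same spirit, with the
opcode table cut down to a handful of instructions (a real MtE detector
accepts far more forms and computes true modrm lengths - this only
shows the shape of the algorithm):

  #include <stdio.h>

  /* Walk the code accepting only opcodes our imaginary engine can
     generate; report a hit when a short backward jnz follows two dec
     instructions.  Conditional jumps are explored on both paths,
     with a depth limit. */
  static int accept(const unsigned char *buf, size_t len, size_t ip,
                    int decs, int depth)
  {
      int steps;
      if (depth > 8) return 0;
      for (steps = 0; steps < 256; steps++) {
          unsigned char op;
          if (ip >= len) return 0;
          op = buf[ip];
          if (op == 0x75) {                          /* jnz rel8 */
              signed char rel;
              long tgt;
              if (ip + 1 >= len) return 0;
              rel = (signed char)buf[ip + 1];
              tgt = (long)ip + 2 + rel;
              if (rel < 0 && decs >= 2) return 1;    /* the loop tail! */
              if (tgt >= 0 && (size_t)tgt < len &&
                  accept(buf, len, (size_t)tgt, 0, depth + 1))
                  return 1;                          /* taken path */
              ip += 2; decs = 0;                     /* not-taken path */
          } else if (op >= 0x48 && op <= 0x4F) {     /* dec reg16 */
              ip += 1; decs++;
          } else if (op == 0x90 ||                   /* nop */
                     (op >= 0x40 && op <= 0x47)) {   /* inc reg16 */
              ip += 1; decs = 0;
          } else if (op >= 0xB8 && op <= 0xBF) {     /* mov reg16,imm16 */
              ip += 3; decs = 0;
          } else if (op == 0x80) {  /* grp1 r/m8,imm8 - reg form assumed */
              ip += 3; decs = 0;
          } else if (op == 0x31) {  /* xor r/m16,r16  - reg form assumed */
              ip += 2; decs = 0;
          } else {
              return 0;   /* engine can't generate this: not MtE-like */
          }
      }
      return 0;
  }

  int main(void)
  {
      /* mov cx,10h / nop / dec ax / dec cx / jnz back */
      static const unsigned char demo[] =
          { 0xB9, 0x10, 0x00, 0x90, 0x48, 0x49, 0x75, 0xFB };
      printf("%s\n", accept(demo, sizeof(demo), 0, 0, 0) ? "hit" : "clean");
      return 0;
  }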
That's why some antiviruses can't report exactly which virus is
encrypted by some polymorphic library - they (usually) only check
whether the decryptor could have been generated by the corresponding
poly engine. This technology is intended as a non-destructive analysis
(the code is never actually run) in contrast with emulation.
Plagiarizing MtE - the polymorphic era
The success of MtE was never repeated. For one thing, I think none of
the later routines were as good as MtE; they were simply different - as
usual, no one understood Darkie's code. But to be as successful as MtE
it is not enough to write a good polymorphic engine - this is something
I want you to understand - as current avir technologies can handle
polymorphic viruses mostly without problems. To be as successful as
MtE, one has to make a similar breakthrough: something incompatible
with the current thinking of the antivirus guys.
When MtE kicked all the avers pretty hard, many virus coders started to
write similar engines (as MtE itself was by then detectable). The first
one I remember was TPE - the Trident Polymorphic Engine (released by
the Trident group). There was big fear of it on the AV side, because
they all still remembered the MtE scare. However, TPE wasn't as
successful as MtE in the wild, because most of the viruses using it
weren't spread enough to be important. TPE's technology was a bit
different from MtE's - it uses several schemes for the main encryptor,
picking one of them, plus some number of introduction schemes placed
before the encryption loop. It was rather schematic, but there were
many schemes, so it wasn't visible at first sight. Some detecting
routines used a similar algorithm as for MtE; some detected each scheme
in the encryptor and checked it. In general, TPE was handled much more
easily by avers - they already knew how to deal with such things.
I will not make big distinctions between the individual polymorphic
engines, as they are unimportant in principle. Some engines were a
really easy piece of cake for avers; some made them a lot of problems.
The notable poly engines usually only reach some limits of the
antiviruses but never go above them - that's why the later poly engines
weren't so successful: MtE had set a rather high bar. I can show you
this on SMEG - all that made it so dangerous for avers is that it can
generate really long decryptors.
Well, a big fear for avers would be this: if many polymorphic viruses
(or engines) appear in a short time, each of them non-trivial (at some
of the scanners' limits), it will be really hard to implement
specialised scanning routines for all of them, should they be reported
in the wild.
Heuristics
Well, this is another big chapter, developed together with the other
technologies. Some time ago, heuristics were only an experiment in
catching unknown virii, and not quite reliable; they were introduced to
the wide public by TBAV (surely everyone knows it). As we have a
completely dedicated article on this topic, I will not describe it
here - only its reasons and influence in history.
Finally, with more and more viruses coming each month, some avers tried
to find something that could detect even those they were not able to
add fast enough - to detect unknown viruses, in general. In fact it has
the same purpose as the checksumming already mentioned. For a long time
heuristics were some kind of avers' alchemy to improve their hit-rates.
It was magic that everyone admired (avers, virus-writers, gurus, coders
and regular lamers), but no one trusted. Funny, isn't it? The first one
for the wide public, surely not the best, and the one most fooled by
virus-writers, is TBAV. TBAV put all its power into fast heuristics,
but it had a primary weak point - it was passive instead of active
(disassembling instead of emulating) and it wasn't able to go through
encryptors. Another bad thing for TBAV was its displayed flags, so
anyone could see which internal flags were raised on a given file. And
using the documentation, you could find out what TBAV suspected about
your virus - and easily tune it not to be detected by TBAV. Soon many
viruses started to be anti-TBAV, meaning not detected by TBAV's
heuristics by default (today it is a sort of standard). That is too bad
for a heuristic: it is designed to catch new viruses, but if they are
all designed not to be detected by it, there is no way it can. TBAV's
heuristics found their death in these things.
TBAV (followed by some plagiarists) uses, as I already mentioned, a
passive method, or disassembly (in other words), that analyses code
(instructions) and detects suspicious schemas - like setting registers
and calling interrupts, etc. There were a lot of flags (nearly one for
every letter of the alphabet) for many things, detected in different
ways. But it was rather easy to fool: if it simply looks for
mov ah,40h / int 21h, all you need to do is mov ah,3Fh / inc ah /
int 21h and TBAV will not complain. For this reason, antiviruses that
still use passive analysis as their main weapon combine it with
register emulation (TBAV as well), which can (to a degree) keep track
of the values in registers. When an int 21h is found, for example, the
10 or so instructions before it are analyzed to find out the values of
the registers. It works in many cases, and does not work in many cases
as well.
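A minimal sketch of that combination - passive disassembly plus just
enough register tracking to follow AH - which catches both variants
above (opcode handling is stripped to the bare minimum; real
implementations track more registers and far more instructions):

  #include <stdio.h>

  /* Walk the code keeping track of AH only, and raise a flag when
     int 21h is reached with AH = 40h (DOS write).  Unknown opcodes
     simply invalidate the tracked value. */
  int suspicious_write(const unsigned char *code, size_t len)
  {
      int ah = -1;                       /* -1 = unknown */
      size_t ip = 0;
      while (ip < len) {
          unsigned char op = code[ip];
          if (op == 0xB4 && ip + 1 < len) {            /* mov ah, imm8 */
              ah = code[ip + 1]; ip += 2;
          } else if (op == 0xB8 && ip + 2 < len) {     /* mov ax, imm16 */
              ah = code[ip + 2]; ip += 3;
          } else if (op == 0xFE && ip + 1 < len &&
                     code[ip + 1] == 0xC4) {           /* inc ah */
              if (ah >= 0) ah = (ah + 1) & 0xFF;
              ip += 2;
          } else if (op == 0xCD && ip + 1 < len &&
                     code[ip + 1] == 0x21) {           /* int 21h */
              if (ah == 0x40) return 1;                /* file write! */
              ip += 2;
          } else {
              ah = -1; ip += 1;          /* anything else: lose track */
          }
      }
      return 0;
  }

  int main(void)
  {
      /* mov ah,3Fh / inc ah / int 21h - the trick still gets flagged */
      static const unsigned char demo[] =
          { 0xB4, 0x3F, 0xFE, 0xC4, 0xCD, 0x21 };
      printf("%s\n", suspicious_write(demo, sizeof(demo))
                         ? "suspicious" : "clean");
      return 0;
  }

And the counter-move is visible right in the last branch: one
untracked instruction in between, and the scanner loses the value.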
The funniest thing was decryptor detection. It didn't work in many
cases, and then TBAV went on to detect instructions in the
still-encrypted area - and of course it usually found many suspicious
"instructions" there. Well, I'm not here to judge TBAV or any other
avir; for that purpose we have another article.
More powerful heuristics are presented by AVP (though usually hidden,
as AVP displays regularly detected viruses first), by DrWeb and by
Nod-iCE. They use active heuristics (emulating as much as possible) and
are able to detect many more suspicious activities. Also, you don't see
any flags there, so it is harder to fool them. But AVP's heuristics, as
well as Dr. Solomon's, are set up to be less sensitive, as they can
detect plenty of viruses by scan-strings and do not need to be as
successful on unknown viruses as the others. For this reason, of
course, they have fewer false positives as well (our experiments some
time ago showed that the hit-rate of Dr. Solomon's heuristics, for
example, is around 70%).
Active heuristics (emulation) are destructive to the code, as they
emulate as much as possible, and they must be cleverly combined with
scanning. But they also simplify scanning, as emulation can simply go
through decryptors, and the AV can then detect the virus exactly, since
it is already in its decrypted state. For this reason it is also called
a generic decryptor in some antiviruses - if they use emulation only
for this. Heuristics, finally, after years of being mistrusted, became
a standard, and as Nod-iCE and DrWeb show, they can be really reliable.
This is what emulation gives us. The top antiviruses today use a
combination of both methods.
The weak point of passive heuristics (disassembly) is the disassembly
itself: it is difficult to find out the values of registers even in
simple cases. Of course, it depends on the implementation of the
heuristics. Also, any encryption, or data-dependent or
highly-structured code, can't be understood by a disassembly-based
heuristic scanner. As a heuristic scanner looks for instruction
structures typical of viruses (searching for executable files,
accessing and modifying them, going resident, etc.), just do these
things in some tricky way, not clearly and visibly.
Fooling emulation is much more difficult. Emulation actually executes
the virus code, as on a regular computer, establishing certain
circumstances and testing whether the code performs the usual virii
activity. First of all, emulators are limited by definition - they are
much slower than the real machine, so long decryptors, or routines
jumping back and forth between each other for a long time, are aborted
on a timeout, because a heuristic can't hang on one file for long. Then
there are the limits of the processor: only one type of processor can
be emulated (more or less) perfectly. You can test whether the
processor works the way it should: undocumented (and mostly unknown!)
instructions, maybe some badly implemented instructions in their
emulator (these are hard to find). However, implementing another
instruction is just a couple of minutes' work for them. But there are
also other limits - the machine can't be emulated completely: the
entire file can't be loaded (imagine loading a 500k EXE file), the
virtual machine doesn't behave exactly as it should - many interrupts
may not work, things done by other parts of the system are not
completely emulated either, I/O ports usually don't work (maybe the
easiest of them are emulated, but not all of them can be), etc. Hardest
for avers is when you reach the limits of the emulator itself, because
they can't extend those limits arbitrarily: memory size, file loading,
emulation speed.
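To show the principle (and the instruction budget that implements the
timeout), here is a toy generic decryptor in C: it "executes" just
enough 8086 opcodes to run a simple xor-loop decryptor over a copy of
the code, after which a plain scan-string could be run over the
decrypted image. Addresses are treated as plain buffer offsets, which
is a simplification; everything else in a real emulator is, of course,
much bigger:

  #include <stdio.h>

  #define BUDGET 100000L   /* abort after this many instructions */

  static void emulate(unsigned char *mem, size_t len)
  {
      unsigned ip = 0, si = 0, cx = 0;
      long budget = BUDGET;
      while (budget-- > 0 && ip + 2 < len) {
          unsigned char op = mem[ip];
          if (op == 0xBE) {                               /* mov si, imm16 */
              si = mem[ip + 1] | (mem[ip + 2] << 8); ip += 3;
          } else if (op == 0xB9) {                        /* mov cx, imm16 */
              cx = mem[ip + 1] | (mem[ip + 2] << 8); ip += 3;
          } else if (op == 0x80 && mem[ip + 1] == 0x34) { /* xor [si],imm8 */
              if (si < len) mem[si] ^= mem[ip + 2];
              ip += 3;
          } else if (op == 0x46) {                        /* inc si */
              si++; ip += 1;
          } else if (op == 0xE2) {                        /* loop rel8 */
              cx = (cx - 1) & 0xFFFF;
              if (cx) ip = ip + 2 + (signed char)mem[ip + 1];
              else    ip += 2;
          } else {
              break;       /* unknown opcode: stop, scan what we have */
          }
      }
  }

  int main(void)
  {
      /* decryptor + 4 "encrypted" payload bytes (payload is invented) */
      static unsigned char mem[] = {
          0xBE, 0x0C, 0x00,        /* mov si, 000Ch : body offset  */
          0xB9, 0x04, 0x00,        /* mov cx, 4     : body length  */
          0x80, 0x34, 0x55,        /* xor byte [si], 55h           */
          0x46,                    /* inc si                       */
          0xE2, 0xFA,              /* loop back to the xor         */
          'V' ^ 0x55, 'I' ^ 0x55, 'R' ^ 0x55, 'X' ^ 0x55
      };
      emulate(mem, sizeof(mem));
      /* the payload is plaintext now - scan-strings work again */
      printf("%.4s\n", (char *)mem + 12);
      return 0;
  }

Everything a virus writer attacks is visible here: blow the budget,
use an opcode outside the table, or touch state the emulator doesn't
model, and the decryption never completes.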
Cracking Windows
Have you ever cracked a window? Just take a rock and throw it at the
window. Easy, isn't it? All right, I'm not going to write about that,
but about the real Windows - Microsoft's revenge on the rest of the
world. Some time ago, with Win3.x, the world was divided between those
who didn't like Windows (or even hated it) and those who liked it (who
of them actually used it doesn't matter now). It was similar on the
virus scene - most stayed at the DOS level, for three main reasons:
there was no need to write for Windows as DOS was good enough, Windows
was less documented, and finally many coders simply weren't able to
code anything for Windows. Now this has changed a bit. Microsoft rocked
the world with Windows 9x and turned everything PE-ized. Well, history
repeats:
Windows 3.xx
The first Windows viruses were simple examples. True, the file format
changed a bit - NE has extra fields, and there are different
circumstances in protected mode - but interrupts still work and things
are more or less similar. So the first viruses were simple non-resident
infectors, and all avers needed to do was implement scanning of the
secondary entrypoint in NE. The virus was pretty visible; a simple
scan-string could be used. Later, I remember big rumours about the
first resident viruses in Windows. Many discussions started about
whether viruses would stay under DOS or move to Windows; today, I
think, you all know the answer. But Windows is more complicated - there
are more files that can be targets of infection, DLLs for example, and
more things to infect in them. An interesting example was a virus that
infected exported labels in a DLL - for example the exported function
XyzA - instead of the regular entry-point. What must a scanner search
in this case? It has to go through all the exports! And there can be a
lot of them, which will decrease the scanning speed dramatically. It is
still an interesting idea. The only way it was handled by avers was by
scanning the file end (which they usually do anyway) for a string -
and oops, the virus is there. But if things were more complicated to
scan - some encryption, for example, or a body not located in an easily
determined place - it would be really bad for avers (they'd have to
emulate the code at every export label).
Memory scanning is also not possible in the way it was for DOS
scanners. Under Windows 3.xx one can do whatever one wants - some
viruses, for example, go to Ring 0 for this reason, but an antivirus
can do the same to scan all the memory. Of course, there is much more
memory to scan, and it is rather slower. Today, memory scanning in
Windows is not that preferred. Instead, resident scanners are used more
widely, as they are more consistent with the operating system now
(Win 9x), and there is no longer such a big hunt for memory as there
was under DOS with Bill's world-famous 640k limit.
Windows 95/98
was a real Microsoft smash. Some users tried to ignore it, but time
showed that Win95 changed the world - everyone started using it. Today
virus writers must focus on this platform, because that is where the
users are.
The first tries for Windows 95 started at the time when only the beta
version was available. VLAD promptly prepared the first virus for
Win95, spending a lot of time exploring the details - it was Bizatch
(by Quantum/VLAD). But their virus did not work under the final Win95.
The reason is simple - maybe some of you know the book called "Inside
Windows 95". A lot of useful things regarding Windows internals were
published there - and it was published before Win95 was released. For
this reason, programmers at Microsoft got the order to change some
important things in Win95 to be incompatible with the book already
written (to deny access to the internals). Among other things, the
magic numbers of imports were changed, and an imported label such as
FileOpenA was no longer correctly linked at load time.
Another interesting one is the Bizatch story. Avers had access to a
beta version of this virus (well, there are some guys on the virus
scene who will trade internal things with avers without remorse), and
at first they assumed it was not functional. Of course, they also got
the real version later on. But they named it Boza - because the
all-around-the-world-hated Vesselin Bontchev (even avers hate him,
because of his ego) didn't want to please the virus author. But the
CARO rules (set up by Vesselin!) say that if a virus names itself in
some way, that name should be chosen first. And Bizatch does: "Please
note: name of this virus is [Bizatch] written by Quantum of VLAD".
Instead, Vesselin found the name Boza (a Bulgarian alcoholic drink) -
with no connection to the original virus (the worst case suggested by
the CARO naming rules). Everyone on the scene was angry at the avers,
of course.
But forget about flame wars now. Scanning - that's what is interesting.
Windows 95 is 32-bit, but the format it uses - PE - was in use even
earlier: Windows NT 3.x used it, as did the Win32s extension to the
3.xx versions. What can one expect from a 32-bit file format? All
offsets and pointers are 32-bit, of course. Other principles are more
or less similar to NE - there is a primary entry point, several
sections can be defined, many exported functions (for 32-bit DLLs),
etc. But things are the same as before - to scan for a simple
(non-encrypted) virus, all that is needed is to load the file's entry
point and scan for some bytes. All is as before; only a PE loader is
needed.
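And that PE loader boils down to very little for a scanner. A sketch
that digs the entrypoint RVA out of the headers (the RVA-to-file-offset
mapping through the section table is omitted for brevity; the field
offsets are the documented PE ones):

  #include <stdio.h>

  static unsigned long rd32(const unsigned char *p)
  {
      return (unsigned long)p[0] | ((unsigned long)p[1] << 8) |
             ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 24);
  }

  long pe_entry_rva(const char *path)
  {
      unsigned char h[0x100];
      unsigned long pe_off;
      FILE *f = fopen(path, "rb");
      if (!f) return -1;
      if (fread(h, 1, 0x40, f) != 0x40 || h[0] != 'M' || h[1] != 'Z') {
          fclose(f); return -1;              /* no MZ stub */
      }
      pe_off = rd32(h + 0x3C);               /* e_lfanew */
      fseek(f, (long)pe_off, SEEK_SET);
      if (fread(h, 1, 0x58, f) != 0x58 ||    /* signature + headers */
          h[0] != 'P' || h[1] != 'E' || h[2] != 0 || h[3] != 0) {
          fclose(f); return -1;              /* not a PE */
      }
      fclose(f);
      return (long)rd32(h + 0x28);           /* AddressOfEntryPoint */
  }

  int main(int argc, char **argv)
  {
      long rva;
      if (argc < 2) return 1;
      rva = pe_entry_rva(argv[1]);
      if (rva < 0) { printf("not a PE\n"); return 1; }
      printf("entrypoint RVA: %08lX\n", (unsigned long)rva);
      return 0;
  }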
Now let's see what weapons avers have against the Windows viruses.
First, a look at the old scanning methods: scan-string scanning can be
used in the same way as before. Checksummers may also do their work,
but PE (or NE) schemes must be implemented in them. The hardest parts
are heuristics and generic decryption (well, or both at the same time).
For PE, a 32-bit emulator must be programmed, and at the present time I
don't know of any antivirus having a fully functional one (DrWeb is
preparing it, but it's not ready yet). For this reason, the current
heuristic engines use only passive heuristics (some kind of
disassembly) for 32-bit PEs. And that's why there are no generic
decryptors, and each polymorphic virus for Win9x must be handled
separately. But all Win9x viruses so far can be detected by their
decryptors, and there are not many polymorphic viruses for Win9x that
are different in principle, so at the present time generic decryption
is not as urgent as it was for DOS.
Macro world
Microsoft offers many virus-friendly environments. It has been this way
throughout history, and another powerful macro system became a new
platform for viruses. Yeah, MS Office did it again. The first tries at
writing a virus for Word were something like jokes. Most people
hesitated to call them viruses; "self-spreading macro" was the most
obvious definition. But today everyone calls them viruses, and there
are really many of them now. One may guess that more macro viruses (and
other script viruses) now appear monthly than "regular" viruses.
Scanners had their life complicated once again. Microsoft keeps the
"structured document format" documentation for themselves, claiming: it
is an internal format, you don't need to know about it, just use our
programming interface. However, Microsoft's interface doesn't allow
enough access to scan for viruses in the macro area. Avers had to
figure out the document format by themselves. Many of them weren't able
to do it for a long time (until third-party documentation appeared),
because this format can be doubly fragmented - and moreover, the
fragmentation definition can itself be fragmented. Scanning macros was
so unusual for avers that some antiviruses scanned whole files (funny,
since the virus body being scanned for can be fragmented too - so they
were not able to catch fragmented pieces), and even specialized
antiviruses appeared and were rather successful on the market - like
F-Win or HMVS.
From the virus-maker's point of view, nothing more is needed than to
understand the macro commands. But avers have to do much more.
Microsoft's document format can be encapsulated in other formats, and
they all need to be scanned (MS Exchange's folders, etc.). Once they
have reading routines to access the macro area, regular scan-strings
can be used. Some of the first macro scanners just scanned the names of
the macros, but that is outdated today. Scan-strings are really
reliable here. However, for polymorphic macro viruses things are more
complicated. For these reasons - and again, to catch the new viruses
appearing every day - heuristic scanners appeared. They are based on
disassembly of the macro code (accessing the macro area and walking
through the instructions, finding unusual and/or suspicious
instructions or combinations). For macros, heuristics are much more
reliable, as the instruction set is much more limited; there are no
registers or widely accessible memory, etc.
Closing
Congratulations if you have read everything above. I hope it was not
boring, and that it helps you in some way. The main thing I tried to
present here is that you have to think - not plagiarize other viruses,
not make every virus the same way, one just like another - and that you
have to understand scanning methods in order to write better viruses.
Because the more your virus complicates life for the avers, the more
successful it is. If you can write something that completely beats the
currently used methods, that's the best. I can give examples of Slovak
viruses like Dark Paranoid or TMC:Level_42; or let's start with the
German virus Tremor: nothing unusual, except that after it was detected
by avers and added to the scanners, it permutated (changed its usual
schema) and even old samples weren't caught by the antiviruses anymore.
Or Dark Paranoid: as the weak point of stealth viruses is their
presence in memory (where they can easily be detected), Dark Paranoid
is encrypted even in memory, with a polymorphic single-step interrupt
handler that decrypts only the one instruction currently being
executed. This way Dark Paranoid can't be caught in memory by a simple
scan-string, and it can't be caught in files, since it is stealth. Or
TMC, which stands for Tiny Mutation Compiler (well, linker actually):
it permutates its own instructions, placing them in random order,
connecting them with jumps and conditional jumps, and finally relocates
all memory-access instructions and jumps. A scan-string for it can't be
chosen, as the code can be broken after every single instruction.
Moreover, in files it carries only the permutator and linker, stored
with the data used to construct and link the whole body (not the
instructions themselves) - and it takes really long, even for an
emulating heuristic, to construct the whole virus and test it.
These are examples of non-traditional thinking. Find your own way,
break the limits of the current point of view - this way you can
effectively beat the avers. That is what they fear most of all: they
can't change the principles of their scanners every day. Think about
it...
flush