Patrick Min
July 1992
An evaluation of different techniques for virus detection. The discussion is sufficiently general to be applicable to a substantial number of computing platforms. All practical issues mentioned concern the MS DOS operating system. Improvement of the operating system is presented as the most fundamental, and therefore most effective, way to tackle the virus problem.
Published July 1992 by the Dutch National Criminal Intelligence Service (CRI), Computer Crime Unit, PO Box 20304, 2500 EH, The Hague, The Netherlands.
The computer virus phenomenon poses a threat to the reliability of automated systems. Considerable research effort has been put into the question of how to detect and remove viruses. As a result, several sophisticated anti-virus programs are now available. The detection method most widely used by these programs is so-called "signature scanning".
Recently, a new problem has arisen: for several reasons, the signature scanning detection method is rapidly becoming inadequate, and will eventually become obsolete. Therefore, alternative methods must be devised.
In this paper we evaluate several approaches that are still at an early stage of development compared to signature scanning. The next chapter provides some theoretical background on virus detection in general. After that, signature scanning and its drawbacks are examined. Chapters four to seven discuss four other main anti-virus methods, namely heuristical scanning, integrity checking, monitoring and hardware protection. After treatment of some miscellaneous methods in chapter eight, chapter nine addresses operating system improvement and investigates how the previously mentioned methods fit into the operating system.
In this chapter we discuss some theoretical implications of virus detection. This includes defining a virus and discussing virus detection from a general point of view.
First, a definition of a computer virus, originally given by Cohen:
"A computer virus is a program that can 'infect' other programs by modifying them to include a, possibly evolved, copy of itself." [1]
To determine whether a specific program is a virus, we have to examine its actions, or behaviour. Of course an expert can analyze the instructions of a specific program and prove that it is a virus, but that is not the issue here. The actual question is whether there exists a general method, an algorithm, that, given a program as input, outputs "yes" if the program is a virus and "no" otherwise. In practice, one would implement such an algorithm to produce a perfect virus scanner.
An informal reasoning about the detectability of viruses is given by A. Padgett Peterson, in which he uses the fact that the behaviour of viruses includes attempts to make changes:
"Computers are designed to run programs. Viruses are programs. e.g. Computers are designed to run viruses. - Old Adage. Viruses perform actions which SHOULDN'T occur during normal computer operation except in special cases. - My thought. Viruses cause changes - its in their charter. Therefore, viruses ARE detectable by looking for changes (or attempted changes) that should not occur." [2]
This does not solve our problem; it merely turns it into the problem of determining which changes should not occur.
Cohen proved that the problem "Is a virus recognizable by its appearance?" is undecidable.[3] Here a virus' appearance includes its attempts to make changes. Cohen's proof does not imply that someone who examines a virus' appearance cannot conclude that it is a virus. It tells us that there is no automated way, no algorithm, that always reaches such a conclusion. In other words, a perfect virus scanner is impossible.
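Cohen's undecidability result rests on a diagonal argument: for any supposed detector one can construct a program that spreads exactly when the detector says it will not. The Python sketch below models that construction under simplifying assumptions (programs are modelled as functions, and "infecting" as returning True); the names make_contradictory and detector_is_wrong are hypothetical and only illustrate the idea, not Cohen's formal proof.

```python
from typing import Callable

# Illustrative model (an assumption, not Cohen's formalism): a "program" is a
# zero-argument function returning True if running it performs an infection,
# and a detector maps a program to True ("virus") or False ("not a virus").
Program = Callable[[], bool]
Detector = Callable[[Program], bool]


def make_contradictory(detector: Detector) -> Program:
    """Build the program that does the opposite of what the detector predicts."""
    def contradictory() -> bool:
        # Flagged as a virus -> behave harmlessly; declared clean -> "infect".
        return not detector(contradictory)
    return contradictory


def detector_is_wrong(detector: Detector) -> bool:
    """True if the detector misclassifies its own contradictory program."""
    program = make_contradictory(detector)
    verdict = detector(program)   # what the detector claims
    infects = program()           # what the program actually does
    return verdict != infects
```

For any deterministic detector that terminates, detector_is_wrong returns True. A detector that tries to decide by simply running the program never finishes on this input: in Python, `detector_is_wrong(lambda p: p())` raises RecursionError.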
A computer virus is a program, which means it consists of instructions and data. Certain parts of it are highly likely to be unique, meaning they will not be found in any other program. So if such a part of a virus is found in a file, one can conclude that the file almost certainly contains the virus.
All instructions and data of a program are stored as numbers. Consequently the unique part of a virus used to identify it is a sequence of numbers. This identification sequence is called a signature. A signature scanner merely checks all the files on a disk for the presence of virus signatures.
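A minimal sketch of the signature scanning just described, assuming signatures are plain byte sequences matched anywhere in a file. The signature names and byte values in SIGNATURES are invented placeholders, not signatures of real viruses, and real scanners also restrict the search to likely infection points and support wildcards.

```python
from pathlib import Path

# Hypothetical signature database: virus name -> identifying byte sequence.
# These values are placeholders, not real virus signatures.
SIGNATURES: dict[str, bytes] = {
    "Example.A": bytes.fromhex("b440cd21eb05"),
    "Example.B": b"\x90\x90\xcd\x13",
}


def scan_bytes(data: bytes, signatures: dict[str, bytes]) -> list[str]:
    """Return the names of all signatures that occur anywhere in data."""
    return [name for name, sig in signatures.items() if sig in data]


def scan_directory(root: Path, signatures: dict[str, bytes]) -> dict[Path, list[str]]:
    """Scan every regular file below root; report files with at least one hit."""
    hits: dict[Path, list[str]] = {}
    for path in root.rglob("*"):
        if path.is_file():
            found = scan_bytes(path.read_bytes(), signatures)
            if found:
                hits[path] = found
    return hits
```

Because every signature is searched for separately, scanning time in this naive form grows linearly with the number of signatures in the list.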
The number of viruses discovered is growing at an ever increasing rate. In 1991 there were twenty times as many virus specimens available as in 1989, and the number of reported infections increased by a factor of seven.[4]
Every new virus causes a signature to be added to the list of all known signatures. Hence a signature scanner needs more time after every signature list update.[5] Users become less inclined to use a scanner the longer they have to wait for it. In general, there is an obvious trade-off between the user-friendliness of a system and its level of protection.
In order for a virus to be detected by a signature scanner, it must first have been discovered. Only then can a signature be determined and included in scanner updates.
One would want to discover a virus before it activates and, for instance, destroys a hard disk. There are several known cases of viruses that were not discovered before their activation.[6]
The question arises whether or not there is a significant number of undiscovered viruses. It is our belief that as long as signature scanning is the most often used detection method, a well written virus has a considerable chance of remaining unnoticed until it activates.
Even after a virus has been discovered and its signature has been included in scanning package updates, it will continue to spread until everyone whose disks are infected with it has obtained an update. The need to obtain updates regularly places a burden on users: for the scanner to remain valuable, not a single update may be missed. Again, this reduces a system's user-friendliness.
Some viruses modify themselves in such a way that identification using a signature becomes more difficult or even impossible. This can be done, for instance, by encrypting the virus and adding a different decryption routine each time it copies itself. Such viruses are called polymorphic, or mutating, viruses.
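To make the effect concrete, here is a toy model, assuming a single-byte XOR as the "encryption": each copy of the same body is stored under a different key, so a signature taken from the unencrypted body no longer appears in the file. The helper names are hypothetical. Real polymorphic viruses also vary the decryption routine itself, which this sketch does not model, and a single-byte key allows only 255 variants.

```python
import secrets


def xor_bytes(data: bytes, key: int) -> bytes:
    """Single-byte XOR 'encryption'; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


def mutate(body: bytes) -> tuple[int, bytes]:
    """Return a random key in 1..255 and the body encrypted with it."""
    key = secrets.randbelow(255) + 1  # key 0 would leave the body unchanged
    return key, xor_bytes(body, key)
```

A scanner looking for the plaintext bytes of the body finds nothing in any encrypted copy; it would have to recognise the decryptor instead, which is precisely the part a polymorphic virus varies.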
To identify a polymorphic virus, it has to be analyzed and a specific detection algorithm has to be written. This further complicates the task of writing scanning software. Eighteen months after its discovery, the polymorphic virus that was until recently the most sophisticated could be detected in all its forms by about half of the available scanners.[7] This virus is considered orders of magnitude simpler than the scheme used by the "Mutation Engine", which is discussed next.[8]
A Bulgarian virus writer has written a utility, called "mutation engine" (mte for short), which can be used to turn a virus into a polymorphic one. This virus can either be a newly written virus, or a modified existing virus. In the latter case, an existing virus would have to be substantially modified.
The utility consists of an object module that has to be linked to the virus. It contains a subroutine which the virus has to call each time before it copies itself. The subroutine encrypts the virus using a random key and generates a randomly formatted decryption routine, which is prepended to the encrypted virus. The length of the resulting module varies as well. The degree of mutation is much higher than anything seen before.
A method to detect mte-based viruses could involve measuring instruction frequencies and their contact frequency with other instructions. Before an algorithm can be written that reliably detects every possible mte-based infection, the mte code has to be thoroughly analyzed.
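A first step toward such a measurement could look like the sketch below. It assumes the code has already been disassembled into a list of instruction mnemonics (disassembly is out of scope here) and simply counts how often each instruction occurs and how often each pair of instructions occurs next to each other. How these profiles would be compared against known mte output is left open, as in the text; the function names are hypothetical.

```python
from collections import Counter


def instruction_profile(mnemonics: list[str]) -> tuple[Counter, Counter]:
    """Count single instructions and adjacent (contact) instruction pairs."""
    singles = Counter(mnemonics)
    pairs = Counter(zip(mnemonics, mnemonics[1:]))
    return singles, pairs


def relative(counts: Counter) -> dict:
    """Turn absolute counts into frequencies that sum to 1."""
    total = sum(counts.values())
    return {item: n / total for item, n in counts.items()} if total else {}
```

Working with relative frequencies makes profiles of decryptors of different lengths comparable, which matters because the mte output varies in length.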
Anti-virus researchers agree that what matters most is not the engine itself but the fact that it is widely available and fairly easy to use. Furthermore, according to Fridrik Skulason, a renowned anti-virus researcher, there are numerous ways to improve it. So in the worst case the computer world would be flooded with viruses using various versions of the engine.
Instead of trying to detect every virus by means of a unique signature, a scanner could also attempt to find them by looking for operations that are characteristic of viruses. This kind of scanning is called heuristical scanning.
A heuristical scanner contains a set of rules describing possible virus operations. Examples of such rules are:
"Programs are suspicious if
- they start with a jump instruction to almost the end of the file
- they intercept an execute request for a certain process, and then open its program file in read/write mode"
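The two example rules can be expressed as simple predicates. This is a sketch under stated assumptions: the file is a DOS COM program whose first instruction, if it is a jump, is a near JMP (opcode E9 followed by a 16-bit little-endian displacement relative to the next instruction); "almost the end" is arbitrarily taken to mean the last quarter of the file (the hypothetical tail_fraction parameter); and the second rule works on a hypothetical trace of (event, file name, mode) tuples recorded for the program under test.

```python
def jumps_near_end(data: bytes, tail_fraction: float = 0.25) -> bool:
    """Rule 1: the file starts with a near JMP (E9 rel16) into its last part."""
    if len(data) < 3 or data[0] != 0xE9:
        return False
    displacement = int.from_bytes(data[1:3], "little", signed=True)
    target = 3 + displacement  # relative to the end of the 3-byte JMP
    return len(data) * (1 - tail_fraction) <= target < len(data)


def opens_executed_file_rw(trace: list[tuple[str, str, str]]) -> bool:
    """Rule 2: an intercepted execute request, then that file opened read/write."""
    intercepted: set[str] = set()
    for event, filename, mode in trace:
        name = filename.upper()  # DOS file names are case-insensitive
        if event == "intercept_exec":
            intercepted.add(name)
        elif event == "open" and mode == "rw" and name in intercepted:
            return True
    return False
```

Both predicates only make a program "suspicious"; as the following paragraphs explain, legitimate programs can trigger them too.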
At present, this method is not sufficiently reliable to be a real alternative: the main problems are false positives and false negatives.
A false positive occurs when the scanner reports suspicious code which is in fact not part of a virus. If a scanner produces many false positives, a user will tend to disbelieve its results.
A false negative occurs when the scanner determines a file to be "clean" when in fact it has been infected by a virus. At present, heuristical scanning cannot achieve the detection rates of the best current signature scanners.
And even when the scanner produces a true positive, the message it gives the user is not as definite as the one a signature scanner would produce. The following messages were generated by the F-PROT anti-virus program (version 2.03a), the first using its heuristical scanner, the second using its signature scanner. Both messages apply to the same infected program.
A user who had used only the heuristical scanner might notify a system operator after reading the above message. The operator, however, would not be able to identify or remove the virus: for that, a signature scanner is needed.
Writers of heuristical scanners maintain a list of programs that cause false positives, so that these can be recognized as true negatives in the future. It is very important to define, for each such program, the exact context in which an operation is allowed. If, for example, the DOS FORMAT program were simply allowed to format, a virus could infect it and legitimately contain code that formats the hard disk.
In our opinion the heuristical rulebase will become more comprehensive, and consequently heuristical scanners will achieve better detection rates. As of May 1992, heuristical scanners have an 80-95 % detection rate, and we expect this rate to exceed 95 % by the end of this year. However, this does not eliminate the problem of false positives: false positive rates are currently about 1-10 %.[9]
The main advantage of heuristical scanning is its ability to detect viruses that signature scanners cannot find. These include viruses using something like the mutation engine, still undiscovered viruses, and viruses that are yet to be written. So once the method has been improved, and in particular its false positive rate has been significantly reduced, it may serve as the first line of detection: when it reports a suspicious program, a signature scanner can be used to see whether the virus can be identified. If the signature scanner then fails to produce an identification, one can at least be quite sure that the suspicious program needs further examination.
Integrity checking utilities compute a virtually unique value for every file in a system. The first time this is done, these authentication values (as we will call them) are stored either in a separate file or in the checked files themselves. Periodically the checker is rerun to see whether any file has been changed: it recalculates the values for every file and compares them to the stored values.
Such values can be calculated using a method called the cyclic redundancy check (CRC). This method provides a sufficiently reliable check of whether a file has been changed: with a 32-bit CRC value, the odds that a change remains unnoticed, i.e. that the CRC value is the same before and after the change, are one in 2^32, that is, one in more than four billion.
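The store-then-recompute cycle, using CRC-32 as the authentication value, might be sketched as follows with Python's zlib.crc32. The function names are hypothetical, and keeping the baseline in a dictionary, rather than in a separate file or in the files themselves, is a simplification. Since a 32-bit value can take 2^32 = 4,294,967,296 different values, a random change goes unnoticed with a probability of about one in four billion.

```python
import zlib
from pathlib import Path


def crc32_of(path: Path) -> int:
    """CRC-32 of a file's contents, computed in chunks."""
    crc = 0
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            crc = zlib.crc32(chunk, crc)  # continue from the running value
    return crc


def make_baseline(paths: list[Path]) -> dict[Path, int]:
    """First run: store an authentication value for every file."""
    return {p: crc32_of(p) for p in paths}


def changed_files(baseline: dict[Path, int]) -> list[Path]:
    """Later runs: recompute each value and report files that differ."""
    return [p for p, crc in baseline.items() if crc32_of(p) != crc]
```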
Another way to construct an authentication value for a file is to use a cryptographic checksum. Such a checksum is created by a complex algorithm, using as input a secret number and the file that has to be checksummed.
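A present-day way to build such a keyed checksum is HMAC, shown below with SHA-256 from Python's standard library. HMAC-SHA-256 is a substitution for illustration, not an algorithm the paper refers to; what it shares with the text is that the value cannot be recomputed by a virus that does not know the secret. The sketch assumes the secret is kept out of a virus' reach, for example off the disk being checked; the function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def new_secret() -> bytes:
    """The secret number, chosen once (e.g. at installation)."""
    return secrets.token_bytes(32)


def keyed_checksum(secret: bytes, data: bytes) -> str:
    """Authentication value that cannot be recomputed without the secret."""
    return hmac.new(secret, data, hashlib.sha256).hexdigest()


def verify(secret: bytes, data: bytes, stored: str) -> bool:
    """Compare a recomputed value with the stored one in constant time."""
    return hmac.compare_digest(keyed_checksum(secret, data), stored)
```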
There are viruses which attempt to circumvent integrity checkers. They reside in memory and detect when a checker (or any other program) opens an executable file in read mode. Before such a read is allowed, the virus first disinfects the file. After the integrity of the file is confirmed, the file is again infected. This type of virus is called a "stealth virus".
So before an integrity check is run, one has to be absolutely sure that there is a secure link between disk and memory, i.e. when a specific part of a disk is requested to be read, exactly the contents of that part should be in memory after the read. A way to almost ensure this is to boot the system from a write-protected, original system disk. Assuming the files on this disk are not infected, no stealth viruses will be in memory after the boot. However, the system disk could have been infected, either at the manufacturer or at a time when for some reason it was not write-protected. So this strategy does not provide perfect assurance, but something very close to it.
Even after a reliable link between disk and memory has been established, a virus could have tampered with a file in such a way that its authentication value remains the same, or it could have recalculated the value. For example, if a virus writer had access to the polynomial that was used to calculate a CRC value, this would be fairly easy. Values that are stored inside the files themselves are especially vulnerable.[10] A solution could be to generate a random polynomial when the anti-virus software is installed. In general, a virus should have as little access as possible to any additional input the integrity checker uses.
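The random-polynomial idea can be sketched with a bitwise CRC-32 whose polynomial is a parameter. With the standard reflected polynomial 0xEDB88320 it reproduces zlib.crc32; random_poly is a hypothetical per-installation generator. A randomly chosen polynomial may have weaker error-detection properties than the standard one, and a secret polynomial is not a cryptographic guarantee; the sketch only shows that a virus can no longer rely on a fixed, publicly known polynomial.

```python
import secrets

STANDARD_POLY = 0xEDB88320  # reflected form of the usual CRC-32 polynomial


def crc32_with_poly(data: bytes, poly: int) -> int:
    """Reflected, bit-at-a-time CRC-32 with a caller-chosen polynomial."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ poly
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def random_poly() -> int:
    """Hypothetical per-installation polynomial, in reflected form."""
    # Bit 31 of a reflected polynomial is the x^0 term; keep it set.
    return secrets.randbits(32) | 0x80000000
```

The installer would call random_poly once and store the result where viruses cannot read it; the checker then uses crc32_with_poly with that value for every file.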
With respect to the strength of integrity checking, Vesselin Bontchev wrote in VIRUS-L:
"Besides, it is not correct that a virus can be written to evade any given detection scheme it knows about. A well implemented integrity check should be able to resist to any virus, even if the author [of a virus, PM] has the full source code of the program. This is why integrity checkers are stronger than monitors and scanners in the first place! They can be bypassed only by one kind of attack, but this is an attack against the integrity checkers in general and does not require the virus author to know how exactly a particular integrity checker is implemented."[11]
Using a reliable integrity checker, which always informs the user when a change has taken place, does not guarantee that no virus will ever again infect one's system. There are viruses that infect an executable exactly when it is being changed legitimately. Although the integrity checker would report the change, the user would obviously permit it. This is the kind of attack Bontchev refers to in the above quote.
A monitor program is invoked during startup, and stays resident in memory. It monitors interrupts that usually are called by viruses, and asks the user to confirm any action that could have originated from a virus.
A difficulty here is how to determine which actions are virus-like and which are not. Just as with heuristical scanning, this causes many false positives. Thus a monitor should also maintain a list of known false positives and record precisely what each program is and is not allowed to do.
The user is alerted if a monitor detects a virus-like action. The monitor then offers the possibility to abort the action, or to permit it. Due to the often technical nature of these actions, only a skilled user can make a responsible decision as to whether or not to allow the action.[12]
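The decision logic of such a monitor, stripped of the interrupt interception itself, might look like the sketch below. Everything in it is an assumption for illustration: the operation names, the SUSPICIOUS set, the KNOWN_ALLOWED table of known false positives, and the ask_user callback standing in for the prompt shown to the user.

```python
from typing import Callable

# Hypothetical operations a monitor treats as virus-like.
SUSPICIOUS = {"format_disk", "write_executable", "write_boot_sector", "go_resident"}

# Hypothetical list of known false positives: program -> operations it may
# perform without asking. A real monitor would record the exact context too.
KNOWN_ALLOWED: dict[str, set[str]] = {
    "FORMAT.COM": {"format_disk"},
    "INSTALL.EXE": {"write_executable"},
}


def monitor(program: str, operation: str, ask_user: Callable[[str], bool]) -> bool:
    """Return True if the intercepted operation may proceed."""
    if operation not in SUSPICIOUS:
        return True
    if operation in KNOWN_ALLOWED.get(program.upper(), set()):
        return True
    return ask_user(f"{program} attempts {operation}. Allow?")
```

Note that the quality of the whole scheme depends on the ask_user step: the user must understand the prompt to answer it responsibly.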
Several existing viruses attack specific anti-virus products in an attempt to disable or evade them. Of all the methods discussed, monitoring is the weakest, in that it is the most vulnerable to such attacks.[13] A virus can detect the presence of a monitoring program and, for instance, refrain from any virus-like actions while it is active. In general, because MS DOS offers no hardware-supported memory protection, a virus can always manipulate a monitor. A virus can also bypass a monitor completely by calling the ROM BIOS directly, since the monitor only intercepts interrupt calls.
Hardware virus protection usually comes as an add-on card which is plugged into a free expansion slot inside the computer. An anti-virus program, stored in ROM on the card, is invoked at startup, before the system is allowed to boot.
The primary advantage of hardware protection is that one can be certain that the code on the card is executed before any possibly infected boot sector or program, so the results of any scanning or integrity checking it performs cannot be tampered with. Software protection only takes effect after the boot process, i.e. after the execution of the boot sector, the loading of system files, device drivers, etcetera.
Additional benefits of hardware protection are that its program is always executed at every startup, that the code resides in ROM and thus cannot be modified, and that password protection can be included.
However, the monitoring program on the hardware card faces the same problem as the software monitor or heuristical scanner: how to determine what actions could have originated from a virus, in other words what is virus-like behaviour. With respect to this issue, Bontchev wrote in VIRUS-L:
"There are lots of ways in which such cards can be bypassed. The key thought here is that in order not to be too obtrusive, the card must try to decide what is "legitimate" access to the disk and stop only access that it thinks is non-legitimate. Unfortunately, this problem is undecidable and a virus can always mask its access as legitimate."[14]
Just as a virus can bypass a software monitor by directly calling the BIOS, it can also bypass a hardware card by directly calling it:
"A simpler idea is to call the program on the card directly at the point at which it decides to pass the control to the disk... In order to prevent such attacks, ThunderByte has a different variant of the program with every different example of the card... Unfortunately, this does not completely prevent the virus from looking for some characteristic patterns..."[14]
The program on the hardware card resides in the address space of the PC, which means it can be called at any address. At some stage in the checking program it is decided that a disk request is allowed. If a virus were to call the program at exactly that point, its disk access would falsely be authorized.
"It can't [prevent a virus from directly calling the card, PM]. Unless it has its own CPU and its program is completely unreachable from the address space of the PC. But this will make it quite expensive."
"The current versions of such cards are no more expensive than an expensive anti-virus program, but they are not much more secure. A more secure version is possible, but it will be much more expensive."[14]
Files may be marked to signify that they should not be modified or deleted. If this is implemented in software, the mark can also be removed in software. The program managing the write protection then has to decide whether or not to allow a request to remove a mark, which is the same kind of problem a monitoring program faces.
Many viruses mark the files they have infected, because they only need to infect them once. Likewise, memory-resident viruses check whether they are already resident: these are the so-called "are you there?" calls. The idea behind vaccination is to put as many "infected" marks on all files as possible, and to implement as many "yes" responses to "are you there?" calls as possible. The aim is to make many viruses believe they have already infected a system when in fact they have not.
This method has several drawbacks. First, not all viruses check their presence in such an easy way. Second, several viruses have conflicting presence tests. And third, this kind of protection is very easily circumvented. As an example, consider a memory-resident virus checking its presence by calling a DOS interrupt, using an unused function number. If for example the number 100 is returned in a certain register, the virus assumes it is already memory-resident. A vaccination program for this virus would do exactly the same thing. A way to circumvent it would for instance just involve returning another number (e.g. 101) as the number indicating presence, in future versions of the virus.
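The circumvention described above can be modelled in a few lines. This is a toy simulation, not DOS code: the function number ARE_YOU_THERE and the marker values 100 and 101 are hypothetical, mirroring the example in the text, and the interrupt handler is an ordinary Python function.

```python
from typing import Callable, Optional

ARE_YOU_THERE = 0xF1  # hypothetical "unused" DOS function number


def make_int21_handler(presence_reply: Optional[int]) -> Callable[[int], int]:
    """Simulated interrupt handler; presence_reply is what a vaccine returns."""
    def handler(function: int) -> int:
        if function == ARE_YOU_THERE and presence_reply is not None:
            return presence_reply
        return 0  # nobody answers the unused function
    return handler


def virus_believes_resident(handler: Callable[[int], int], marker: int) -> bool:
    """A virus version considers itself resident if the reply equals its marker."""
    return handler(ARE_YOU_THERE) == marker
```

The vaccine answering 100 fools the original virus, but a later version that expects 101 is not fooled at all, so every new variant requires a new vaccine.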
An anti-virus program using bait programs creates several dummy executables throughout the file system, attempts to execute them, and then compares the executables with the ones originally created. If any change has occurred, it is likely due to a virus.
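A sketch of the bait-program method under stated assumptions: the baits are minimal COM programs (MOV AX,4C00h followed by INT 21h, which simply terminates), SHA-256 hashes stand in for the comparison with the original executables, and the step of actually executing the baits is omitted because it only makes sense on the DOS system being tested. The function names are hypothetical.

```python
import hashlib
from pathlib import Path

# MOV AX,4C00h / INT 21h: a COM program that immediately terminates.
BAIT_CODE = bytes.fromhex("b8004ccd21")


def plant_baits(directory: Path, count: int = 3) -> dict[Path, str]:
    """Create dummy executables and remember a hash of each."""
    baits: dict[Path, str] = {}
    for i in range(count):
        path = directory / f"BAIT{i}.COM"
        path.write_bytes(BAIT_CODE)
        baits[path] = hashlib.sha256(BAIT_CODE).hexdigest()
    return baits


def modified_baits(baits: dict[Path, str]) -> list[Path]:
    """After the baits have been executed, report any that changed."""
    return [p for p, h in baits.items()
            if hashlib.sha256(p.read_bytes()).hexdigest() != h]
```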
Drawbacks of this method include the fact that not all viruses spread when executables are created or executed, and the requirement of a secure link between disk and memory, just as when using integrity checkers. The method can provide some detection support to a package using other strategies as well. It is a kind of heuristical method, applied in a relatively safe environment.
From reading the previous chapters, one can conclude that viruses in fact exploit the weaknesses of an operating system. Anti-virus software tries to neutralize these weaknesses as much as possible. But, to restate an essential flaw:
"The basic problem with most anti-virus defenses is that they are layered on top of the operating system while according to industry reports most infections occur before the operating system is loaded."[15]
In the latest popular operating systems (MS DOS 5.0, OS/2 2.0 and DR DOS 6.0) not even the most elementary provisions for data protection are implemented (except for a password protection scheme in DR DOS which can easily be bypassed).
An operating system provides an interface between the user and the computer's resources, together with a library of interface and supporting functions for the applications programmer.
Naturally a user wants to be protected from misuse, whether by another user or by a program misusing the system's resources, either intentionally (abuse) or unintentionally (errors). Viruses are programs that intentionally misuse the system.
In Operating System Concepts, the authors name as a motivation for operating system protection:
"the need to ensure that each program component active in a system uses system resources only in ways consistent with the stated policies for the uses of these resources." [16]
For example, a compiler invoked by a process is only allowed access to certain files, usually the source files that need to be compiled. Conversely, the invoking process cannot access the compiler's private files. An active process infected by a virus usually does not use resources in ways consistent with the stated policies for their use.
To support usage policies, the operating system should provide general mechanisms that allow applications programmers to implement safer programs. The operating system should also use these mechanisms to ensure its own integrity. This means that resource access has to become controlled: under MS DOS _all_ access is uncontrolled.
To enforce usage policies, the operating system has to record which resources every process is allowed to use. For example, a text editor is allowed to read and write text files and to read any additional files private to the editor, but it is not allowed to format a disk. Obviously the mechanism used to store each process' privileges has to be carefully protected from manipulation by viruses. A facility to analyze and log the resource requirements of new software should also be provided.
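A per-process privilege table of this kind might be sketched as follows. The table layout, process name, operation names and file patterns are hypothetical; anything not explicitly granted is denied. The sketch does not address how the table itself is protected from viruses, which the text identifies as essential.

```python
from fnmatch import fnmatchcase

# Hypothetical privilege table: process -> list of (operation, file pattern).
PRIVILEGES: dict[str, list[tuple[str, str]]] = {
    "EDITOR": [
        ("read", "*.TXT"),
        ("write", "*.TXT"),
        ("read", "EDITOR.*"),  # the editor's private files
    ],
}


def allowed(process: str, operation: str, target: str = "") -> bool:
    """Default deny: permit only operations explicitly granted to the process."""
    for op, pattern in PRIVILEGES.get(process, []):
        # Upper-case the target, as DOS file names are case-insensitive.
        if op == operation and fnmatchcase(target.upper(), pattern):
            return True
    return False
```

Formatting a disk is not listed for the editor, so the request is refused without the user ever being asked, unlike the monitor approach.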
In fact every file that is of importance with respect to the functioning of the operating system must be protected: thorough, robust integrity checking should be applied to all system files. Especially important are all the files needed to boot an operating system.
If the operating system contains well implemented integrity checking utilities, then they could be relied upon to check the integrity of applications as well. Every application could use a different variant of the same method. The exact method used can be determined during installation, which would mean a virus could never rely on a certain application using a certain integrity checking method. This scheme can also be employed when installing the operating system itself. The procedure could even include installing a random "mutation" of the operating system, comparable to viruses using the mutation engine.
If hardware-supported memory protection is available, each executing process can be assigned its own restricted memory area, with read and/or write access to other areas disabled. Privileged instructions can also be implemented in hardware, so that the set of instructions an application is allowed to execute is a subset of the operating system's set. Finally, hardware could enforce the write protection of important files.
If current operating systems were revised to include considerably more protection facilities, writing a virus would at least require much more effort. As a result, fewer new viruses would be written. The viruses that still appear would have to be larger to circumvent the extra protection measures, and hence would be easier to detect.
An essential advantage of such a revision is that all current viruses, and their derivatives, would be disabled. Virus writers would have to start from scratch and would face a much harder task. Unfortunately, application programmers would also have to substantially rewrite existing software.
Instead of making a new operating system downward compatible with previous ones, applications running on previous operating systems should be rewritten. This can be facilitated by providing at least the functions of the previous system, all of course well documented and optimized to meet the new performance and security requirements.
The operating system is the only software package every user has to run. So when it is improved, the operating system actually becomes an anti-virus tool the user _always_ installs.
In this paper we have looked at various methods for virus detection. It has become clear that each has its advantages and disadvantages: as it stands each method on its own is not capable of providing complete virus protection. Therefore a successful protection scheme would combine several methods into one, multi-layered package. Each layer of the operating system would have its own protection method.
With respect to today's operating systems the following setup can be considered mandatory:
Bontchev gives two reasons why this is not a setup that is expected to gain widespread usage:
"Unfortunately, it seems to be that the scanners will be with us at least for the next two years... It's still easier to write and maintain a good scanner (which simply does not try to detect anything it can't with wildcard strings) than a good integrity checker... And the polymorphic viruses are not that many right now. They are just a trend - a dangerous trend - and I am trying to tell all those who produce or rely on known-virus scanning that they should look a bit more far than their noses..."[17]
Nowadays it seems as if virus and anti-virus writers are battling over who will gain the most control over the operating system. In our opinion the problem should be tackled in a more fundamental way, i.e. the operating system itself must change. No downward compatibility should be provided, so as to cut off every current virus completely. Although the transition to another operating system will be harder at first, users will benefit in the long run from the increased level of protection.
We do not expect such an operating system to become standard, however. The performance of anti-virus software, the downward compatibility of operating systems, and the user-friendliness of both are all issues of great commercial value. As long as potential commercial benefits create conflicts of interest around information concerning virus detection and protection, the average computer user cannot be expected to be objectively informed, and consequently to demand better protection methods from software and hardware manufacturers. Yet it is consumer demand that would really induce them to improve their protection standards.
+----------------------+----------------------+-------------+
| <- Program part 1 -> | <- Program part 2 -> | <- Virus -> |
+----------------------+----------------------+-------------+
It must then overwrite the old CRC with the new one. I won't explain how (I don't want to give any virus-writers any ideas), but with the right technique the CRC can be found, recalculated, and rewritten in under 30 seconds.", Kevin Dean, "Stealth Bomber" version 2.02 documentation file, 1992.
I would like to thank the following proof readers for their valuable comments:
Murphy's first law on computer viruses: If a computer virus can be written, it will be written.
Murphy's second law on computer viruses: If a computer virus just cannot be written, it will be written anyway. It will just take a little bit longer.