Leonard Adleman
Advances in Cryptology - CRYPT0 '88, LNCS 403, pp. 354-374, 1990
1990
Research supported by NSF through grant CCR 8519296
In recent years the detection of computer viruses has become common place. It appears that for the most part these viruses have been 'benign' or only mildly destructive. However, whether or not computer viruses have the potential to cause major and prolonged disruptions of computing environments is an open question.
Such basic questions as:
have been at most partially addressed [Col][Co2]1. Indeed a generally accepted definition of computer virus has yet to emerge.
For these reasons, a rigorous study of computer viruses seems appropriate.
For the purpose of motivating the definitions which follow, consider this (fabricated) `case study':
A text editor becomes infected with a computer virus. Each time the text editor is used, it performs the text editing tasks as it did prior to infection, but it also searches the files for a program and infects it. When Tun, each of these newly infected programs performs its `intended' tasks as before, but also searches the files for a program and infects it. This process continues. As these infected programs pass between systems, as when they are sold,OTgiven to others, new opportunities for spreading the virus are created. Finally, after Jan. 1, 1990, the infected programs cease acting as before. Now,each time such a program is run, it deletes all files.
Such a computer virus can easily be created using a program scheme (in an ad hoc language) similar to that found in [Col]:
{main:= call injure; ... call submain; ... call infect; ... } {injure:= if condition then whatever damage is to be done and halt } {infect:= if condition then infect files }
where for the `case study virus':
{main:= call injure; call submain; call infect; } {injure:= if date ≥ Jan. 1,1990 then while files ≠ 0: file = get-random-file; delete file; halt; } {infect:= if true then file = get-random-executable-file; rename main routine submain; prepend self to file; }
By modifying the scheme above, a wide variety of viruses can be created. Even `helpful' viruses may be created. For example the following minor variant of Cohen's [Col] compression virus which saves storage space:
{main:= call injure; decompress compressed part of program; call submain; call infect; } {injure:= if false then halt } {infect:= if executable-files ≠ 0 then file = get-random-executable-file; rename main routine submain; compress file; prepend self to file; }
With the `case study virus' and all of those which could be created by the scheme above, it appears that the following properties are relevant:
A formal definition of computer virus is presented next.
Notation 1
Definition 1 For all partial , for all either:
Definition 2 For all , for all , for all partial functions iff:
Definition 3 For all partial , for all iff .
Definition 4 For all partial , for all iff .
Definition 5 For all Gödel numberings of the partial recursive functions , a total recursive function is a virus with respect to iff for all , either:
Remark 1 The choice of symbols above is intended to suggest the decomposition of all `accessible'information into `data' (information not susceptible to infection) and `programs'(infomation susceptible to infection).
In this section the set of viruses is decomposed into the disjoint union of four principal types. The nature of so called `Trojan horses' is considered.
Definition 6 For all Gödel numberings of the partial recursive functions , for all viruses with respect to , for all :
When there exists a unique such that (e.g. when is injective) then if is pathogenic (contagious, benignant, a Trojan horse, a carrier, virulent) with respect to and , the reference to will be dropped and will be said to be pathogenic (contagious, benignant, a Trojan horse, a carrier, virulent) with respect to .
Hence, if with respect to some virus an infected program is benignant, then it computes the same function as its uninfected predecessor. If it is a Trojan horse then it is incapable of infecting other programs. It can only imitate or injure, and under the right conditions it will do the latter. If it is a carrier, it is incapable of causing injury but under the right conditions it will infect other programs.
Definition 7 For all Gödel numberings of the partial recursive functions , for all viruses with respect to :
The next theorem records some simple facts about types of viruses.
Theorem 1 For all Gödel numberings of the partial recursive functions for all viruses with respect to :
Proof
Part 1 follows immediately from the recursion theorem.
All other parts follow immediately from the definitions.
Thus, all programs infected by a benign virus are benignant with respect to their uninfected predecessors. They function just as if they had never been infected. Viruses in this class appear to be the least threatening. This class includes many `degenerate' viruses such as the identity function and `padding' functions.
Programs infected by an Epeian virus can only be benignant or Trojan horses with respect to their uninfected predecessors. Further the latter option must sometimes occur. Epeian viruses will not be able to spread themselves; however, an infected program may imitate the `intended' task of its uninfected predecessor until some `trigger' causes it to do damage. Among the Epeian viruses are the `degenerate' class of constant functions, which never imitate-or-infect but only injure.
Programs infected by a disseminating viruses can only be benignant or carriers with respect to their uninfected predecessors. Further the latter option must sometimes occur. Thus programs infected with such viruses are never pathogenic. However, it is worth noting that disseminating viruses may modify the size of programs or their complexity characteristics, and by this means become detectable or cause harm (or benefit as in the case of the compression virus). In fact, size and complexity may be important properties when considering viruses. An extension of the current theory to account for size and complexity seems appropriate (see § further research).
Malicious viruses can both spread and produce injuries. They appear to be the most threatening kind of virus. The `case study virus' in § basic definitions is malicious.
Remark 2 It may be appropriate to view contagiousness as a necessary property of computer viruses. With this perspective, it would be reasonable to define the set of viruses as the union of the set of disseminating viruses and the set malicious viruses, and to exclude benign and Epeian viruses altogether.
The question of detecting viruses is addressed in the next theorem:
Theorem 2 For all Gödel numberings of the partial recursive functions :
Proof
Let . It is well known (§13 and §14 [Ro]) that is .
To establish that , let (for examplelet be an index for the identity function) and consider the function such that for all :
Then is a partial recursive function. Let be an index for , and let , be such that:
where is as in the theorem [Ro].
Then is a total recursive function and:
It follows that
Thus . It follows, as in §7.2 [Ro], that as desired.
To establish that , consider the following formula for which arises directly from the definition of virus:
Where is a `step counting' predicate for such that:
And where is a predicate for such that:
Since for all acceptable Gödel numberings of the partial recursive functions it is easily seen that there exist recursive predicates and as above, it follows that .
Thus detecting viruses is quite intractable, and it seems unlikely that protection systems predicated on virus detection will be successful.
As noted in [Co1] isolating a computing environment from its surroundings is a powerful method of protecting it from viruses. For example, if no new programs can be introduced, no old programs can be updated, and no communication can occur, then it seems viruses are no threat.
Unfortunatly, such isolation is unrealistic in many computing environments. The next theorems explore the possibility of protecting computing environments with less severe forms of isolation.
Definition 8 For all Gödel numberings of the partial recursive functions , for all viruses with respect to , let:
The infected set of
Definition 9 For all Gödel numberings of the partial recursive functions , for all viruses with respect to , is absolutely isolable iff is decidable.
Clearly if a virus is absolutely isolable, then (at least in theory) it can be neutralized. Whenever a program becomes infected, it is detected and removed. The following is a simple fact about absolutely isolable viruses:
Theorem 3 For all Gödel numberings of the partial recursive functions , for all viruses with respect to if for all then is absolutely isolable.
Proof trivial.
Thus the case study virus, as implemented using the scheme in §basic definitions would be absolutely isolable. In fact, what little experience with viruses there is to date seems to suggest that in practice people who produce viruses begin by producing ones with the increasing property necessary for theorem 3 to apply. Unfortunately, not all viruses have this property. For example, with any reasonable compression scheme, the compression virus of §basic definitions would not have this property. Nonetheless, the compression virus is absolutely isolable. Given a program with the proper syntax, it is in the infected set if and only if decompressing the compressed part results in a legitimate program.
Is every virus absolutely isolable?
Regretably, the next theorem shows that the answer is no.
Theorem 4 For all Gödel nurnberings of the partial recursive functions , there exists a total recursive function such that:
Proof
Let be a total recursive function such that:
Let be a 1 - 1 total recursive function such that for all :
Such a function, known as a padding function, exists by Proposition 3.4.5 [MY].
Let be such that:
where is the least natural number such that, for all with .
Then is a monotonically increasing total recursive function and by (1), it follows that:
Let be such that for all :
Then since is monotonically increasing, it follows that is a total recursive function.
Consider the function such that for all and :
where for all , denotes the one element sequence in consisting only of .
Then by standard arguments, is a total recursive function and:
Let be such that for all :
Then is a total recursive function and it follows from (2) and (3) that:
Applying the s-m-n theorem there exists a total recursive function such that for all :
By the recursion theorem, there exists an such that for all :
Let . Then is a total recursive function since is.
Let , then using that fact that is a total recursive function and applying (4) gives:
1 of the theorem now follows directly from the definition of malicious virus.
Since, for all total recursive functions is recursively enumerable, it follows that .
Let be such that for all . Since is 1 - 1 so is . Then implies the existence of a such that . Let , then:
On the other hand, assume and . Then there exists an such that:
Since is 1 - 1, it follows that . . Hence, and 2 of the theorem holds.
Thus, for the viruses described in the previous theorem, protection cannot be based upon deciding whether a particular program is infected or not. Paradoxically, despite this, it is often possible to defend against such viruses. How such a defense could be mounted will be described below; however, a few definitions are in order first.
Definition 10 For all Gödel numberings of the partial recursive functions , for all viruses with respect to , let:
The germ set of
Thus the germs of a virus are functionally the same as infected programs, but are syntactically different. They can infect programs, but cannot result from infection. They may start `epidemics', but are never propagated with them.
Definition 11 For all Gödel numberings of the partial recursive functions , for all virzlses with respect to , is isolable within its germ set iff there exists an such that:
Notice that if a virus is isolable within its germ set by a decidable set , then not allowing programs in the set to be written to storage or to be communicated will stop the virus from infecting. Further, the isolation of some uninfected germs by this process appears to be an added benefit.
Returning now to the viruses described in the previous theorem: assume that the function above had the property that for all . The proof of the previous theorem could easily have been modified to assure this. Further, in Godel numberings derived in the usually fashion from natural programming languages, a constructed in a straightforward manner would have this property. Consider the set
By the monotonically increasing property of and the property of which is being assumed, is decidable. On the other hand if then there exists such that
And it follows that . On the other hand if then there exist an such that
By (2) and (4):
And hence as desired.
Thus viruses like the ones in theorem 4 demonstrate that decidability of is sufficient but not necessary for neutralization. Apparently, more work needs to be done before a clear idea of the value of isolation will emerge. Are all viruses isolable within their germ set? The answer is no (proof omitted). Are all disseminating viruses isolable within their germ set? The answer is not known. Are there notions of isolation which provide significant protection at a reasonable cost?
The study of computer viruses is embryonic. Since so little is known, virtually any idea seems worth exploring.
Listed below are a few avenues for further investigation.
Introduce complexity theory and program size theory into the study of computer viruses. As noted earlier, even disseminating viruses may affect the complexity characteristics and size of infected programs and as a result become detectable or harmful.
Complexity theory and program size considerations can be introduced at a abstract level (see for example [MY]) or a concrete level.
For example, viruses in the `real world' would probably have the property that the running time of an infected program, at least while imitating or infecting, would be at most polynomial (linear) in the running time of its uninfected precursor. Does this class of `polynomid (linear) viruses' pose a less serious threat? Do NP-completeness considerations, or cryptographic considerations come into play?
In this paper one form of protection mechanism, isolation, was briefly considered. In addition to considering isoIation in greater depth, numerous other possibilities exist. For example:
The notion of computer viruses presented here is not the only one possible. It was selected because it seemed to be art adequate place to begin an investigation. More general, and more restrictive notions are possible. Indeed it seems possible that no definition will conform to everyone's intuitions about `computer viruses'.
More `machine dependent' approaches could be considered. Approaches which take into account the communications channels over which viruses pass seem particularly important.
One interesting generalization of the current notion is inspired by [Co1], where viruses are assumed to be capable of evolving. The `Mutating Viruses' (-viruses) partially defined next are an attempt to capture this property.
Definition 12 For all , for all , for all sets of partial functions from to , iff:
Definition 14 For all sets of partial functions from to , for all partial , for all iff .
Definition 15 For all Gödel numberings of the partial recursive functions a set of total recursivefunctions is a mutating virus, -virus, with respect to iff both:
Some computer viruses which have recently caused problems (e.g. the so called `Scores virus' [Up] which attacked Macintosh computers) are -viruses and not just viruses. Hence this generalization of the notion of virus may be of more than theoretical interest.
This is only a partial definition because some notion of `connectivity' is needed. That is, the union of two -viruses, neither of which `evolves' into the other should not be a -virus. Many definitions of `connectivity' can be defined, but further study will be required to choose those which are most appropriate. Once an appropriate choice is made, an important question will be whether the set of infected indices of a -virus can be harder to detect than those of a virus.
This issue has evolved during joint work with K. Kompells.
There appear to be programs which can reproduce or reproduce and injure but which are not viruses (e.g. programs which just make copies of themselves but never `infect'). These `computer organisms' may be a serious security problem.
It may be appropriate to study `computer organisms' and treat `computer viruses' as special case.
I would like to thank Dean Jacobs, and Gary Miller for contributing their ideas to this paper.
I would also like to thank two of my students: Fred Cohen and Kireeti Kompella. Cohen brought the threat of computer viruses to my (and everyone's) attention. Kompella has spent many hours reviewing this work and has made numerous suggestions which have improved it.
1 It appears that F. Cohen is the first researcher in an academic setting to consider the practical and theoretical aspects of computer viruses. The formalism presented here differs considerably from that explored by Cohen [Col][Co2].
2Now rhift your theme, and ring that wooden horse
Epeior built, inrpired by Athena -
the ambuscade Odysseus filled with fighters
and rent to take the inner town of troy
The Odyssey of Homer, 8.492-495.
translation by Robert Fitzgerald
Doubleday & Co.,NY, 1961