Linux.Lacrimae 0.30 (x) 2007-2008 by herm1t
Earlier I wrote pretty large doc for this virus, descibing its work
step by step. Now I think that there is no point to examine each
function of the virus individually or the practices of working with
ELF headers. The code came to be big and sloppy, so I will try just to
tell you what virus is doing and why some features were implemented as
they were. Probably I would write the tutorial on linux viruses. Or I
will not. Anyway, if you cannot understand something, it will be
better for you to search for the answers not in forums or zines, but
in the sources and standards. There are everything you need: System V
ABI/386, TIS ELF, libc and kernel sources, man after all.
The basic principle was set before by Z0mbie in his "Methodology of
undetectable virus". In a few words: you should take the file to
pieces (headers, instructions, parts of data; when virus code should
be mixed with victim and all should be assembled back. It became
possible in Linux after introducing PIC-executables (or just PIE).
This feature was added about 2003, and was meant to enforce the
security of the system. The most "dangerous" programs build as PIE.
But you know, if something increases in one place, it decreases
elsewhere. Programs ran slowly, prelink trying to fix this, and in
turn complicating integrity checking. Crucially, relocations for data
appeared in the executables and the code became position-independent,
and this allows the virus to move the victims code as it wishes.
Disassembling
So. The virus found the file (search), loaded it into the memory
(load_elf), performed all neccessary checks before (elf_check, the ELF
type ET_DYN, the presence of section headers). Matched. Must be
infected. First of all we need to disassemble the code. Just feed the
disassembler with loaded section (at the beginning I used the XDE, but
then made my own and significantly different version of this engine
called YAD), and put the result into the list of structures (code_t).
For the fast access to the list make the array of pointers with the
number of entries equal to the section size. For example, if one needs
to test is where any instruction at the given offset, it's enough to
write:
if (fast[offset] != NULL)
...
instead of
for (c = code; c != NULL; c = c->next)
if (c->src - map == offset) {
...
break;
}
To make it possible afterwards to restore the alignment of particular
instructions, the virus comparing the code against standard paddings
generated by binutils. The victim disassembled from top downards with
less regard to jumps. If there are any data within code - that's just
tough, return with error. Though there is no convenient way to produce
such PIE, and all programs which using inline assembly and data within
code just cannot be build as PIE. In such a way distinction of code
and data is the last thing one should bother about. Moreover, the used
data structures doesn't perfectly fit with that task. The virus is
working with sections and there are at least four sections with code
(.init, .fini, .plt, .text) linked to each other by jumps.
We will fix the numerous headers and initialized pointers in the data
segment, we have all neccesary information in relocation table
(.rel.dyn). One must find all places in code which calculate the
addresses. So, some explanations about the ELFs internal structure are
of necessity.
Position Independent Executables
PIE is an ELF file with type ET_DYN and zero base address. Naturally,
it will be loaded at quite different address, and to make it possible
for the the program to run, the loader and the program itself will
take pains. The code has the read-only permissions and will not be
fixed by loader. If the function is not a leaf one (one which makes
calls) or accessing the global variables, the following scheme is
used:
130b: e8 23 00 00 00 call 1333
# 1310+405c=536c (.got.plt)
1310: 81 c3 5c 40 00 00 add $0x405c,%ebx
.......................................................
# eax = .got.plt - 16236 = 0x1400 (main)
1326: 8d 83 94 c0 ff ff lea 0xffffc094(%ebx),%eax
132c: 50 push %eax
132d: e8 b2 fd ff ff call 10e4 <__libc_start_main@plt>
.......................................................
1333: 8b 1c 24 mov (%esp),%ebx # (1)
1336: c3 ret
.......................................................
133c: e8 00 00 00 00 call 1341 # (2)
1341: 5b pop %ebx
1342: 81 c3 2b 40 00 00 add $0x402b,%ebx
All addresses are relative to the start of .got.plt section. If the
function does not call another functions, another register could be
used instead of ebx. To set the value of PIC-register the two
constructs (1) and (2) are used. The first in the CRT functions, added
by compiler, the second in the program itself.
Parsing
The virus traversing the instruction lists (mark), this time in the
order of execution (to prevent the situation when PIC-register got
used before initialization; it is not likely that gcc will produce
such code, but this feature will come into play later); fill the
"link" fields for the branch instructions (the pointer to the
destination of the branch), look for the external functions calls,
initializations of the PIC-register and instructions which using that
register. Mark the found instructions by following flags:
* FL_GOTPTR1 - call 0f / 0: pop reg / add reg, imm32 (set on add)
* FL_GOTPTR2 - call 0f / add reg, imm32 / ... / 0: mov reg, esp /
ret
* FL_GOTOFF - mod==2 && (rm==pic_reg|base == pic_reg|index ==
pic_reg)
* FL_EXTERN - call of external function (the symbol field is the
name of function)
* FL_SEEN - the code without this flag cannot be reached
If the instruction with FL_GOTOFF refer to the code, it's destination
will be tested also.
JMP-tables
As such all instructions that need to be fixed found. And all could be
just fine and simple except for jump tables. The array of addresses is
in the .rodata, the address of array calculated in the same way as any
other address, but one need to know not only the address of table, but
its size too. The most simple case of jump table looks as follows:
145e: 83 f8 34 cmp $0x34,%eax
1461: 77 35 ja 1498
1463: 8b 84 83 90 ea ff ff mov 0xffffea90(%ebx,%eax,4),%eax
146a: 01 d8 add %ebx,%eax
146c: ff e0 jmp *%eax
Virus should look for the cmp argument that is the size of table. If
the "jmp reg" instruction occuried, virus starts to look through the
code in the opposite direction for the table address (in this case it
is the "mov" insn, when the address was found, the virus memorizes
index register and look further for the "cmp index, imm". If this
command was found, the virus has all the things needed for parsing and
saving the contents of table for later fixing. The fatal error will
occur and the file will be left uninfected, if something missed or the
found addresses does not referrence the code.
It's not the end of misfortunes with jump-tables, but the beginning.
Table index may reside not in register but in the local variable:
3d60: 83 7d e8 04 cmpl $0x4,0xffffffe8(%ebp)
3d64: 0f 87 8b 03 00 00 ja 40f5
3d6a: 8b 55 e8 mov 0xffffffe8(%ebp),%edx
3d6d: 8b 84 93 30 de ff ff mov 0xffffde30(%ebx,%edx,4),%eax
3d74: 01 d8 add %ebx,%eax
3d76: ff e0 jmp *%eax
To work with such tables the virus trying to keep track the movings
between registers and local variables. Both for the code w/ or w/o
stack frame (variables referrenced through esp, not ebp) - in the
latter case virus has little to none chances. The jmp-table parsing
code is quite fragile, but still works, and if it will fail, it is
just as well possible to find another file, less complex. There are
another approaches to the problem, but they require more detailed
analysis of the code and used in decompilers.
The same procedures applied to the virus body itself during
disassembly. To be able to split the virus code and the victim, the
virus code marked with FL_ME flag. The virus must not use
uninitialized data in the .bss, jump-tables, relocations and pointers
to the code. Only pointers to data is allowed. There is nothing
fundamental in these restrictions, it is possible to add the missed
capabilities if you wish, but this requires additional code, and the
code is too big as it is.
Imports
Let's care now for the external functions the virus is needed to work
on the new place. We need to find all the calls with FL_EXTERN flag
and the corresponding functions within imports. If something missed -
extend the corresponding sections (extend_section) and add
(add_symbol): the name of function to .dynstr, symbol to .dynsym,
helper-code to .plt, reloc to .rel.plt, the address of helper to
.got.plt and finally the version to .gnu.version_r. Rebuild the .hash
(build_hash). It's all taped.
Integration, mutation, optimization
The virus now has two lists ready for the intermixture (insert_code).
The lists dissected on ret/jmp instructions to preserve the order of
execution and to prevent the virus from fallthrough to the victim and
vice versa and cut in to the single list (code_mixer). Piece of virus,
piece of victim, piece of virus... And so forth.
Since we have a single list, we could care about such chicken shit as
alignment. The virus discards the fillings, calculates the boundary on
which the insn is aligned, prevents jumps from ending on filling. The
alignment requirements saved in the "al" field.
Now you can do with the list whatever you want. Additionaly rearrange
the code blocks (M_BRO in config.h). Randomly invert the conditional
jumps (M_IJCC), mutate some instructions (MUTATE) - it just a sketch,
the code was taken from RPME and BI-PERM and one piece was written
after Lexotan.
It's time to think of how to compose all this to a working program.
Let's find out the length of jump commands (short/near) (OJMP in
config.h). Extend the data section to add there our own data later.
Add the call to virus to .init or virus entry to .ctors.
Assembly and linking
Let's build the file (build_elf). Copy the ELF header, table of
segments (PHT) and start to copy the sections which belongs to the
loadable segments, saving new addresses and offsets. Align the section
if neccessary. If section contains code - assemble it by copying each
command from the list (don't forget to align it) and save its new
address. If the section contains data just copy it. Fix the rest
entries (non PT_LOAD) in PHT. Copy not-loadable sections. Copy and fix
the sections table (SHT). Copy all stuff that was appended to the file
and does not belong to any section.
And now we must link and fix the resulting file. Fix .rel.dyn and
rebuild the .rel.plt and .got.plt sections. When virus added imports
it was impossible to do, since the addresses of the sections were not
known.
It's time to attend to the code. The virus knew the new address of
.got.plt (the base for address calculations in program). It is high
time. For each instruction in list (depending on flags): find new
addresses for the calls to PLT-helpers for the external functions
calls; find new offsets relative to .got.plt for the commands
referrencing data and reassemble 'em; fix the addresses of conditional
and uncoditional branches; fix the offsets from current position to
.got.plt (FL_GOTPTR?). Fix all addresses from jump-tables, that were
saved. Fix the symbol tables (.dynsym and optionally .symtab). I had
no guts to fix the debugging information. :-) Set the entry point.
Copy the virus data to the previously extended data section.
Initialize the variables in the data section. Encrypt the data. Set
the infection marker. Fix the .dynamic entries. All is in readiness.
Write off the file, free memory (cunningly, so the list with virus
code will be left untouched to avoid repetitive disassembly (freec).
And continue the victim search.
A few details
Let's discuss the same in greater detail. "Fix", "set", but how to
"fix"? Suppose, that we have some absolute address, that needs to be
fixed, the old and new section table and the code in the lists. First
of all map the address to the pair
(section_io_by_addr). If address belong to the section with data, then
new address = new address of section + offset, if it's code section,
then: fast[offset]->dsta (fix_addr). If we need to fix the address
relative to .got.plt, then: new rel. address = fix_addr(old_gotoff +
old rel. addr.) - new_gotoff, where old_gotoff and new_gotoff are old
and new addresses of .got.plt respectively.
Prelink
Once in a while, when KDE-application or something as big is loading
it will drag a lot of libraries with itself. The setup of program and
libraries even if the address resolution performed lazily is a
time-consuming task. Prelink is intended to solve it by designating
the base addresses to all programs and libraries in the system, bind
the external calls and saving the undo information. What should the
virus do? Just save the base address (address of segment with offset
0, variable "prelink") and convert all absolute addresses to relative,
i.e. RVA. All that one should do is just to substract or add the base
address. There are not too many such places in the virus.
It's also necessary to fix the prelink undo information (it just EHDR,
PHT and SHT as they were before prelinking, with zero base address and
without sections added by prelink) and fix the checksum. There is just
one (or more?) snag. If the infected file be checked with "prelink
-y", then prelink will report that "file differs". What's wrong? When
virus added imports, it wrote the addresses of PLT-helpers to
.got.plt, while prelink saves there the real addresses of functions.
You can either put up with that, or to enable PRELINK_PERFECT in
config.h.
It's dreary operation. To find the path to the library, the virus will
run the interpreter. Load library into memory and look for all
necessary addresses of the functions. The resolve procedure differs
from the once in RTLD, so on some files prelink will swear anyway. It
might be fixed, but implement the RTLD in virus is a obvious overkill.
The infected program will work anyway.
Restrictions on virus code
Now a few words about the restrictions. To avoid adding its own relocs
(for the virus entry, pointers to .dynsym, .dynstr, .rel.plt), the
virus regardless of whether the file was prelinked will save RVAs, and
when will find the base address (get_base) in the run-time and add it
to the values of variables. The initial values must be set by patch
utility, for the succeeding generation of the virus the values of
variables will be set by previous one.
The virus is working with only one section - .data. That is why you
cannot use pointers to functions (referrenes to .text), unitialized
variables (.bss) and jump-tables (.rodata). Otherwise, one could save
the section index for each command accessing memory and which sections
must be extended (or the section_io_by_addr will fail). Actually,
there is one such item in the virus (if EXTEND_CTOR is enabled).
The major changes since version 0.25.2 (described in "Crimea River")
* New disassembler YAD (instead of XDE)
* Partial support for jump-tables indexes in local variables
* Prelink support
* .gnu.version_r support
* Inverting conditional jumps
* Short/near jumps optimization
Conclusion
I suppose, that's all. The virus is not too fast, most probably it
contains bugs and memory leaks. The used method of infection allows
much more opportunities than implemented in the current version. But
this virus is just the first demo of integration on Linux, and one may
take care of the rest. :-)
herm1t@vx.netlux.org, june, 2008