Optimising ASM Code - By Dark-D0G

Introduction

I am a big fan of optimizing code to be as small and efficient as possible. Most people optimize their code for one of two things:

1* Faster code
2* Smaller code

These are the two main objectives of optimization. For virus writers, number 2 is way more important than number 1; for software companies, number 1 matters more than number 2. But since software companies use HLLs, their code is very hard to optimize because of the useless *shit* that C/C++ compilers generate. So please, people, switch to assembly. What makes me even madder is HLL virii, some of which are 150 KB and infect simple PE files. W.T.F??!!

Anyway, let's get right to it. Here is the index of this manual:

1) Stupid mistakes/code
2) Optimization in general
3) Special occasions
4) Conclusion

1. Stupid mistakes/code

Jumps

Some people, especially newbies, make stupid mistakes with jumps. They code something like this:

cmp eax, 040h
je EAX_EQUALS_40
jne EAX_DOES_NOT_EQUAL_40

Just put:

cmp eax, 040h
jne EAX_DOES_NOT_EQUAL_40
...your code for when the compare is true (EAX = 40h) goes here

Or the other way around. In other words, watch your jumps: a redundant conditional jump wastes 2 bytes (6 if it is a near jump), and the best way to optimize is to get rid of unneeded data/code or to recode the procedure to be more efficient.

Another mistake people make is comparing a register with -1 (aka INVALID_HANDLE_VALUE) using CMP EAX, -1, which is 3 bytes. The best way to optimize this (only if the value of EAX does not need to be saved) is:

inc eax            ;if eax was 0ffffffffh (-1), it is now 0
jz my_procedure    ;and the zero flag is set, so this jumps

INC EAX is a single byte. This can be applied in many different places. Just remember this: most arithmetic and logical instructions (ADD, SUB, INC, DEC, AND, OR, XOR, NEG) set the zero flag when their result is zero, while MOV never touches the flags. This can often eliminate the need for a CMP.
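Why the INC trick works: INC sets the zero flag exactly when the 32-bit result wraps around to 0, and that only happens when the register held 0FFFFFFFFh (-1). So INC EAX / JZ takes the same branch as CMP EAX, -1 / JE; it just changes EAX. Here is a minimal C sketch of the two zero-flag calculations, assuming a uint32_t stands in for EAX (the function names are made up for illustration):

```c
#include <stdbool.h>
#include <stdint.h>

/* ZF after "cmp eax, -1": set when eax - 0FFFFFFFFh is zero.
   CMP only sets flags, eax itself is not changed. */
bool zf_after_cmp_minus1(uint32_t eax)
{
    return (uint32_t)(eax - UINT32_C(0xFFFFFFFF)) == 0;
}

/* ZF after "inc eax": set when the incremented value wraps to 0.
   Unlike CMP, this also modifies the register. */
bool zf_after_inc(uint32_t *eax)
{
    *eax += 1;              /* unsigned arithmetic wraps modulo 2^32 */
    return *eax == 0;
}
```

Both functions return true for 0FFFFFFFFh and false for every other value; the only difference is that the INC version leaves EAX incremented.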

LEA Heaven

The LEA instruction is almost paradise for us, because it can calculate sums of a register, a scaled register and a constant in a single instruction, without touching the flags. LEA can take forms like the ones below (not all are listed):

LEA EAX, [ECX] ;eax = ecx
LEA EAX, [ECX + EDX] ;eax = ecx + edx
LEA EAX, [ECX*2] ;eax = ecx*2
LEA EAX, [ECX*4] ;eax = ecx*4
LEA EAX, [ECX*8 + EAX + 040404040h] ;eax = (ecx*8) + eax + 040404040h

So let's look at what some people might have coded:

SHL EAX, 1 ;multiply by 2
ADD EAX, EDX
SUB EAX, 040h

This is really a waste of bytes (7 of them). Let's look at how this can be coded more efficiently:

LEA EAX, [EDX + EAX*2 - 040h] ;4 bytes, same result (but LEA does not set the flags)

Or newbies might have coded:

ADD EDX, ECX
MOV EAX, EDX

Optimized version (3 bytes instead of 4; LEA leaves EDX unchanged, so use it when you don't need the sum in EDX):

LEA EAX, [EDX + ECX]

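Another thing is that:

LEA EBX, [ECX*2] assembles to LEA EBX, [ECX*2 + 00000000]

An index register with no base register always gets a 32-bit displacement, so this is 7 bytes. So a solution, if you have a register you can zero, is:

XOR EAX, EAX
LEA EBX, [EAX + ECX*2]

That is 5 bytes in total, or just 3 if EAX is already zero.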

You have to play with this, because there are many possibilities. Displacements aren't always 32 bits, though: with a base register, a displacement from -128 to 127 is stored in one byte, and anything else takes four. So be thoughtful about this. Here is the general form of the LEA instruction:

LEA DEST, [BASE + INDEX*SCALE + DISPLACEMENT] ;SCALE is 1, 2, 4 or 8
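Whatever form you use, LEA only does address arithmetic: it stores base + index*scale + displacement, truncated to 32 bits, in the destination. It reads no memory and leaves the flags alone. A minimal C sketch of that calculation, with a hypothetical lea() helper standing in for the instruction and uint32_t standing in for the 32-bit registers:

```c
#include <stdint.h>

/* What "LEA dest, [base + index*scale + disp]" puts into dest.
   scale must be 1, 2, 4 or 8 (the only scales x86 can encode).
   All arithmetic is unsigned, so it wraps modulo 2^32 like the CPU. */
uint32_t lea(uint32_t base, uint32_t index, uint32_t scale, int32_t disp)
{
    /* Converting a negative disp to uint32_t is well defined (mod 2^32). */
    return base + index * scale + (uint32_t)disp;
}
```

For instance, lea(edx, ecx, 1, 0) is what LEA EAX, [EDX + ECX] leaves in EAX, and lea(ebx, ebx, 1, 0) equals ebx * 2, which is why [EBX + EBX] can replace [EBX*2].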

Logical Operations

One of the stupidest things you can do is this:

MOV EAX, 040h
SUB EAX, 06060h
SUB ECX, EAX

This can all be calculated by the assembler, because there are no variables in this expression (as long as you don't need the value left in EAX):

SUB ECX, (040h - 06060h)

So when making calculations, be sure you are not writing instructions that can be avoided. Expressions made only of constants can be calculated before the program even runs. Just as an example, TASM (and MASM too) can evaluate even expressions such as this one:

Number_1 equ (((025h * 04h) - 024h)/040h)*((040h/05h) + 040h)

So don't do your constant calculations in the code.
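C compilers fold constant expressions the same way. Here is the Number_1 expression from above written as a C enum constant, which the compiler has to evaluate at compile time. Note that the divisions are integer divisions that truncate, just like in the assembler: 070h/040h gives 1, not 1.75, and 040h/05h gives 12, so the whole thing comes out as 76 (04Ch). This is a sketch for illustration; NUMBER_1 is just the C spelling of the TASM name.

```c
/* The Number_1 expression from the TASM example, with C hex constants.
   An enumeration constant must be a compile-time constant, so the
   compiler evaluates this before the program ever runs. */
enum {
    NUMBER_1 = (((0x25 * 0x04) - 0x24) / 0x40) * ((0x40 / 0x05) + 0x40)
};

/* C11 compile-time check: no code is generated for it. */
_Static_assert(NUMBER_1 == 0x4C, "integer division truncates: 1 * 76");
```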

Procedures

Procedures are good programming practice, because you can build your program one piece at a time and keep your code very organized. Be careful, though: sometimes you will use a procedure only once. If that happens, you are wasting 6 bytes (5 for the CALL and 1 for the RET). Look over your code and see how many times each procedure is being called.

Offsets

When you are doing:

MOV EAX, CS:[00000000]

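The address takes up four bytes of the instruction. If the address is zero (or is already sitting in a register), I suggest you do this instead, which is one byte shorter: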

XOR EAX, EAX
MOV EAX, CS:[EAX]

You will soon notice that you can have 5 lines of source code that assemble to something smaller than 1 line in another part of the code.
You should not repeatedly use the same or nearby 32-bit offsets when accessing memory. Here is an example:

mov ecx, [esi + 040404040h]
xchg ecx, eax
add edx, [esi + 040404044h] ;+4h
mov ecx, edx
sub [esi + 040404040h], edx

The example above uses too many 32-bit offsets (21 bytes in total). A better way to write this set of instructions, if EDI is free, is:

lea edi, [esi + 040404040h]
mov ecx, [edi]
xchg ecx, eax
add edx, [edi + 04h] ;only an 8-bit offset
mov ecx, edx
sub [edi], edx

That comes to 16 bytes.

2. Optimization in general

The fastest instructions are the ones that match the current system. In other words, 32-bit instructions on a 32-bit machine tend to be faster than 16- or 8-bit instructions on the same machine.
Also use the EAX register as often as possible, because many instructions have a special encoding for EAX that is one byte shorter (for example, ADD EAX, 12345678h is 5 bytes, while ADD EBX, 12345678h is 6).

When using a register alone as a memory pointer, try not to use the base pointer (EBP), because it takes an extra byte. Here is an example:

MOV EAX, [EDX] < MOV EAX, [EBP]

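Because MOV EAX, [EBP] assembles to MOV EAX, [EBP + 00].
The same story goes when using two registers, whenever EBP ends up as the base register:

MOV EAX, [ECX + EBP] < MOV EAX, [EBP + ECX]

MOV EAX, [EBP + ECX] will assemble to MOV EAX, [EBP + ECX + 00] if the assembler takes the first register as the base, which is the common convention. As the index register, EBP needs no displacement.
So in conclusion: try not to use EBP alone, and if you are using it with another register for memory referencing, put EBP second so it becomes the index (look at your assembler's listing to be sure).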

Using the stack pointer in instructions such as:

MOV EAX, [ESP]
MOV EAX, [ESP + 040404040h]

Will increase the instruction by one byte, because any ESP-based address needs an extra SIB byte.
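All of these addressing rules (the 4-byte absolute offset, the hidden [EBP + 00], the extra byte for ESP, the forced 32-bit displacement in [ECX*2]) come from how 32-bit x86 encodes a memory operand: a ModR/M byte, an optional SIB byte, and an optional 8- or 32-bit displacement. The C sketch below counts those bytes; add the opcode (and any prefixes) to get the full instruction length. It models 32-bit addressing only, the function name is made up, and it ignores the special short form MOV has for EAX with an absolute address (A1h for loads), which needs no ModR/M byte.

```c
#include <stdint.h>

/* 32-bit register numbers as they appear in the ModR/M and SIB fields;
   NONE means "no register in this slot". */
enum reg { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, NONE = -1 };

/* Bytes needed for the memory operand [base + index*scale + disp]:
   one ModR/M byte, plus a SIB byte when there is an index register or
   the base is ESP, plus a 0-, 1- or 4-byte displacement. The scale never
   changes the size. Opcode and prefix bytes are not counted, and ESP is
   assumed never to be passed as the index (x86 cannot encode that). */
int mem_operand_size(enum reg base, enum reg index, int32_t disp)
{
    int size = 1;                               /* ModR/M byte */

    if (index != NONE || base == ESP)
        size += 1;                              /* SIB byte */

    if (base == NONE)
        return size + 4;                        /* no base: always disp32 */

    if (disp == 0 && base != EBP)
        return size;                            /* no displacement at all */

    /* EBP as base has no zero-displacement form, hence [EBP + 00]. */
    return size + ((disp >= -128 && disp <= 127) ? 1 : 4);
}
```

For example, mem_operand_size(EDX, NONE, 0) is 1 and mem_operand_size(EBP, NONE, 0) is 2, which is why MOV EAX, [EBP] (8B 45 00) is one byte longer than MOV EAX, [EDX] (8B 02).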

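Another big optimization is using the LODSB/STOSB/SCASB/MOVSB instructions. Each one is a single byte and steps its pointer for you (forward when the direction flag is clear): LODSB reads from [ESI], STOSB writes to [EDI], SCASB compares AL with [EDI], and MOVSB copies from [ESI] to [EDI]. So if you are going to do a lot of reading/writing at one place, load a pointer to that memory into ESI (for reading) or EDI (for writing).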

Also remember that using 16-bit registers in 32-bit code adds 1 byte (the 66h operand-size prefix) to each instruction.

Another little optimization trick is that XOR ECX, ECX is better than XOR CL, CL: both are 2 bytes, but the 32-bit version clears the whole register for free.

Immediates from -128 to 127 have an opcode form that is three bytes shorter, because the immediate is stored as a single sign-extended byte. Here is an example:

ADD EBX,128 ; 6 bytes
SUB EBX,-128 ; 3 bytes

So for the above example you can also do:

ADD EBX, 127 ;3 bytes
INC EBX ;1 byte, 4 in total
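The rule behind this: ADD, SUB, AND, OR, XOR, CMP, ADC and SBB with a register and an immediate have a short form (opcode 83h) whose one-byte immediate is sign-extended to 32 bits, so it covers -128 to 127. Anything else needs a 4-byte immediate: 5 bytes for EAX (it has its own opcode) and 6 bytes for the other registers. A small C sketch of that size calculation (the function names are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the 32-bit immediate (read as signed) can be stored as one
   byte and sign-extended back to the same value by the CPU. */
bool fits_imm8(int32_t imm)
{
    return imm >= -128 && imm <= 127;
}

/* Length in bytes of "ADD reg32, imm". SUB, AND, OR, XOR, CMP, ADC and
   SBB use the same three encodings, so their sizes are the same. */
int add_reg_imm_size(bool reg_is_eax, int32_t imm)
{
    if (fits_imm8(imm))
        return 3;                 /* 83 /0 ib: opcode, ModR/M, imm8 */
    return reg_is_eax ? 5 : 6;    /* 05 id (EAX only) or 81 /0 id */
}
```

add_reg_imm_size(false, 128) gives 6 and add_reg_imm_size(false, -128) gives 3, matching the ADD EBX, 128 / SUB EBX, -128 pair above. A value like 0FFFFFFFFh should be passed as -1: the CPU only sees 32 bits, so it fits in one byte too.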

Another interesting optimization is the fact that:

INC EAX < INC AL

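So if you want to increment AL, you can INC EAX instead (1 byte instead of 2), as long as AL isn't 0FFh (the carry would spill into AH) or you don't care about the upper bits. There is no such trick for AH: INC AH stays 2 bytes.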

A couple of notes on speed

Speed is also kind of important. Well, here are some very basic notes to make your virii (hopefully!!) faster:

1) Don't use complex instructions, e.g. ENTER, LEAVE, BOUND.

2) Pentiums are made to work with dwords. So keep your dword variables at memory addresses that are a multiple of 4.

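3) Avoid AGI stalls. An AGI (address generation interlock) stall happens when an instruction modifies a register and the next instruction uses that register as a memory pointer. In other words, the second instruction depends on the first. Example:

ADD ESI, 08H
MOV ECX, [ESI]

Put an unrelated instruction between the two, or reorder them so the load doesn't have to wait (one byte longer, but no stall):

MOV ECX, [ESI + 08H]
ADD ESI, 08H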

3. Special occasions

1.) For: add ecx, 02h

Use: inc ecx
inc ecx ;saves one byte

2.) For: sub ecx, 02h

Use: dec ecx
dec ecx ;saves one byte

3.) For: mov ecx, 040h

Use: push 040h
pop ecx ;saves 2 bytes, but only for values from -128 to 127, and it goes through the stack

4.) For: xor edx, edx ;if EAX < 80000000h (sign bit clear), e.g. EAX = 0

Use: cdq

5.) For: lea eax, [ebx*2]

Use: lea eax, [ebx + ebx]

6.) For: xor eax, eax
mov al, [esi]

Use: movzx eax, byte ptr [esi]

7.) For: cmp eax, 00h

Use: test eax, eax

8.) For: mov eax, [ebp + _Create_File_API_Address]
call eax

Use: call dword ptr [ebp + _Create_File_API_Address]
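About the CDQ trick in item 4: CDQ sign-extends EAX into EDX, so every bit of EDX becomes a copy of bit 31 of EAX. That means it clears EDX whenever EAX is below 80000000h, not only when EAX is zero, and fills EDX with ones otherwise. A minimal C model of that, assuming uint32_t for the registers (the function name is made up):

```c
#include <stdint.h>

/* EDX after CDQ: all ones if EAX's sign bit (bit 31) is set, else 0.
   EAX itself is not changed by CDQ. */
uint32_t edx_after_cdq(uint32_t eax)
{
    return (eax & UINT32_C(0x80000000)) ? UINT32_C(0xFFFFFFFF) : 0;
}
```

So CDQ is a safe 1-byte replacement for the 2-byte XOR EDX, EDX only when you know EAX is non-negative.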

4. Conclusion

You know, when I started coding I would write so much useless code. After I finished a project I would go through it and try to optimize. This is how everyone starts. But after some time coding, I started to optimize as I went along.
The best way to code in asm is to think about the objective of your code while always keeping optimization in the back of your mind. You only reach this after a lot of practice.
Well, good luck with whatever it is you are coding, and remember: you can always optimize your code, as a whole, further than its current state.