Mega Code Archive

Inline Assembler in Delphi (I) Introduction

By Ernesto De Spirito edspirito@latiumsoftware.com

This article simply intends to introduce you to the world of inline assembler in Delphi. It will barely give you an idea, and it doesn't intend to explain all the details of assembler programming, which would probably require an entire book...

Why and when

If you take a look at the source code of the RTL and the VCL, you'll see inline assembler statements in various places. Why would Borland code parts of the RTL and the VCL in assembler? The answer is quite simple: to achieve execution speed. We know the compiler produces fast code, but a compiler can never be better than a professional assembler programmer.

Now, if assembler is so good, why aren't the entire RTL and VCL coded in assembler? The answer is also simple: because a higher-level programming language is easier to code, debug, read, and maintain, making it worth sacrificing some speed for this convenience.

This helps explain when assembler code should be used. To put it bluntly, apart from low-level system access, inline assembler is used when the difference in speed justifies the bother of coding in assembler. For example, in the unit Math.pas there is a lot of assembler, mainly for low-level system access (specifically, to access coprocessor features), and you'll see many assembler blocks, this time for speed, in system.pas, sysutils.pas, and classes.pas, which isn't strange since they can be considered the core units of the RTL and the VCL.

In general, procedures and functions that are likely to be called very often in a program should be highly optimized, but coding in assembler should be avoided to the maximum extent possible. If we want speed, before considering coding in assembler we should first improve the algorithm, and then optimize our Pascal code. If we decide to go for assembler, the optimized Pascal code will be useful for documentation purposes, and it can act as some sort of "backup code" in case we later have problems maintaining the assembler source code.

The CPU registers

CPU registers are like predefined variables, but they reside in the CPU, and sometimes they have special purposes. They don't have a type, but you can consider them as signed or unsigned 32-bit integers, or as pointers, depending on the case. Since registers are located in the CPU, it is faster to access values stored in registers than values stored in memory, so registers are heavily used to cache values.

Like variables, registers have names. The names of the registers we will use most are EAX, EBX, ECX, EDX, ESI, EDI, EBP and ESP. Each register has its own particularities:

For some instructions, the CPU is optimized for using the EAX register (also known as the "accumulator"), or at least the encoded instructions are shorter. EAX is used in multiplications and divisions (low order 32 bits of the operand and the result), string instructions, port I/O instructions, ASCII adjust and decimal adjust instructions, and in some special instructions (like CDQ, LAHF, SAHF, XLAT).

EBX is a general-purpose register, and is implicitly used by XLAT.

ECX (also known as the "counter") has a special use with the LOOP instruction, bit rotating and shifting instructions, and string instructions.

EDX is used in multiplications and divisions (high order 32 bits of the result of a multiplication, or high order 32 bits of the dividend and the remainder of a division) and in some other special instructions (like CDQ).

ESI and EDI (known as the "source index" and "destination index" respectively) act as pointers used by string instructions to identify the source and destination of data, respectively.

EBP (known as the "base pointer") is usually used for addressing values in the stack (parameters and local variables).

ESP (known as the "stack pointer") is used to control the stack. It is modified automatically by instructions like PUSH, POP, CALL and RET, but it can be modified by code and it can even be used as a general-purpose register (as long as we preserve it).

The registers EBX, ESI, EDI, EBP, and ESP should be preserved in inline assembler blocks, meaning that before using these registers we should save their values somewhere (usually the stack or another register), and when we are finished we should restore their original values. Since this saving and restoring takes instructions and therefore time, we'll only use these registers when the difference justifies it or when they are inevitably needed.

You probably noticed that all register names start with the letter "E". It stands for "Extended". In 16-bit processors like the old Intel 80286, registers had 16 bits and had names like AX, BX, CX, etc. These registers still exist, and they are the least significant 16 bits (the "low words" or "low order words") of EAX, EBX, ECX, etc., respectively. By the way, the registers AX, BX, CX and DX are each divided into two 8-bit registers. AL, BL, CL and DL are the low order bytes of AX, BX, CX and DX respectively, while AH, BH, CH and DH are the high order bytes of AX, BX, CX and DX respectively.

For example, if the value of EAX is $7AFD503C, then the value of AX is $503C, the value of AH is $50 and the value of AL is $3C:

    7A FD 50 3C
          AH AL
          /---/         AX
    /---------/         EAX

If for example we store $99 in AH, then the value of EAX would be $7AFD993C.
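
The overlap between EAX, AX, AH and AL described above can be imitated in plain Pascal, without any assembler, using a variant record. This is only a sketch: the type TReg32 and the function WithAH are hypothetical names made up for the illustration, and the field layout assumes a little-endian machine such as the x86, where the least significant byte comes first in memory.

```pascal
program RegisterOverlay;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical model of EAX and its sub-registers (not a real CPU register).
  // All variants share the same four bytes of memory.
  TReg32 = packed record
    case Integer of
      0: (EAX: LongWord);   // the full 32 bits
      1: (AX: Word);        // low order word of EAX
      2: (AL, AH: Byte);    // low and high order bytes of AX (little-endian)
  end;

// Returns Value with the byte in the AH position (bits 8..15) replaced by NewAH
function WithAH(Value: LongWord; NewAH: Byte): LongWord;
var
  R: TReg32;
begin
  R.EAX := Value;
  R.AH := NewAH;
  Result := R.EAX;
end;

var
  Reg: TReg32;
begin
  Reg.EAX := $7AFD503C;
  WriteLn('AX  = $', IntToHex(Int64(Reg.AX), 4));   // $503C
  WriteLn('AH  = $', IntToHex(Int64(Reg.AH), 2));   // $50
  WriteLn('AL  = $', IntToHex(Int64(Reg.AL), 2));   // $3C
  // Storing $99 in the AH position changes only bits 8..15 of the value
  WriteLn('EAX = $', IntToHex(Int64(WithAH(Reg.EAX, $99)), 8));   // $7AFD993C
end.
```

Writing to one field changes what the other fields read, which is the same effect we get in assembler when we store a value in AH and then read EAX.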

There is a special register, the Flags Register, which holds binary flags that are set by the mathematical and logical instructions, or explicitly by code, and that are used by conditional jump instructions. The Carry Flag is also used in some bit rotating instructions, and the Direction Flag is used in string instructions. This register isn't addressable by name like the other registers, but it can be saved on the stack and restored from it using the PUSHF and POPF instructions respectively. Its low order byte can also be loaded into the AH register using the LAHF instruction, and restored from AH using the SAHF instruction.

Assembler instructions

Assembler instructions are enclosed in asm..end blocks, and have the form

    [label:] [prefix] opcode [operand1 [, operand2 [, ...]]]

The opcode is the name of the instruction, like MOV, ADD, PUSH, etc.

Instructions can be separated by semicolons, line breaks or comments. By the way, comments must be written in Object Pascal style, i.e. a semicolon does not start a comment that runs to the end of the line as it does in ordinary assembler.

The following is an example of an asm..end block mixing different kinds of comments and instruction separators:

    asm
      xchg ebx, edx; add eax, [ebx]; {semicolons separate statements}
      // line breaks separate statements
      mov ebx, p
      sub eax, [ebx] (*comments separate statements*) mov ebx, edx
    end;

The convention is to use line breaks for separation, like this:

    asm
      xchg ebx, edx
      add eax, [ebx]
      mov ebx, p
      sub eax, [ebx]
      mov ebx, edx
    end;

In the source code of the VCL you'll see that opcodes and register names are written in capital letters, and instructions are indented eight characters, but we won't use this convention here.

Asm..end blocks can occur anywhere in the source code where a Pascal statement can occur, and we can write 100% assembler procedures and functions if we use "asm" instead of "begin", like this:

    procedure test;
    asm
      // assembler statements
    end;

Notice that these two implementations are not the same:

    function f(parameters): type;
    begin
      asm
        // assembler statements
      end;
    end;

    function f(parameters): type;
    asm
      // assembler statements
    end;

The reason is that the compiler performs certain optimizations when we implement procedures and functions completely in assembler (without using the begin..end block).

Labels should be declared in a label section, as in any other Object Pascal code, unless they are prefixed by "@@":

    function IsMagicNumber(x: integer): boolean;
    asm
      cmp eax, MagicNumber
      je @@Bingo
      xor eax, eax
      ret
    @@Bingo:
      mov eax, 1
    end;

Labels prefixed by "@@" are local to the inline assembler block in which they are used. This will generate a compiler error:

    begin
      ....
      asm
        ....
      @@destination:
        ....
      end;
      ....
      asm
        ....
        jnz @@destination   // Error
        ....
      end;
      ....
    end;

To correct it, we need to use a conventional label, local to the procedure or function:

    label
      destination;
    begin
      ....
      asm
        ....
      destination:
        ....
      end;
      ....
      asm
        ....
        jnz destination   // Right
        ....
      end;
      ....
    end;

Operands

Sometimes an operand or operands are implicit. For example, the instruction CDQ (Convert Dword to Qword) apparently takes no operands, but it works with EDX and EAX (it extends the most significant bit of EAX, the "sign" bit, into EDX, so EDX:EAX will represent the integer in EAX converted to Int64, where EAX will hold the 32 least significant bits, and EDX will hold the 32 most significant bits).

For most instructions, operands can be registers. For example

    mov eax, ecx

copies the value of ECX into EAX.

Many operands can be immediate values. For example:

    mov eax, 5
    mov eax, 2 + 3   // Constant expression, resolved at compile time
    mov al, 'A'      // The ASCII code of 'A' is $41 (65)
    mov eax, 'ABC'   // Equivalent to MOV EAX, $00414243

Many operands can be memory references. For example:

    mov [ebx], eax   // EBX^ := EAX;

Memory references can have many forms:

    mov eax, [$000FFFC]      // Absolute address
    mov eax, [ebx]           // Register
    mov eax, [ebp-12]        // Register plus/minus constant offset
    mov eax, [ebp+ebx]       // Register plus offset in a register
    mov eax, [ebp+ebx+8]     // Register plus offset in a register
                             // plus/minus constant offset
    mov eax, [ebp+ebx*4]     // Register plus offset in a register
                             // multiplied by a constant
    mov eax, [ebp+ebx*4+8]   // Register plus offset in a register
                             // multiplied by a constant,
                             // plus/minus constant offset

The use of Pascal identifiers is translated to one of the forms above:

    mov eax, parameter   // mov eax, [ebp + constant_offset]
    mov eax, localvar    // mov eax, [ebp - constant_offset]
    mov eax, globalvar   // mov eax, [absolute_address]
    call procname        // call absolute_address

First example

We are ready to learn some opcodes with a couple of examples. We can begin with a simple function:

    function f(x: integer; y: integer): integer;
    // f(x,y) = (-x-y+5)*7
    {
    begin
      Result := (-x - y + 5) * 7;
    end;
    }
    asm // Parameters are passed in EAX (x) and EDX (y);
      neg eax        // EAX := -EAX;       // EAX = -x
      sub eax, edx   // EAX := EAX - EDX;  // EAX = -x-y
      add eax, 5     // EAX := EAX + 5;    // EAX = -x-y+5
      imul eax, 7    // EAX := EAX * 7;    // EAX = (-x-y+5)*7
    end;

The first three parameters (left to right) are passed in EAX, EDX and ECX. For methods, the first parameter is Self (passed in EAX), and the first parameter explicitly declared is in fact the second parameter (passed in EDX), and the second explicit parameter is actually the third parameter (passed in ECX). The return value should be placed in EAX for 32-bit ordinal values (AX and AL should be used to return 16-bit and 8-bit ordinal values respectively).

The comments explain the opcodes quite well, but for IMUL we have to add two things:

IMUL considers the operands (EAX and 7 in the example) as signed integers (we should use MUL when the operands are unsigned, although MUL only exists in the one-operand form described next).

IMUL with an immediate value needs an explicit destination register (IMUL EAX, 7), and in that form only the low order 32 bits of the product are kept. In the one-operand form (for example IMUL ECX), EAX is multiplied by the operand and the result is a 64-bit integer in EDX:EAX (the most significant 32 bits of the result are placed in EDX).

Multiplications are quite expensive in terms of CPU time, and sometimes it is faster to substitute them with bit shifting (when we multiply or divide by a power of two), additions and subtractions. For example:

    a * 7 = a * (8 - 1) = a * 8 - a = a * 2^3 - a
    a * 7 = a shl 3 - a

Instead of IMUL EAX, 7, we can do the following:

    mov ecx, eax   // ECX := EAX;         // ECX = -x-y+5
    shl eax, 3     // EAX := EAX shl 3;   // EAX = (-x-y+5)*8
    sub eax, ecx   // EAX := EAX - ECX;   // EAX = (-x-y+5)*8 - (-x-y+5)
                   //                     // EAX = (-x-y+5)*7

Let's see another example:

    function remainder(x: integer; y: integer): integer;
    // Returns the remainder of x divided by y
    {
    begin
      Result := x mod y;
    end;
    }
    asm // Parameters are passed in EAX (x) and EDX (y);
      mov ecx, edx   // ECX := EDX;              // ECX = y
      cdq            // EDX:EAX := Int64(EAX);   // EAX = x
      idiv ecx       // 32-bit signed integer division:
                     //   EAX := Int64(EDX:EAX) div integer(ECX);
                     //   EDX := Int64(EDX:EAX) mod integer(ECX);
      mov eax, edx   // Result := EDX;           // remainder
    end;

The stack

When a program is loaded, it is assigned a stack, which is a memory region used as a LIFO (Last In, First Out) structure, controlled by the ESP register, which points to the top of the stack. ESP starts out pointing to the end of the region, so every time we push a 32-bit value, 4 (bytes) is subtracted from ESP, and the value is stored in the location pointed to by ESP.

    |           |
    +-----------+
    |           |
    +-----------+
    | $01234567 |
    +-----------+
    |           |

    PUSH $89ABCDEF   // SUB ESP,4; MOV [ESP],$89ABCDEF

    |           |
    +-----------+
    | $89ABCDEF |
    +-----------+
    | $01234567 |
    +-----------+
    |           |

Conversely, when we pop a 32-bit value, the value is retrieved from the location pointed to by ESP, and 4 (bytes) is added to ESP.

    POP EAX   // MOV EAX,[ESP]; ADD ESP,4

    |           |
    +-----------+           +-----------+
    | $89ABCDEF |      EAX  | $89ABCDEF |
    +-----------+           +-----------+
    | $01234567 |
    +-----------+
    |           |

The stack is used to store the return address of procedures and functions, parameters, local variables and intermediate results. In the following example we use the stack to save the value of a register for later use:

    function IntDiv(x: integer; y: integer; r: pinteger = NIL): integer;
    // Returns the integer quotient of x / y, and the remainder in r
    {
    begin
      Result := x div y;
      if r <> NIL then r^ := x mod y;
    end;
    }
    asm // Parameters are passed in EAX (x), EDX (y) and ECX (r)
      push ecx         // Save ECX (r) for later use
      mov ecx, edx     // ECX := EDX;              // ECX = y
      cdq              // EDX:EAX := Int64(EAX);   // EAX = x
      idiv ecx         // 32-bit signed integer division:
                       //   EAX := Int64(EDX:EAX) div integer(ECX);
                       //   EDX := Int64(EDX:EAX) mod integer(ECX);
      pop ecx          // Restores ECX (ECX := r)
      cmp ecx, 0       // if ECX = NIL then
      jz @@end         //   goto @@end;
      mov [ecx], edx   // ECX^ := EDX;             // remainder
    @@end:             // local label (preceded by "@@")
    end;

Notice that for every PUSH we make we have to perform a POP, so the value of ESP is left unchanged (ESP is one of the registers we have to preserve).

The CMP instruction subtracts the second operand from the first (ECX-0 in this case), like the SUB instruction, but the result is not stored anywhere. The Zero flag, however, will be set (turned on) or cleared (turned off) depending on whether the result is zero or not, as happens with all mathematical and logical instructions (except in certain cases). We can take advantage of this fact, and instead of writing

    cmp ecx, 0

we can write

    or ecx, ecx   // ECX := ECX or ECX;

The result of ECX or ECX is ECX itself, so the value stored in ECX is the same it had before the operation but, as we said above, the Zero flag will be set if the result is zero (i.e., if ECX was zero).

The advantage of OR over CMP here is that OR operates on two registers and takes just two bytes to encode, while CMP compares a register with an immediate 8-bit value and takes three bytes. On the other hand, CMP doesn't actually write to the destination (ECX in this case) like OR does, which is sometimes important when writing code optimized for the Pentium. TEST ECX, ECX is usually preferred because it combines the best of both worlds: it takes two bytes to encode and doesn't write to the register (it just performs a bitwise AND operation to set the flags based on the result, which is discarded).

JZ (Jump if Zero) jumps to the label indicated as its operand if the Zero flag is set (on), or continues with the normal execution flow if the Zero flag is cleared (off).

Passing parameters on the stack

Let's go back to the stack. We said the first three parameters of a function are passed in EAX, EDX and ECX, but what if we have four or more parameters? Additional parameters are pushed on the stack from left to right, so the last parameter ends up on top of the stack.

Let's suppose we have a function

    function Sum(a, b, c, d, e: integer): integer;
    begin
      Result := a + b + c + d + e;
    end;

and that we want to make the call

    Sum(1,2,3,4,5);

In assembler, it would be like this:

    mov eax, 1
    mov edx, 2
    mov ecx, 3
    push 4
    push 5
    call Sum

The CALL instruction pushes the return address on the stack and jumps to (starts executing) the function. The RET (RETurn) instruction (generated by the compiler when the end of the function is reached) pops this address from the stack and jumps to it to continue the execution from that point.

Notice that we pushed the parameters on the stack, but didn't pop them. This is because, except in the CDECL calling convention, cleaning up the parameters is the responsibility of the called function, not the caller.

To clean up the parameters, the RET instruction is used with an operand indicating the number of bytes by which ESP should be incremented, 8 in this case (ESP was decremented 4 bytes per parameter when they were pushed). The compiler takes care of this, so we don't have to bother about it, but if you look at the CPU debug window and wonder what that RET $08 is, now you know.

Upon entry to Sum, the stack would in theory look like this:

    |           |
    +-----------+
    | Ret_Addr  |
    +-----------+
    | $00000005 |  (parameter e)
    +-----------+
    | $00000004 |  (parameter d)
    +-----------+
    |           |

When a function has parameters on the stack (or local variables), the compiler generates a few instructions that build what is called a "stack frame". Upon entry to the function (in the "asm"), EBP is pushed on the stack (to preserve it) and ESP is assigned to it, and before leaving the function (in the "end;"), the original value of EBP is popped from the stack:

    function Sum(a, b, c, d, e: integer): integer;
    asm // push ebp; mov ebp, esp;
      ....
    end; // pop ebp; ret 8;

Thus, when we enter Sum, the stack would actually look like this:

    |           |
    +-----------+
    | Orig. EBP |
    +-----------+
    | Ret_Addr  |
    +-----------+
    | $00000005 |
    +-----------+
    | $00000004 |
    +-----------+
    |           |

At [EBP] we find the original value of EBP that was pushed on the stack to preserve it when building the stack frame, at [EBP+4] we find the return address of the procedure, and at [EBP+8] we have the last parameter (the last parameter is pushed last, and therefore it's the first on the stack). The next parameter (right to left) is at [EBP+12], and so on if we had more parameters.

Now let's write the function Sum in assembler:

    function Sum(a, b, c, d, e: integer): integer;
    {
    begin
      Result := a + b + c + d + e;
    end;
    }
    asm
      add eax, b
      add eax, c
      add eax, d
      add eax, e
    end;

Notice that in the asm..end block we used "b", "c", "d" and "e" instead of "EDX", "ECX", "[EBP+12]" and "[EBP+8]" respectively. We can do that because the compiler will make the appropriate substitutions.

Local variables on the stack

If our function has local variables, in complete inline assembler functions the compiler will make space for them on the stack by moving the stack pointer, so the stack frame for a function with two integer local variables would look like this:

    push ebp
    mov ebp, esp
    sub esp, 8   // Moves ESP as if we pushed 8 bytes
    ...
    add esp, 8   // Moves ESP as if we popped 8 bytes
    pop ebp

For the purpose of the example, here is a variant of the Sum function introduced above, but using two local variables:

    function SumL(a, b, c, d, e: integer): integer;
    var
      f, g: integer;
    {
    begin
      f := b + c;
      g := d + e;
      Result := a + f + g;
    end;
    }
    asm // push ebp; mov ebp, esp; sub esp, 8;
      add edx, ecx
      mov f, edx   // b + c
      mov edx, d
      add edx, e
      mov g, edx   // d + e
      add eax, f
      add eax, g
    end; // add esp, 8; pop ebp; ret 8

Within this function, the stack would look like this:

    |           |
    +-----------+
    | var. g    |
    +-----------+
    | var. f    |
    +-----------+
    | Orig. EBP |
    +-----------+
    | Ret_Addr  |
    +-----------+
    | Param e   |
    +-----------+
    | Param d   |
    +-----------+
    |           |

Next: Inline Assembler in Delphi (II) - ANSI strings