Inline Assembler in Delphi (II) ANSI strings

By Ernesto De Spirito edspirito@latiumsoftware.com

In this chapter we will learn a few more assembler instructions and the basics of working with ANSI strings, also called long strings.

New opcodes

These are the opcodes introduced in this article:

JL (Jump if Less): Strictly speaking, JL tests the sign and overflow flags, but for our purposes it is enough to say that JL jumps to the specified label if, in the previous CMP (or SUB) operation, the first operand was less than the second operand in a signed comparison:

    // if signed(op1) < signed(op2) then goto @@label;
    cmp op1, op2
    jl @@label

JG (Jump if Greater), JLE (Jump if Less or Equal), and JGE (Jump if Greater or Equal) complete the family of conditional jumps for signed comparisons.

JA (Jump if Above): jumps to the specified label if, in the previous CMP (or SUB) operation, the first operand was greater than the second one, both operands being treated as unsigned values:

    // if unsigned(op1) > unsigned(op2) then goto @@label;
    cmp op1, op2
    ja @@label

JB (Jump if Below), JBE (Jump if Below or Equal), and JAE (Jump if Above or Equal) complete the family of conditional jumps for unsigned comparisons.

LOOP: Decrements ECX and, if the result is not zero, jumps to the specified label. LOOP @@label is a more compact near-equivalent of the following pair (unlike DEC, LOOP leaves the flags untouched, and on many processors it is not faster than the pair):

    dec ecx        // ECX := ECX - 1;
    jnz @@label    // if ECX <> 0 then goto @@label

Example:

    xor eax, eax   // EAX := EAX xor EAX;  // EAX := 0;
    mov ecx, 5     // ECX := 5;
  @@label:
    add eax, ecx   // EAX := EAX + ECX;    // Executed 5 times
    loop @@label   // Dec(ECX); if ECX <> 0 then goto @@label;
    // EAX ends up as 15 (5+4+3+2+1)

Working with ANSI strings

A string variable is represented by a 32-bit pointer.
If the string is the empty string (''), the pointer is Nil (zero); otherwise it points to the first character of the string. The length of the string and the reference count are two integers stored at negative offsets from the position of the first byte:

  +-----------+
  | s: string |---------------------------+
  +-----------+                           |
                                          V
  --+-----------+-----------+-----------+---+---+---+---+---+---+---+--
    | allocSiz  |  refCnt   |  length   | H | e | l | l | o | ! | #0|
  --+-----------+-----------+-----------+---+---+---+---+---+---+---+--
      (longint)   (longint)   (longint)
    \-----------------v-----------------/
                StrRec record

  const skew = sizeof(StrRec);  // 12

When we pass a string as a parameter to a function, what is passed is just the 32-bit pointer.

Strings as return values are more difficult to explain. The caller of a function returning a string must pass, as an invisible last parameter of PString type, the address of the string variable that will hold the result of the function:

    d := Uppercase(s);  // Internally converted to: Uppercase(s, @d);

If the result of the function is used in an expression rather than assigned directly to a variable, the caller must use a temporary variable initialized to Nil (the empty string). The compiler does that for us in Object Pascal code, but we have to do it ourselves if we call string-returning functions from assembler code.

For some tasks, we can't call the classic string functions directly. For example, Length isn't the name of a real function. It is a construct built into the compiler, and the compiler generates code to call the appropriate function depending on whether the parameter is a string or a dynamic array. In assembler, instead of Length, we have to call the function _LStrLen (declared in the System unit) to get the string length.

There are more things we should know about strings, but we have enough for a first example.
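
To make the layout described above concrete, here is a minimal sketch in plain Delphi. It assumes a 32-bit compiler where the length and the reference count sit 4 and 8 bytes before the first character; that is an assumption and does not hold for every compiler version or target. The helper names RawLength and RawRefCount are hypothetical and not part of the RTL.

```pascal
program StrLayoutSketch;
{$APPTYPE CONSOLE}

// Hypothetical helpers, not part of the RTL.
// Assumption: 32-bit target, length at offset -4, refCnt at offset -8.

function RawLength(const s: AnsiString): Integer;
begin
  if Pointer(s) = nil then
    Result := 0                                        // '' is a nil pointer
  else
    Result := PInteger(PAnsiChar(Pointer(s)) - 4)^;    // length field
end;

function RawRefCount(const s: AnsiString): Integer;
begin
  if Pointer(s) = nil then
    Result := 0                                        // no header to read
  else
    Result := PInteger(PAnsiChar(Pointer(s)) - 8)^;    // refCnt field
end;

var
  s: AnsiString;
begin
  s := 'Hello';
  s := s + '!';                         // built at run time, so it is heap-allocated
  Writeln(RawLength(s) = Length(s));    // TRUE if the assumed layout holds
  Writeln(RawRefCount(s));              // reference count of the heap string
  s := '';
  Writeln(Pointer(s) = nil);            // the empty string is a nil pointer
end.
```

Reading the header this way is only meant for experimenting; production code should keep using Length and let the compiler manage the reference count.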

Assembler version of Uppercase

This is the declaration of the function:

    function AsmUpperCase(const s: string): string;

The parameter "s" will be passed in EAX, and the address of "Result" will be passed as a second parameter, i.e., in EDX.

Basically, this function should:

  1) Get the length of the string to convert
  2) Allocate memory for the result string
  3) Copy the characters, converting them to uppercase

1) Get the length of the string to convert

We'll do this by calling System.@LStrLen. The function expects the string in EAX (we already have it there) and returns the length in EAX, so we have to save the value of EAX (the parameter "s") somewhere before the call to avoid losing it. We can save it in a local variable "src". Since functions are free to use EAX, ECX and EDX, we should assume that the value of EDX ("@Result") could also be destroyed by the call to System.@LStrLen, so we should save it first, for example in a local variable "pdst". The result of System.@LStrLen, left in EAX, will be used as a parameter for System.@LStrSetLength (to allocate memory for the content of the result string), and later we need it again to count the bytes to be copied, so we also have to save it, for example in a variable "n":

    var
      pdst: Pointer;   // Address of the result string
      src: PChar;      // Source string
      n: Integer;      // String length
    asm
      // The address of the result string is passed in EDX.
      // We save it in a local variable (pdst):
      mov pdst, edx            // pdst := EDX;

      // Save EAX (s) in a local variable (src)
      mov src, eax             // src := EAX;

      // n := Length(s);
      call System.@LStrLen     // EAX := _LStrLen(EAX);
      mov n, eax               // n := EAX;

2) Allocate memory for the result string

This is accomplished by calling System.@LStrSetLength. The procedure expects two parameters: the address of the string (we saved it in "pdst"), and the new length of the string (we have it in EAX):

      // SetLength(pdst^, n);       // Allocates result string
      mov edx, eax                  // EDX := n;     // Second parameter for LStrSetLength
      mov eax, pdst                 // EAX := pdst;  // First parameter for LStrSetLength
      call System.@LStrSetLength    // _LStrSetLength(EAX, EDX);

3) Copy the characters, converting them to uppercase

If the length of the string is zero, we are done:

      // if n = 0 then exit;
      mov ecx, n         // ECX := n;
      test ecx, ecx      // AND ECX with ECX to set flags (ECX unchanged)
      jz @@end           // Go to @@end if the zero flag is set (ECX = 0)

Otherwise, we copy the characters from one string to the other, converting them to uppercase as needed. We are going to use ESI and EDX to point to the characters of the source string and the result string respectively, AL to load a character from the source string and change it before storing it in the destination string, and ECX with the LOOP instruction to count the characters. Since ESI is a register we must preserve, we have to save its value and restore it later. I decided to save ESI by pushing it on the stack.

      push esi           // Save ESI on the stack

      // Initialize ESI and EDX
      mov eax, pdst      // EAX := pdst;   // Address of the result string
      mov esi, src       // ESI := src;    // Source string
      mov edx, [eax]     // EDX := pdst^;  // Result string

    @@cycle:
      mov al, [esi]      // AL := ESI^;
      // if Shortint(AL) < Shortint(Ord('a')) then goto @@nochange
      cmp al, 'a'
      jl @@nochange
      // AL in ['a'..#127]
      // if Byte(AL) > Byte(Ord('z')) then goto @@nochange
      cmp al, 'z'
      ja @@nochange
      // AL in ['a'..'z']
      sub al, 'a'-'A'    // Dec(AL, Ord('a')-Ord('A'));
    @@nochange:
      mov [edx], al      // EDX^ := AL;
      inc esi            // Inc(ESI);
      inc edx            // Inc(EDX);
      loop @@cycle       // Dec(ECX); if ECX <> 0 then goto @@cycle

      pop esi            // Restore ESI from the stack
    @@end:
    end;

NOTE: The article applies only to single-byte character strings (SBCS).
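
When checking an assembler routine like this one, it can help to have a plain Pascal version with the same behaviour to compare against. The following is only a sketch: PasUpperCase is a hypothetical name, not something from the article or the RTL. Like the assembler version, it converts only 'a'..'z' and leaves every other byte, including those above #127, unchanged.

```pascal
program PasUpperCaseSketch;
{$APPTYPE CONSOLE}

// Hypothetical reference implementation for comparison purposes only.
function PasUpperCase(const s: AnsiString): AnsiString;
var
  i: Integer;
  c: AnsiChar;
begin
  SetLength(Result, Length(s));         // allocate the result string
  for i := 1 to Length(s) do
  begin
    c := s[i];
    // Only 'a'..'z' are converted; everything else is copied as is
    if (c >= 'a') and (c <= 'z') then
      c := AnsiChar(Ord(c) - (Ord('a') - Ord('A')));
    Result[i] := c;
  end;
end;

begin
  Writeln(PasUpperCase('Hello, World!'));   // prints HELLO, WORLD!
end.
```

A quick check is to compare the results of AsmUpperCase and PasUpperCase for a handful of inputs, including the empty string and strings containing characters above #127.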