Mega Code Archive

Inline Assembler in Delphi (VII) 128 bit integer arithmetic

Title: Inline Assembler in Delphi (VII) - 128-bit integer arithmetic Question: This article shows the use of inline assembler to wotk with 128-bit integers. Answer: Inline Assembler in Delphi (VII) 128-bit integer arithmetic By Ernesto De Spirito edspirito@latiumsoftware.com Part 1: Introduction With 32 bits we can represent 2^32 different numbers, i.e. 4294967296 (~4 billion) different numbers, like signed integers from -2147483648 to +2147483647 or unsigned integers from 0 to 4294967295 (types Longint and Longword respectively). That's enough for many purposes, like for example holding a position of a byte within a 4GB file, but sometimes we need more than that, and there we have TLargeInteger (Windows unit) and Int64 (since Delphi 4) to represent 64-bit integers that can have 2^64 different values, i.e. 18446744073709551616 (~18 sixtillons) values, from -9223372036854775808 to +9223372036854775807 (~9 sixtillons, 17-18 decimal digits). That number of digits is really more than enough for me, and right now I really can't figure any practical use for more than that. Hey, not even Bill Gates counts his money in sixtillons! ;) But from time to time I see someone in a forum asking for more digits than what the Int64 offers... Anyway, whether useful or completely useless for a practical purpose, we'll see the implementation of many procedures and functions designed to work with 128-bit integers, that will serve for the purpose of showing examples of the basic assembler instructions. These "large integers", "big integers" or "huge integers" can hold 2^128 different values (38-39 decimal digits). Representation of the huge integer I called the new type Hugeint, but for example Bigint (big integer) or Int128 could have been good names. Largeint (large integer) could be confused with the type in the Windows.pas unit which refers to a 64-bit integer. When it comes to the representation of the new type, there also many ways to do it. I decided the most simple is representing it as an array of four 32-bit integers: type Hugeint: packed array [0..3] of longword; I also decided to use the little-endian format since it's the standard in the Intel architecture, and this means that the first element of the array (lowest address) will hold the low-order (least-significant) 32 bits of the large integer, and the last element of the array (highest address) will hold the high-order (most-significant) 32 bits of the large integer. This is how the numbers 5 and 5000000000 ($12A05F200) would be represented: +---- Low-order 32 bits | v +-------------+-------------+-------------+-------------+ | $00000005 | $00000000 | $00000000 | $00000000 | = 5 +-------------+-------------+-------------+-------------+ 0 1 2 3 +-------------+-------------+-------------+-------------+ | $2A05F200 | $00000001 | $00000000 | $00000000 | = 5000000000 +-------------+-------------+-------------+-------------+ $12A05F200 ^ | High-order 32 bits ----+ Integers themselves are also stored in little-endian format (low-order byte first). If we see the byte representation of the numbers in a memory dump, it would look like this (byte values are represented in hexadecimal notation): $00000005 +-------------+-------------+-------------+-------------+ | 05 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | = 5 +-------------+-------------+-------------+-------------+ 0 1 2 3 +-------------+-------------+-------------+-------------+ | 00 F2 05 2A | 01 00 00 00 | 00 00 00 00 | 00 00 00 00 | = 5000000000 +-------------+-------------+-------------+-------------+ $12A05F200 $2A05F200 $00000001 However, for almost all operations we can make abstraction of the byte order and consider the 32-bit integers as atomic units, since the byte order is handled transparently. A few useful instructions Before we begin, let's see some useful instructions that we might use in this article (mainly in the continuation of this part), but first allow me to say that it isn't the purpose of this article to actually teach you assembler. All I can do in this limited space is just showing you examples of some instructions. For reference material, I recommend you these links: Intel 80386 Reference Programmer's Manual An HTML version of this Intel manual. The pseudo-code helps explain the instructions and their effects on the flags. Excellent. http://people.freebsd.org/~jhb/386htm/toc.htm There are some broken links, but the pages are there. Try finding them in the directory index (http://people.freebsd.org/~jhb/386htm/). iAPx86 - Norton Guide Not as much explicative as the above document, but contains all the instructions from 8086 to Pentium and Pentium Pro, with size and timing information not included in the above document. http://www.clipx.net/ng/iapx86/index.php The IA-32 Intel Architecture Software Developer's Manual, Volume 2: Instruction Set Reference PDF Manual describing the instructions for the IA-32 processors (Pentium, Pentium Pro, Pentium II, Pentium III, Pentium 4, and Xeon). Includes pseudo-code to explain the instructions and how they affect the flags in the flags register. http://www.intel.com/design/pentium4/manuals/245471.htm Optimization How to optimize for the Pentium family of microprocessors Excellent optimization guide written by Agner Fog http://fatphil.org/x86/pentopt/index.html Optimizations for Intel's 32-Bit Processors Another excellent optimization guide. http://x86.ddj.com/ftp/manuals/686/optimgd.pdf OK, now let's get to the instructions. Reference: Z/ZF: Zero Flag S/SF: Sign Flag C/CF: Carry Flag P/PF: Parity Flag A/AF: Auxiliary Flag s: sign bit (high-order bit) o: odd bit (low-order bit) x: bit value 0: namely, the value 0 1: namely, the value 1 r: bit is reversed from previous value u: bit is unchanged from previous value XX: unknown value (register, immediate, or memory reference) In the examples it should be assumed the value of AL previous to each operation is sxxxxxxo (sign bit, 6 unknown bits, and odd bit). Here are some instructions to begin: SHL al,1 AL := xxxxxxo0 CF := s Shift left SAL al,1 AL := xxxxxxo0 CF := s Synonym for SHL SHR al,1 AL := 0sxxxxxx CF := o Shift right SAR al,1 AL := ssxxxxxx CF := o Shift arithmetic right SAR al,7 AL := ssssssss CF := x This extends the sign bit ROL al,1 AL := xxxxxxos CF := s Rotate left ROR al,1 AL := osxxxxxx CF := o Rotate right RCL al,1 AL := xxxxxxoC CF := s Rotate thru carry left RCR al,1 AL := Csxxxxxx CF := o Rotate thru carry right AND al,al AL := uuuuuuuu CF := 0 Sets flags (see below) AND al,-1 AL := uuuuuuuu CF := 0 -1 = $FF = 1111111 Sets flags (see below) AND al,$01 AL := 0000000u CF := 0 $01 = 00000001 AND al,$80 AL := u0000000 CF := 0 $80 = 10000000 AND al,$5A AL := 0u0uu0u0 CF := 0 $5A = 01011010 AND al,0 AL := 00000000 CF := 0 XOR AL,AL or MOV AL,0 are better TEST AL,XX AL := uuuuuuuu TEST is like AND, but the result doesn't get stored in the destination. The result is used to set flags (see below). TEST AL,-1 It's usually better than AND AL,-1 and OR AL,AL because it doesn't write to AL, which allows certain optimizations in some cases. OR al,al AL := uuuuuuuu CF := 0 Sets flags (see below) OR al,$01 AL := uuuuuuu1 CF := 0 $01 = 00000001 OR al,$80 AL := 1uuuuuuu CF := 0 $80 = 10000000 OR al,$5A AL := u1u11u1u CF := 0 $5A = 01011010 OR al,-1 AL := 11111111 CF := 0 Same as MOV AL,1 XOR al,al AL := 0 CF := 0 Use MOV AL,0 to preserve flags XOR al,$5A AL := ururruru CF := 0 $5A = 01011010 XOR al,-1 AL := rrrrrrrr CF := 0 Same as NOT AL Except for the rotation instructions (ROL, RCL, ROR, and RCR) all of the above set SF, ZF and PF based on the result of the operation: SF = value of the high-order bit of the result ZF = 1 ("set") if the result is zero, 0 ("cleared") otherwise PF = 1 ("set") if the low-order byte of the result contains an even number of 1 bits, 0 ("cleared") otherwise Let's see more instructions: STC CF := 1 Set Carry Flag CLC CF := 0 Clear Carry Flag CMC CF := r Complement Carry Flag LAHF AH := SZxAxPxC SAHF Assuming AH is SZxAxPxC: ZF := S; ZF := Z; AF := A; PF := P; CF := C SETc AL AL := CF Set if carry SETs AL AL := SF Set if sign SETz AL AL := ZF Set if zero SETe AL AL := ZF Set if equal (synonym of SETZ) SETp AL AL := PF Set if parity SETpe AL AL := PF Set if parity even (synonym of SETP) SETo AL AL := OF Set if overflow SETnc AL AL := NOT CF Set if not carry SETns AL AL := NOT SF Set if not sign SETnz AL AL := NOT ZF Set if not zero SETne AL AL := NOT ZF Set if not equal (synonym of SETNZ) SETnp AL AL := NOT PF Set if not parity SETpo AL AL := NOT PF Set if parity odd (synonym of SETNP) SETno AL AL := NOT OF Set if not overflow SETa (or SETNbe), SETae (or SETnb), SETb (or SETnae), SETbe (SETna), SETg (or SETNle), SETge (or SETnl), SETl (or SETnge), and SETle (SETng) set the destination byte to 1 or 0 depending on whether the specified condition is met or not. ADD AL,XX AL := AL+XX CF := 1 if operation generated a carry 0 otherwise SUB AL,XX AL := AL-XX CF := 1 if operation needed a borrow 0 otherwise SUB AL,0 AL := uuuuuuuu Set flags based on AL SUB AL,AL AL := 0 Same as XOR AL,AL or MOV AL,0 CMP AL,XX CMP is like SUB, but the result doesn't get stored in the destination. The operation simply set the flags ADC AL,XX AL := AL+XX+C CF := 1 if operation generated a carry 0 otherwise SBB AL,XX AL := AL-C-XX CF := 1 if operation needed a borrow 0 otherwise NEG AL AL := -AL CF := 1 if previous AL 0 NOT AL; INC AL is the same NOT AL AL := rrrrrrrr CF := u XOR AL,-1 is the same Conversion functions These functions will help us understand the representation of these huge integers. Longword to Hugeint Let's start by converting a Longword into a huge integer. The lowest 32 bits of the result will be the 32 bits of the parameter and the higher 96 bits will be zero. function UToHugeint(const x: Longword): Hugeint; overload; // Result := Hugeint(x); // Parameters: EAX = x; EDX = @Result; asm xor ecx, ecx // ECX := 0; mov [edx+_0_], eax // Result[0] := x; mov [edx+_1_], ecx // Result[1] := 0; mov [edx+_2_], ecx // Result[2] := 0; mov [edx+_3_], ecx // Result[3] := 0; end; Comments: * "_0_", "_1_", "_2_", and "_3_"? What are these? They are constants that represent the offsets of the four elements of the array, allowing us to write cleaner code. const _0_ = 0; _1_ = 4; _2_ = 8; _3_ = 12; Longint to Hugeint The lowest 32 bits of the result will be the 32 bits of the parameter. If the number is positive or zero, then the higher 96 bits will be 0, but if the number is negative, the higher 96 bits will be 1. It might seem like we need to make a comparison or test the sign and then to perform a conditional jump based on the result: function ToHugeint(const x: Longint): Hugeint; overload; // Result := Hugeint(x); // Parameters: EAX = x; EDX = @Result; asm or eax, eax // EAX := EAX or EAX; // EAX remains unchanged // Side effect: SF (Sign Flag) := EAX mov ecx, 0 // ECX := 0; jns @@not_negative // if not SF then goto @@not_negative; dec ecx // ECX := ECX - 1; // 0 - 1 = -1 = $FFFFFFFF @@not_negative: mov [edx+_0_], eax // Result[0] := x; mov [edx+_1_], ecx // Result[1] := ECX; // 0 or $FFFFFFFF mov [edx+_2_], ecx // Result[2] := ECX; // 0 or $FFFFFFFF mov [edx+_3_], ecx // Result[3] := ECX; // 0 or $FFFFFFFF end; Comments: Notice the use of "MOV ECX, 0" instead of "XOR ECX, ECX" to avoid changing the state of the Sign Flag (SF) set in the preceding instruction (OR) and then used in the conditional jump that appears in the following instruction (JNS). Of course we could have changed the order of the operations for this to be unnecessary... Instead of: or eax, eax jns @@not_negative the following pairs of instructions would achieve the same: * and eax, eax // EAX keeps the value, but SF gets the sign jns @@not_negative // if SF = 0 then goto @@not_negative * test eax, $80000000 // result will be zero only if sign bit is 0 jz @@not_negative // if ZF then goto @@not_negative * test eax, $87654321 // any value with bit 31 (sign bit) set jns @@not_negative // if SF = 0 then goto @@not_negative * cmp eax, 0 // compares eax with 0 jge @@not_negative // if greater or equal then goto @@not_negative Notice the use of "DEC ECX" to turn the value of ECX from $00000000 to $FFFFFFFF (by decrementing the value of the register). "NOT ECX" would have accomplished the same thing (by inverting the bits), at the same speed, and taking the same number of bytes to code the instruction, but NOT isn't a pairable instruction while DEC is. For this reason NOT is usually avoided and substituted as follows: If you know beforehand that the previous value is 0, use DEC Dest If you know beforehand that the previous value is 1, use INC Dest If you don't know what the previous value is, use XOR Dest, -1 Also notice in the order of the instructions that we never used a register that was set in the immediately previous instruction. This is one of the conditions for pairing to occur. You'll find more information about instruction pairing in the documents about optimization that we recommended above. We can simplify the function thanks to the CDQ instruction which extends the sign of EAX into EDX. This is basically how CDQ works: if EAX = 0 then EDX := $0 else EDX := $FFFFFFFF; Here's a smaller and simpler implementation using CDQ: function ToHugeint(const x: Longint): Hugeint; overload; // Result := Hugeint(x); // Parameters: EAX = x; EDX = @Result; asm mov ecx, edx // ECX := @Result; cdq // EDX := IIF(x=0, 0, $FFFFFFFF); mov [ecx+_0_], eax // Result[0] := x; mov [ecx+_1_], edx // Result[1] := EDX; // 0 or $FFFFFFFF mov [ecx+_2_], edx // Result[2] := EDX; // 0 or $FFFFFFFF mov [ecx+_3_], edx // Result[3] := EDX; // 0 or $FFFFFFFF end; CDQ is usually replaced using MOV and SAR, which offer the advantage that the source doesn't have to be EAX and the destination doesn't have to be in EDX (plus they are pairable instructions). Let's see an example: function ToHugeint(const x: Longint): Hugeint; overload; // Result := Hugeint(x); // Parameters: EAX = x; EDX = @Result; asm mov ecx, eax // ECX := x; sar ecx, 31 // ECX := IIF(x=0, 0, $FFFFFFFF); mov [edx+_0_], eax // Result[0] := x; mov [edx+_1_], ecx // Result[1] := EDX; // 0 or $FFFFFFFF mov [edx+_2_], ecx // Result[2] := EDX; // 0 or $FFFFFFFF mov [edx+_3_], ecx // Result[3] := EDX; // 0 or $FFFFFFFF end; Hugeint to Longint A Hugeint can be converted to a Longint by simply taking the low-order 32 bits. The high-order 96 digits of the Hugeint should be all 0 or all 1 matching the sign bit of would be the result (bit 31) for the Hugeint value to be in the range of a Longint, but the function doesn't check for that and performs the conversion blindly (in the same way that a Longint is converted to a Shortint, for example). function ToLongint(const x: Hugeint): Longint; overload; // Result := Longint(x); // No exception is raised if the value is not within // range (high-order 96 bits are discarded). // Parameters: EAX = @x; asm mov eax, [eax+_0_] // Result := x[0]; end; Int64 to Hugeint Int64 parameters are passed on the stack, so functions with an Int64 parameter will automatically create a stack frame. The lowest 64 bits of the result will be the 64 bits of the parameter, and the higher 64 bits of the result will extend the sign bit of the high-order integer that makes up the int64 value. {$IFDEF DELPHI4} function ToHugeint(const x: Int64): Hugeint; overload; // Result := Hugeint(x); // Parameters: x on the stack; EAX = @Result; asm mov edx, dword[x+_0_] // EDX := x[0]; mov ecx, dword[x+_1_] // ECX := x[1]; mov [eax+_0_], edx // Result[0] := x[0]; mov [eax+_1_], ecx // Result[1] := x[1]; sar ecx, 31 // ECX := IIF(x[1]=0, 0, $FFFFFFFF); mov [eax+_2_], ecx // Result[2] := ECX; // 0 or $FFFFFFFF mov [eax+_3_], ecx // Result[3] := ECX; // 0 or $FFFFFFFF end; {$ENDIF} Int64 values are stored in little-endian format, so the low-order integer is the first, at offset 0 from the base address of the variable, and the high-order integer is the second, at offset 4 from the base address of the variable. In this case the base address of the variable is EBP+8 (see the first chapter of this series of articles), so the first element is at EBP+8 (EBP+8+0), and the second element is at EBP+12 (EBP+8+4). I could have used EBP+8 and EBP+12 to address the elements, but "x+_0_" and "x+_1_" refer to these addresses more transparently. The "DWORD" size specifier is mandatory since the assembler takes "x+_0_" and "x+_1_" as pointers to 64-bit data (because "x" is considered a pointer to 64-bit data) and doesn't allow to move the referenced value to a 32-bit register. Hugeint to Int64 A Hugeint can be converted to an Int64 by simply taking the low-order 64 bits. The high-order 64 digits of the Hugeint should be all 0 or all 1 matching the sign bit of would be the result (bit 31) for the Hugeint value to be in the range of an Int64, but the function doesn't check for that and performs the conversion blindly: {$IFDEF DELPHI4} function ToInt64(const x: Hugeint): Int64; overload; // Result := Int64(x) // No exception is raised if the value is not within // range (high-order 64 bits are discarded). // Parameters: EAX = @x; asm mov edx, [eax+_1_] // EDX := x[1]; mov eax, [eax+_0_] // EAX := x[0]; // Result = EDX:EAX = x[1]:x[0] end; {$ENDIF} Comment: Int64 return values should be placed in the EDX (high-order 32 bits) and EAX (low-order 32 bits). More assembler instructions In the source code example (attached) you'll find the implementation of some functions to operate with the Hugeint data type. The purpose is to exemplify the instructions we've seen so far along with some new ones: BT (Bit Test): BT dword ptr [eax], edx -- CF = value of the EDXth bit in the memory pointed by EAX BTS (Bit Test and Set): BTS dword ptr [eax], edx -- sets to 1 the EDXth bit in the memory pointed by EAX CF = previous value of that bit BTR (Bit Test and Reset): BTR dword ptr [eax], edx -- sets to 0 the EDXth bit in the memory pointed by EAX CF = previous value of that bit BTC (Bit Test and Complement): BTC dword ptr [eax], edx -- toggles the value of the EDXth bit in the memory pointed by EAX CF = previous value of that bit We won't reproduce the functions here since you can find them in the source code attached, but we'll show different possible implementations of the function _IsNeg, simply to provide more examples of the instructions we've seen so far: function _IsNeg(x: Hugeint): boolean; // Result := x // Parameters: EAX = @x asm mov eax, [eax+_3_] // EAX := High order 32 bits of x shr eax, 31 // AL := High order bit of EAX (sign bit) end; function _IsNeg(x: Hugeint): boolean; asm cmp dword ptr [eax+_3_], 0 // if x[3] jl @@negative // goto @@negative mov al, 0 // Result := False; ret // exit; @@negative: // @@negative: mov al, 1 // Result := True; end; function _IsNeg(x: Hugeint): boolean; asm // set the Sign Flag and then put it in AL mov eax, [eax+_3_] // EAX := High order 32 bits of x or eax, eax // SF := Sign bit of EAX // alt.: add eax, 0 // also: sub eax, 0 // also: and eax, eax // also: and eax, -1 // or any negative value // also: test eax, eax // also: test eax, -1 // or any negative value sets al // AL := SF; // Sign Flag // alt.: lahf; shr ax, 31 // also: lahf; rol ax, 1; and al, $1 end; function _IsNeg(x: Hugeint): boolean; asm // set the Carry Flag with the Sign Bit to put it in AL mov eax, [eax+_3_] // EAX := High order 32 bits of x bt eax, 31 // CF := Sign bit of EAX // alt.: shl/rol/rcl eax, 1 setc al // AL := CF; // Carry Flag // alt.: mov al, 0; rcl, 1 // also: mov al, 0; adc al, al // also: lahf; mov al, ah; and al, $1 // also: lahf; ror/rcr/shr/sar ax, 1; shr al, 7 // also: lahf; ror/shr/sar ax, 8; and al, $1 // also: lahf; rol ax, 8; and al, $1 // also: lahf; rcl ax, 9; and al, $1 end; function _IsNeg(x: Hugeint): boolean; asm // set the Parity Flag and then negate it in AL mov al, [eax+_3_+3] // EAX := High order 8 bits of x or al, $7F // PF := Not Sign bit // alt.: and eax, $80000000 setnp al // AL := Not PF; // Not Parity Flag // alt.: lahf; rol/shl ax, 6 / rcl ax, 7; xor al,-1 / not al; and al, $1; // also: lahf; ror/shr/sar ax, 10 / rcr ax, 11; xor al,-1 / not al; and al, $1; end; In the next part we'll see functions to add, subtract, multiply and divide huge integers. Part 2: The four fundamental operations In this second and last part we'll finally get to see the actual arithmetics, with the four fundamental operations (addition, subtraction, multiplication and division). Before getting into them I'd like to say that the procedures and functions introduced in the preceeding two parts have been corrected and also further optimized. I still haven't been able to test them as much as I'd have liked to. If you find any bugs or have any comments about the source code, please drop me an email. Addition How do we add two numbers, each made up of four 32-bit integers? Well, it's actually pretty easy. We simply add them in the same way that we would add two numbers of four decimal digits (like 3597 and 0015 for instance), except that here each "digit" can have about 4 billion different (2^32) values instead of just ten. The algorithm would be like this: function AddWithCarry(x: Longint; y: Longint; var Carry: Boolean): Longint; forward; function HugeAdd(x: Hugeint; y: Hugeint): Hugeint; // Result := x + y; var Carry: Boolean; begin Carry := False; Result[0] := AddWithCarry(x[0], y[0], Carry); Result[1] := AddWithCarry(x[1], y[1], Carry); Result[2] := AddWithCarry(x[2], y[2], Carry); Result[3] := AddWithCarry(x[3], y[3], Carry); end; AddWithCarry is a fictitious function which returns an integer with the low order 32 bits of the result of the addition of the two arguments, plus 1 if Carry (the third argument) is True. It also stores True or False to the Carry (passed by reference) depending on whether the addition generated a carry or not (or whether the carry is 1 or 0, if you want to see it that way). Actually, this function doesn't have to be fictitious: function AddWithCarry(x: Longint; y: Longint; var Carry: Boolean): integer; asm // if Carry then CF := 1 else CF := 0; test byte ptr [ecx], -1 // Side-effect: CF := 0; jz @@NoCarry stc // CF := 1; @@NoCarry: // Result := x + y + CF; CF := GeneratedCarry; adc eax, edx // Carry := CF; setc byte ptr [ecx] end; It would be more efficient to have HugeAdd coded entirely in assembler: function HugeAdd(x: Hugeint; y: Hugeint): Hugeint; // Result := x + y; // Parameters: EAX = @x; EDX = @y; ECX = @Result asm push esi mov esi, [eax+_0_] // ESI := x[0]; add esi, [edx+_0_] // ESI := ESI + y[0]; mov [ecx+_0_], esi // Result[0] := ESI; mov esi, [eax+_1_] // ESI := x[1]; adc esi, [edx+_1_] // ESI := ESI + y[1] + Carry; mov [ecx+_1_], esi // Result[1] := ESI; mov esi, [eax+_2_] // ESI := x[2]; adc esi, [edx+_2_] // ESI := ESI + y[2] + Carry; mov [ecx+_2_], esi // Result[2] := ESI; mov esi, [eax+_3_] // ESI := x[3]; adc esi, [edx+_3_] // ESI := ESI + y[3] + Carry; mov [ecx+_3_], esi // Result[3] := ESI; pop esi end; Subtraction Subtraction works very much like addition, but instead of generating a carry, the operation generates a borrow (also represented by the Carry Flag) if the minuend (first operand) is less than the subtrahend (second operand): function SubtractWithBorrow(x: Longint; y: Longint; var Borrow: Boolean): Longint; forward; function HugeSub(x: Hugeint; y: Hugeint): Hugeint; // Result := x - y; var Borrow: Boolean; begin Borrow := False; Result[0] := SubtractWithBorrow(x[0], y[0], Borrow); Result[1] := SubtractWithBorrow(x[1], y[1], Borrow); Result[2] := SubtractWithBorrow(x[2], y[2], Borrow); Result[3] := SubtractWithBorrow(x[3], y[3], Borrow); end; function SubtractWithBorrow(x: Longint; y: Longint; var Borrow: Boolean): Longint; asm // if Borrow then CF := 1 else CF := 0; test byte ptr [ecx], -1 // Side-effect: CF := 0; jz @@NoBorrow stc // CF := 1; @@NoBorrow: // Result := x - y - CF; CF := NeededBorrow; sbb eax, edx // Borrow := CF; setc byte ptr [ecx] end; You should be ready to write a pure assembler version of HugeSub, since it's the same as HugeAdd, but all you have to do is replace ADD and ADC with SUB and SBB respectively. Opposite number Given a number, these implementations of HugeNeg return it's opposite number (two's complement): function HugeNeg(x: Hugeint): Hugeint; begin // Result := (Not x) + 1; Result := HugeAdd(HugeNot(x), IntToHuge(1)); end; function HugeNeg(x: Hugeint): Hugeint; begin // Result := 0 - x; Result := HugeSub(IntToHuge(0), x); end; The second one is the simplest and fastest because it involves a single operation, and now that we know how to subtract, we can implement it in assembler: function HugeNeg(x: Hugeint): Hugeint; // Result := -x; // Parameters: EAX = @x; EDX = @Result asm // Result := 0 - x; push esi xor esi, esi mov ecx, [eax+_0_] // x[0] sub esi, ecx // 0 - x[0] mov ecx, 0 mov [edx+_0_], esi // Result[0] mov esi, [eax+_1_] // x[1] sbb ecx, esi // 0 - x[1] - Borrow mov esi, 0 mov [edx+_1_], ecx // Result[1] mov ecx, [eax+_2_] // x[2] sbb esi, ecx // 0 - x[2] - Borrow mov ecx, 0 mov [edx+_2_], esi // Result[2] mov esi, [eax+_3_] // x[3] sbb ecx, esi // 0 - x[3] - Borrow mov [edx+_3_], ecx // Result[3] pop esi end; Multiplication A way of multiplying numbers is by means of an addition loop: function HugeMul(x: Hugeint; y: Hugeint): Hugeint; begin SetZero(Result); while not HugeIsZero(y) do begin Result := HugeAdd(Result, x); HugeSub(y, 1) end; end; Computationally speaking, this algorithm is quite poor. For example, if the value of "y" was 4 million, the loop would repeat 4 million times! Anyway, the idea would still good if we could somehow accelerate the process. Let's play a little bit with algebra: x * y = x * (y[3]*2^96 + y[2]*2^64 + y[1]*2^32 + y[0]*2^0) = (x*y[3])*2^96 + (x*y[2])*2^64 + (x*y[1])*2^32 + (x*y[0])*2^0 Now we have reduced the problem of multiplying two Hugeint numbers to multiplying a Hugeint number by a 32-bit integer. We multiply the first operand by the four integers that make up the second operand and then we shift the partial results by 0, 32, 64, and 96 bits (to multiply them by 2^0, 2^32, 2^64 and 2^96), and finally we add these values to get the final result. function HugeMulInt(x: Hugeint; y: Longint): Hugeint; forward; function HugeMul(x: Hugeint; y: Hugeint): Hugeint; begin Result := HugeShl(HugeMulInt(x, y[3]), 96) + HugeShl(HugeMulInt(x, y[2]), 64) + HugeShl(HugeMulInt(x, y[1]), 32) + HugeMulInt(x, y[0]); end; This is exactly the way we multiply decimal numbers when performing caculations on a paper, except that here the base is 2^32 instead of ten. Let's see now how we can a multiply a Hugeint by an integer: function MultiplyWithCarry(x: Longint; y: Longint; var Carry: Longint): Longint; forward; function HugeMulInt(x: Hugeint; y: Longint): Hugeint; // Result := x * y; var Carry: Longint; begin Carry := 0; Result[0] := MultiplyWithCarry(x[0], y, Carry); Result[1] := MultiplyWithCarry(x[1], y, Carry); Result[2] := MultiplyWithCarry(x[2], y, Carry); Result[3] := MultiplyWithCarry(x[3], y, Carry); end; function MultiplyWithCarry(x: Longint; y: Longint; var Carry: Longint): integer; // Result := LoDWord(x * y + Carry); // Carry := HiDWord(x * y + Carry); asm // EDX:EAX := EAX * EDX; // x * y mul edx // Inc(EDX:EAX, Carry); add eax, [ecx] adc edx, 0 // Carry := EDX; // High order 32 bits of the result mov [ecx], edx; end; MultiplyWithCarry is very much like AddWithCarry, but it performs a multiplication instead of an addition, and it generates a carry of 32 bits instead of just one bit (the multiplication of two 32-bit values generates a 64-bit result, while the addition of two 32-bit values can generate a 33-bit result). MultiplyWithCarry first performs an unsigned multiplication of "x" (EAX) by "y" (EDX), using the MUL opcode. The result is a 64-bit unsigned integer in EDX:EAX, to which the function adds the Carry passed by parameter. The function returns the lower 32 bits of this final result (located EAX), and the higher 32 bits (EDX) constitute the carry for the next multiplication, which are stored in the Carry parameter (passed by reference). An assembler implementation of HugeMul and HugeMulInt can be found in the source code attached. For reasons of simplicity, in the examples above the functions consider the numbers are unsigned, but the assembler implementations consider signed numbers. Also, the attached version of HugeMul doesn't call HugeMulInt or HugeShl, and is highly optimized. Instead of considering a Huge integer as four 32-bit integers multiplied by four powers of 2^32, we consider them as 128 1-bit integers multiplied by 128 powers of 2: bit127 * 2^127 + bit126 * 2^126 + ... + bit1 * 2^1 + bit0 * 2^0 Since each bit can only be 0 or 1, the algorithm shown above can be greatly simplified: function HugeMul(x: Hugeint; y: Hugeint): Hugeint; // Result := x * y; var i: Longint; begin SetZero(Result); for i := 0 to 127 do if BitTest(y, i) then Result := HugeAdd(Result, HugeShl(x, i)); end; The idea is to add different powers of 2 of "x", depending those powers on the bits set on "y". For example, if "y" was 20, bits 5 and 3 would be on (20 in decimal is 10100 in binary), so only two additions would be performed, and the result would be HugeShl(x, 3) plus HugeShl(x, 5). This algorithm can be coded quite efficiently in assembler, but still the first algorithm will work faster. The reason why I've shown this is because it'll make it easier to understand the algorithm we'll use for divisions. Division Let's first see the case of a division of a Hugeint by a 32-bit integer, which should be easy to understand: function DivideWithRemainder(x: Longint; y: Longint; var Remainder: Longint): Longint; forward; function HugeDivInt(x: Hugeint; y: Longint): Hugeint; // Result := x div y; var Remainder: Longint; begin Remainder := 0; Result[0] := DivideWithRemainder(x[3], y, Remainder); Result[1] := DivideWithRemainder(x[2], y, Remainder); Result[2] := DivideWithRemainder(x[1], y, Remainder); Result[3] := DivideWithRemainder(x[0], y, Remainder); asm mov edx, Remainder end; end; function DivideWithRemainder(x: Longint; y: Longint; var Remainder: Longint): Longint; // Result := Remainder:x div y; // Remainder := Remainder:x mod y; asm push esi mov esi, edx // y mov edx, [ecx] // Remainder // EAX := EDX:EAX div ESI; // EDX := EDX:EAX mod ESI; div esi // Remainder := EDX; mov [ecx], edx; pop esi end; HugeDivInt leaves the remainder of the division in EDX, so it can be used in a function returning the remainder of the division: function HugeModInt(dividend: Hugeint; divisor: Longint): Longint; // Result := dividend mod divisor; // Parameters: EAX = @dividend; EDX = @divisor; asm sub esp, TYPE(Hugeint) // Make place on the stack for a Hugeint mov ecx, esp // to hold the result of the division call HugeDivInt // Perform the division add esp, TYPE(Hugeint) // Restore the stack pointer mov eax, edx // Result := Remainder; // was left in EDX end; For the case of two huge integers we can think of an algorithm like the one we would use to divide two numbers of four digits with paper and pencil, but it turns to be quite complex, plus it isn't actually very fast since it implies divisions, multiplications, and substractions, and sometimes you take one step forwards and two steps back. Is there another possible algorithm? Yes, there is: function HugeDiv(dividend: Hugeint; divisor: Hugeint): Hugeint; // Result := dividend div divisor; begin if HugeIsZero(divisor) then raise EDivByZero.CreateRes(@sDivByZero); Result := 0; while HugeCmp(dividend, divisor) = 0 do begin dividend := HugeSub(dividend, divisor); Result := HugeAdd(Result, IntToHuge(1)); end; end; Of course, this algorithm turns out to be awfully slow (if we divide 12 million by 3, the loop would execute 4 million times), but we can speed things up if we subtract from the dividend the divisor multiplied by different powers of 2, from higher to lower, setting the corresponding bit of the result every time we perform a subtraction (the bit in the position of the power of 2 that was used). It's the inverse of what we did in the case of a multiplication shown above. The division process would then be reduced to just 128 subtractions at most. In the following example, the dividend is 20 (10100 in binary) and the divider is 3 (11 in binary): 10100 - 11 * 2^2 = 10100 - 1100 = 1000 Result := 100 1000 - 11 * 2^1 = 1000 - 110 = 10 Result := 110 Initially, 11 * 2^2 is the highest value that is less or equal to the dividend, so we subtract that value from the dividend and we set bit 2 of the result because we subtracted the divisor multiplied by two to the power of 2. So far, the remainder is 8 (1000 in binary), and 11 * 2^1 is the highest value that is less than or equal to this remainder, so we subtract that value from the remainder, and we set bit 1 of the result because we subtracted the divisor multiplied by two to the power of 1. The remainder is 2 (10 in binary), and since the divisor is greater than that value, division stops there. The remainder of the operation would then be 2 (10 in binary) and since bits 2 and 1 of the result were set, the result is 110 in binary, i.e. 6 in decimal. function HugeDiv(dividend: Hugeint; divisor: Hugeint): Hugeint; var _r_: Hugeint; // remainder _d_: Hugeint; // divisor _q_: Hugeint; // quotient BitPosR, BitPosD, count: integer; begin _r_ := dividend; _d_ := divisor; HugeSetZero(_q_); BitPosD := HugeBitScanReverse(_d_); if BitPosD = -1 then RaiseDivByZero; BitPosR := HugeBitScanReverse(_r_); count := BitPosD - BitPosR; if count 0 then _d_ := HugeShl(_d_, count); repeat if HugeCmp(_d_, _r_) _r_ := HugeSub(_r_, _d_); HugeBitSet(_q_, count); end; _d_ := HugeShr(_d_, 1); dec(count); until count Result := _q_; asm lea edx, _r_ end; end; HugeBitScanReverse is a function that returns the position of the first non-zero bit, performing the search from bit 127 to bit 0. If all bits are zero, the result is -1. We use HugeBitScanReverse to determine the first power of two we should multiply the divisor in order to begin the iteration. The assembler implementation of HugeDiv that you can find attached supports signed numbers. It is just a first approximation, and it can be heavily optimized. The function leaves in EDX the address of the remainder, so it can be used by a function returning the modulus of the division: function HugeMod(dividend: Hugeint; divisor: Hugeint): Hugeint; // Result := dividend Mod divisor; // Parameters: EAX = @dividend; EDX = @divisor; ECX = @Result asm push ecx // @Result call HugeDiv // EDX := @remainder; pop eax // EAX := @Result; call HugeMov // EAX^ := EDX^; end; Previous: Inline Assembler in Delphi (VI) - Calling external procedures