Mega Code Archive
High speed parser
Title: High speed parser
Question: You can download some necessary files from www.pisarev.net
The component is intended for some mathematics and logical calculations. Parser works with the very high speed - about 10000000 operations per 1 - 1.5 seconds for simple mathematics formulas and a little more for logical formulas (these results were reached on the computer with Athlon 1800XP, 512 DDR in Windows XP)
Answer:
The TDataEditor parser
Introduction
The component is intended for some mathematics and logical calculations. Parser contains standard set of mathematics (such as sin or sqrt) and logical functions (such as or true). It also contains standard set of types. But it is possible to create new functions and new types. Parser works with the very high speed - about 10000000 operations per 1 - 1.5 seconds for simple mathematics formulas and a little more for logical formulas (these results were reached on the computer with Athlon 1800XP, 512 DDR in Windows XP). Parser creates binary representation of formula (script) and makes all following calculations by the script. There are no limitations for the formula; it could have any length and any amount of embedded formulas. Embedded formula is expression that is located between brackets and has increased calculation priority.
There are two types of script: logical script (for logical formulas) and mathematics script (for mathematics formulas). Mathematic script could contain inside another mathematic script (embedded formula). Logical script could contain inside both types of script. These scripts have a lot of similar parts. Each script has a header that contains some initial information. Detailed structures of mathematics and logical scripts are in source file (DataEditor.pas). It is in constant description section.
Formulas structure
Each formula has to be constructed by some rules. Also it is necessary to know about structure it may have. Functions, types, and other components are in list below:
Single: type, defines float 32-bit value
Double: type, defines float 64-bit value
Int64: type, defines integer signed 64-bit value
Integer: type, defines integer signed 32-bit value
Longword: type, defines integer unsigned 32-bit value
Smallint: type, defines integer signed 16-bit value
Word: type, defines integer unsigned 16-bit value
Shortint: type, defines integer signed 8-bit value
Byte: type, defines integer unsigned 8-bit value
if: reserved word, defines logical expression
and: operand, used for manipulations on logical expressions and executes conjunction operation
or: operand, used for manipulations on logical expressions and executes disjunction operation
xor: operand, used for manipulations on logical expressions and executes exclusive disjunction operation
not: operand, used for manipulations on logical expressions and executes negation operation
logical function, returns true if first mathematics expression is greater than second mathematics expression
: logical function, returns true if mathematics expressions are not equal
=: logical function, returns true if first mathematics expression is equal to or greater than second mathematics expression
=: logical function, returns true if mathematics expressions are equal
Odd: logical function, returns true if mathematic expression is an odd number
True: logical function, always returns true
False: logical function, always returns false
+: operand, executes adding operation
-: operand, executes subtraction operation
*: mathematics function, executes multiplying operation
/: mathematics function, executes division operation
Sqrt: mathematics functions, root of a number. Root could have any degree
Div: mathematics functions, executes integer division operation
Mod: mathematics functions, executes remainder operation
Int: mathematics function, returns the integer part of a number.
Frac: mathematics function, returns the fractional part of a number
Random: mathematics function, returns random number within the range 0 Trunc: mathematics function, truncates a number to an integer
Round: mathematics function, returns the value rounded to the nearest whole number
Sin: mathematics function, returns the sine of the angle in radians
ArcSin: mathematics function, returns the inverse sine of a number
Sinh: mathematics function, returns the hyperbolic sine of an angle
ArcSinh: mathematics function, returns the inverse hyperbolic sine of a number
Cos: mathematics function, returns the cosine of the angle in radians
ArcCos: mathematics function, returns the inverse cosine of a number
Cosh: mathematics function, returns the hyperbolic cosine of an angle
ArcCosh: mathematics function, returns the inverse hyperbolic cosine of a number
Tan: mathematics function, returns the tangent of the angle
ArcTan: mathematics function, returns the arctangent of a number
Tanh: mathematics function, returns the hyperbolic tangent of an angle
ArcTanh: mathematics function, the inverse hyperbolic tangent of a number
CoTan: mathematics function, returns the cotangent of the angle
ArcCoTan: mathematics function, returns the inverse cotangent of a number
CoTanh: mathematics function, returns the hyperbolic cotangent of an angle
ArcCoTanh: mathematics function, the inverse hyperbolic cotangent of a number
Sec: mathematics function, returns the secant of an angle
ArcSec: mathematics function, returns the inverse secant of a number
Sech: mathematics function, returns the hyperbolic secant of an angle
ArcSech: mathematics function, returns the inverse hyperbolic secant of a number
Csc: mathematics function, returns the cosecant of an angle
ArcCsc: mathematics function, returns the inverse cosecant of a number
Csch: mathematics function, returns the hyperbolic cosecant of an angle
ArcCsch: mathematics function, returns the inverse hyperbolic secant of a number
Abs: mathematics function, returns an absolute value
Ln: mathematics function, returns the natural log of an expression.
Lg: mathematics function, returns log base 10
Log: mathematics function, returns the log of expression for a specified base
Pi: mathematics function, returns 3.1415926535897932385
Exp: mathematics function, returns the exponential of an expression
!: mathematics function, returns factorial of an expression
^: mathematics function, raises expression to any power
Logical formulas
Any case the logical formula needs to be started with the reserved word if. It means the current expression is logical. You can use any amount of embedded formulas that are between brackets. Embedded formulas could have any type. Therefore inside of brackets its necessary to write reserved word if if embedded expression is logical and write nothing if embedded expression if mathematics. For example:
"if (2 log 4) = (4 sqrt 2) or (if (2 * 2) = 4)"
if true and (if true xor true) or (2 * 2) = 4
Both expressions return true. You may check this use SpeedTest program.
There is each mathematics expression that contains functions needs to be put inside of brackets in the formula. But its not necessary for numbers. For example:
if 1 = 1 or 2 = 2 or 3 =3
if (trunk pi) = 3
What components of formula are mathematics functions - you can clarify using components list above.
Some functions for working with the parser
All operations with the formulas executes object of TDataEditor class. Parser has a lot of methods for registration new functions, types; also it has some methods for translating formulas to scripts and executing these scripts. If you are not going to create new functions, you may skip that paragraph.
procedure RegisterNumFunction(out FunctionID: Integer; const FunctionName: string; RequireValue1, RequireValue2: Boolean); virtual;
Register mathematics function with FunctionName name. Parameters RequireValue1 and RequireValue2 mean if parameter needs before or after function accordingly. FunctionID is identifier of registered function. If function failed, the FunctionID parameter is -1. This parameter will be changed automatically after procedure SortNumFunctionsData is called. Registration order is not important, but always after registration new elements its necessary to call sorting procedure. There are three sorting procedures: SortNumFunctionsData, SortBoolFunctionsData and SortTypesData in TDataEditor for sorting mathematics, logical functions and types accordingly. These methods sort registered functions thereby placing the longest functions to the beginning of memory. Its necessary to remember that after sorting procedure is called all functions identifiers are changed. For example:
...
type
TForm1 = class(TForm)
...
private
FLogID: Integer;
FSinhID: Integer;
...
end;
...
procedure TForm1.FormCreate(Sender: TObject);
begin
with DataEditor do begin
// Log and Sinh functions already exist like standard functions,
// but this sample is just hypothetical
//Function Log needs both parameters:
RegisterNumFunction(FLogID, 'log', True, True);
//Function Sinh needs just a second parameter:
RegisterNumFunction(FSinhID, 'sinh', False, True);
// Its possible, that now FLogID = 0 and FSinhID = 1;
SortNumFunctionsData;
// Now FLogID = 1 and FSinhID = 0, because of the first function
// is the longest function
end;
...
end;
...
function UnRegisterNumFunction(): Boolean; Deletes registered function by its name or identifier.
procedure RegisterBoolFunction(out FunctionID: Integer; const FunctionName: string; RequireValue1, RequireValue2: Boolean); virtua; It has the same sense as method RegisterNumFunction above.
procedure RegisterType(out TypeID: Integer; const TypeName: string); overload; virtual; Registers new type and gives new type identifier TypeID. If function failed, the TypeID parameter is -1.
function UnRegisterType(): Boolean; Deletes registered type by its name or identifier.
procedure StringToNumScript(const S: string; out Script: TScript; OpenedBracket: Char = '('; ClosedBracket: Char = ')'); overload; virtual; Converts string S to the script Script. Parameters OpenedBracket and ClosedBracket contains characters of the beginning and the end of embedded formula.
procedure StringToBoolScript(const S: string; out Script: TScript; OpenedBracket: Char = '('; ClosedBracket: Char = ')'); overload; virtual; It has the same sense as method StringToNumScript above.
function ExecuteNumScript(Index: Integer): Double; overload; virtual; Executes mathematics script and returns the execution result. Parameter Index defines script address.
function ExecuteNumScript(P: Pointer): Double; overload; virtual; Executes mathematics script and returns the execution result. Parameter P defines script address.
function ExecuteNum: Double; overload; virtual; Executes mathematics script that is in the Script property of TDataEditor class and returns execution result.
function ExecuteBoolScript(Index: Integer): Boolean; It has the same sense as method ExecuteNumScript above.
function ExecuteBoolScript(P: Pointer): Boolean; It has the same sense as method ExecuteNumScript above.
function ExecuteBool: Boolean; It has the same sense as method ExecuteNumScript above.
property Script: TScript; Contains script. Script could have any type.
property Formula: string; Contains formula.
property Accuracy: TRoundToRange; Indicates the power of rounding. To be short its the second parameter of Accuracy function. Its necessary just for operations with logical scripts. It has default value -7.
property OnNumFunction: TNumFunctionEvent;
TNumFunctionEvent = function(FunctionID: Integer; TypeID: Integer; var Value1: Double; Value2, Value3: Double): Boolean of object;
This event needs to be processed only if there are new functions in object of TDataEditor class. Otherwise all standard functions execute within DefaultNumFunction method. Parameter FunctionID is identifier of the processing function. Parameter TypeID is the current type identifier. This event will appear each time its necessary to execute function in script even if its just a standard function. Therefore you can change the standard functions behavior. If you want to call default function - you should set function result to True. Parameters Value2 and Value3 are functions parameters, they are valid only if current function was registered (see RegisterNumFunction method, parameters RequireValue1 and RequireValue2 above) with necessary instructions. Result of function execution should be placed to Value1 parameter. For example:
...
uses ..., Math;
...
type
TForm1 = class(TForm)
...
private
FLogID: Integer;
FSinhID: Integer;
function NumFunction(FunctionID: Integer; TypeID: Integer;
var Value1: Double; Value2, Value3: Double): Boolean;
...
end;
...
procedure TForm1.FormCreate(Sender: TObject);
begin
with DataEditor do begin
// Log and Sinh functions are already exist like standard functions,
// but this sample is just hypothetical
RegisterNumFunction(FLogID, 'log', True, True);
RegisterNumFunction(FSinhID, 'sinh', False, True);
SortNumFunctionsData;
end;
...
end;
function TForm1.NumFunction(FunctionID, TypeID: Integer;
var Value1: Double; Value2, Value3: Double): Boolean;
begin
// Now we check if the current function is not standard,
// if its so then we calculate our function and place result in
// Value1 parameter and set Function result to True.
// Otherwise we set function result to True to call default
// function:
if FunctionID = FLogID then Value1 := Value1 := LogN(Value2, Value3)
else if FucntionID = FSinhID then Value1 := SinH(Value3)
else begin
Result := True;
Exit;
end;
Result := False;
end;
...
property OnBoolFunction: TBoolFunctionEvent;
TBoolFunctionEvent = function(FunctionID: Integer; TypeID: Integer; var Value1: Boolean; Value2, Value3: Double): Boolean of object;
It has the same sense as property OnNumFunction above.
Samples
GraphX
For the first sample of using TDataEditor Ive made an ActiveX object which builds functions graphs. I guess it fully demonstrates parser possibilities. Now Id like to describe some methods and properties of TGraphBldr component. Ive made the TGraphBldr component to use it in ActiveX object - its descendent of TCustomControl class. So, here are some properties and methods of TGraphBldr:
function XCoord(X: Double): Double; returns X coordinate on a scale of graph. Parameter X should be indicated on scale of a component.
function YCoord(Y: Double): Double; returns Y coordinate on a scale of graph. Parameter Y should be indicated on scale of a component.
function Coordinates(X, Y: Double): TCoord;
TCoord = record
X, Y: Double;
end;
return coordinates on a scale of graph. Parameters X and Y should be indicated on a scale of component.
property Picture: TBitmap; contains graph image.
property DetailLevel: Integer; Regulates graph detail level. This property indicates the points amount that will be calculated from the minimum to maximum values of axes X. If you increase the points amount on axis X then occurs accordingly increase the points amount on axis Y. Therefore the effect of increase this property is more noticeable on graphs with the predominated vertical lines, like tan x or sec x. So, its practically senselessly to increase detail level on such graphs like sin x or cos x. Component creates points array that will be exposed by filtration. Filtration is necessary because of there are many points with the same coordinates - its the side effect of overweening detail level.
property FramePen: TPen; defines object for graph frame line drawing
property GraphPen: TPen; defines object for graph line drawing
property GridPen: TPen; defines object for graph grid line drawing
property HorzSpacing: Double; defines horizontal cell size of a graph grid
property ShowAxis: Boolean; defines if graph axes are visible
property ShowFrame: Boolean; defines if graph frame is visible
property ShowGrid: Boolean; defines if graph grid is visible
property ShowText: Boolean; defines if formula text is visible on graph
property Text: string; contains formula
property TracePen: TPen; defines object for trace lines drawing
property Tracing: Boolean; defines if graph is traceable
property VertSpacing: Double; defines vertical cell size of a graph grid
property XMaxValue: Integer; maximum value of axis X
property YMaxValue: Integer; maximum value of axis Y
property OnTrace: TTraceEvent; TTraceEvent = procedure(Sender: TObject; X, Y: Double) of object; This event will appear each time user moves mouse cursor on a component if Tracing is True. X and Y parameters are mouse cursor coordinates on a scale of graph.
There is another one sample, more powerful than GraphX. It names Graph and its an application.
SpeedTest
This program is for speed testing of TDataEditor component. You can regulate amount of formula execution. Like youll see, there is some difference between time of execution simple and complex formulas in program. Also program demonstrates script length and structure. Structure is displayed by four byte numbers and by one byte number.
Generator
This program is intended for testing the execution of huge formulas. Generator program generates and executes a long length mathematics formula:
To generate new text press F8 key, or choose Service|Generate menu item. You can write your own formulas directly in program and execute them by pressing F9 key. Elements count trackbar regulates approximate amount of elements in formula. Elements are numbers and functions. Embeddings power regulates the power of embeddings appearance in generated text. Unfortunately, its almost impossible to get a valid result of generated text if we use all registered functions (if amount of elements in formula is set to maximum). But we can use simplest functions to get a valid result with the longest formula. Its possible to choose some functions sets in program.
DrawX
It draws picture using formulas which are on the left side of program. These formulas are in Drawing.dat. You can write there your own formulas. X function defines X-coordinate of current pixel, Y defines Y-coordinate of the pixel and Index is index of the pixel. For picture calculation we need at least three formulas, since we use 24-bit picture. Each time the program use three random formulas. There could appear a really interesting pictures!