Mega Code Archive

High speed parser

Title: High speed parser Question: You can download some necessary files from www.pisarev.net The component is intended for some mathematics and logical calculations. Parser works with the very high speed - about 10000000 operations per 1 - 1.5 seconds for simple mathematics formulas and a little more for logical formulas (these results were reached on the computer with Athlon 1800XP, 512 DDR in Windows XP) Answer: The TDataEditor parser Introduction The component is intended for some mathematics and logical calculations. Parser contains standard set of mathematics (such as sin or sqrt) and logical functions (such as or true). It also contains standard set of types. But it is possible to create new functions and new types. Parser works with the very high speed - about 10000000 operations per 1 - 1.5 seconds for simple mathematics formulas and a little more for logical formulas (these results were reached on the computer with Athlon 1800XP, 512 DDR in Windows XP). Parser creates binary representation of formula (script) and makes all following calculations by the script. There are no limitations for the formula; it could have any length and any amount of embedded formulas. Embedded formula is expression that is located between brackets and has increased calculation priority. There are two types of script: logical script (for logical formulas) and mathematics script (for mathematics formulas). Mathematic script could contain inside another mathematic script (embedded formula). Logical script could contain inside both types of script. These scripts have a lot of similar parts. Each script has a header that contains some initial information. Detailed structures of mathematics and logical scripts are in source file (DataEditor.pas). It is in constant description section. Formulas structure Each formula has to be constructed by some rules. Also it is necessary to know about structure it may have. Functions, types, and other components are in list below: Single: type, defines float 32-bit value Double: type, defines float 64-bit value Int64: type, defines integer signed 64-bit value Integer: type, defines integer signed 32-bit value Longword: type, defines integer unsigned 32-bit value Smallint: type, defines integer signed 16-bit value Word: type, defines integer unsigned 16-bit value Shortint: type, defines integer signed 8-bit value Byte: type, defines integer unsigned 8-bit value if: reserved word, defines logical expression and: operand, used for manipulations on logical expressions and executes conjunction operation or: operand, used for manipulations on logical expressions and executes disjunction operation xor: operand, used for manipulations on logical expressions and executes exclusive disjunction operation not: operand, used for manipulations on logical expressions and executes negation operation logical function, returns true if first mathematics expression is greater than second mathematics expression : logical function, returns true if mathematics expressions are not equal =: logical function, returns true if first mathematics expression is equal to or greater than second mathematics expression =: logical function, returns true if mathematics expressions are equal Odd: logical function, returns true if mathematic expression is an odd number True: logical function, always returns true False: logical function, always returns false +: operand, executes adding operation -: operand, executes subtraction operation *: mathematics function, executes multiplying operation /: mathematics function, executes division operation Sqrt: mathematics functions, root of a number. Root could have any degree Div: mathematics functions, executes integer division operation Mod: mathematics functions, executes remainder operation Int: mathematics function, returns the integer part of a number. Frac: mathematics function, returns the fractional part of a number Random: mathematics function, returns random number within the range 0 Trunc: mathematics function, truncates a number to an integer Round: mathematics function, returns the value rounded to the nearest whole number Sin: mathematics function, returns the sine of the angle in radians ArcSin: mathematics function, returns the inverse sine of a number Sinh: mathematics function, returns the hyperbolic sine of an angle ArcSinh: mathematics function, returns the inverse hyperbolic sine of a number Cos: mathematics function, returns the cosine of the angle in radians ArcCos: mathematics function, returns the inverse cosine of a number Cosh: mathematics function, returns the hyperbolic cosine of an angle ArcCosh: mathematics function, returns the inverse hyperbolic cosine of a number Tan: mathematics function, returns the tangent of the angle ArcTan: mathematics function, returns the arctangent of a number Tanh: mathematics function, returns the hyperbolic tangent of an angle ArcTanh: mathematics function, the inverse hyperbolic tangent of a number CoTan: mathematics function, returns the cotangent of the angle ArcCoTan: mathematics function, returns the inverse cotangent of a number CoTanh: mathematics function, returns the hyperbolic cotangent of an angle ArcCoTanh: mathematics function, the inverse hyperbolic cotangent of a number Sec: mathematics function, returns the secant of an angle ArcSec: mathematics function, returns the inverse secant of a number Sech: mathematics function, returns the hyperbolic secant of an angle ArcSech: mathematics function, returns the inverse hyperbolic secant of a number Csc: mathematics function, returns the cosecant of an angle ArcCsc: mathematics function, returns the inverse cosecant of a number Csch: mathematics function, returns the hyperbolic cosecant of an angle ArcCsch: mathematics function, returns the inverse hyperbolic secant of a number Abs: mathematics function, returns an absolute value Ln: mathematics function, returns the natural log of an expression. Lg: mathematics function, returns log base 10 Log: mathematics function, returns the log of expression for a specified base Pi: mathematics function, returns 3.1415926535897932385 Exp: mathematics function, returns the exponential of an expression !: mathematics function, returns factorial of an expression ^: mathematics function, raises expression to any power Logical formulas Any case the logical formula needs to be started with the reserved word if. It means the current expression is logical. You can use any amount of embedded formulas that are between brackets. Embedded formulas could have any type. Therefore inside of brackets its necessary to write reserved word if if embedded expression is logical and write nothing if embedded expression if mathematics. For example: "if (2 log 4) = (4 sqrt 2) or (if (2 * 2) = 4)" if true and (if true xor true) or (2 * 2) = 4 Both expressions return true. You may check this use SpeedTest program. There is each mathematics expression that contains functions needs to be put inside of brackets in the formula. But its not necessary for numbers. For example: if 1 = 1 or 2 = 2 or 3 =3 if (trunk pi) = 3 What components of formula are mathematics functions - you can clarify using components list above. Some functions for working with the parser All operations with the formulas executes object of TDataEditor class. Parser has a lot of methods for registration new functions, types; also it has some methods for translating formulas to scripts and executing these scripts. If you are not going to create new functions, you may skip that paragraph. procedure RegisterNumFunction(out FunctionID: Integer; const FunctionName: string; RequireValue1, RequireValue2: Boolean); virtual; Register mathematics function with FunctionName name. Parameters RequireValue1 and RequireValue2 mean if parameter needs before or after function accordingly. FunctionID is identifier of registered function. If function failed, the FunctionID parameter is -1. This parameter will be changed automatically after procedure SortNumFunctionsData is called. Registration order is not important, but always after registration new elements its necessary to call sorting procedure. There are three sorting procedures: SortNumFunctionsData, SortBoolFunctionsData and SortTypesData in TDataEditor for sorting mathematics, logical functions and types accordingly. These methods sort registered functions thereby placing the longest functions to the beginning of memory. Its necessary to remember that after sorting procedure is called all functions identifiers are changed. For example: ... type TForm1 = class(TForm) ... private FLogID: Integer; FSinhID: Integer; ... end; ... procedure TForm1.FormCreate(Sender: TObject); begin with DataEditor do begin // Log and Sinh functions already exist like standard functions, // but this sample is just hypothetical //Function Log needs both parameters: RegisterNumFunction(FLogID, 'log', True, True); //Function Sinh needs just a second parameter: RegisterNumFunction(FSinhID, 'sinh', False, True); // Its possible, that now FLogID = 0 and FSinhID = 1; SortNumFunctionsData; // Now FLogID = 1 and FSinhID = 0, because of the first function // is the longest function end; ... end; ... function UnRegisterNumFunction(): Boolean; Deletes registered function by its name or identifier. procedure RegisterBoolFunction(out FunctionID: Integer; const FunctionName: string; RequireValue1, RequireValue2: Boolean); virtua; It has the same sense as method RegisterNumFunction above. procedure RegisterType(out TypeID: Integer; const TypeName: string); overload; virtual; Registers new type and gives new type identifier TypeID. If function failed, the TypeID parameter is -1. function UnRegisterType(): Boolean; Deletes registered type by its name or identifier. procedure StringToNumScript(const S: string; out Script: TScript; OpenedBracket: Char = '('; ClosedBracket: Char = ')'); overload; virtual; Converts string S to the script Script. Parameters OpenedBracket and ClosedBracket contains characters of the beginning and the end of embedded formula. procedure StringToBoolScript(const S: string; out Script: TScript; OpenedBracket: Char = '('; ClosedBracket: Char = ')'); overload; virtual; It has the same sense as method StringToNumScript above. function ExecuteNumScript(Index: Integer): Double; overload; virtual; Executes mathematics script and returns the execution result. Parameter Index defines script address. function ExecuteNumScript(P: Pointer): Double; overload; virtual; Executes mathematics script and returns the execution result. Parameter P defines script address. function ExecuteNum: Double; overload; virtual; Executes mathematics script that is in the Script property of TDataEditor class and returns execution result. function ExecuteBoolScript(Index: Integer): Boolean; It has the same sense as method ExecuteNumScript above. function ExecuteBoolScript(P: Pointer): Boolean; It has the same sense as method ExecuteNumScript above. function ExecuteBool: Boolean; It has the same sense as method ExecuteNumScript above. property Script: TScript; Contains script. Script could have any type. property Formula: string; Contains formula. property Accuracy: TRoundToRange; Indicates the power of rounding. To be short its the second parameter of Accuracy function. Its necessary just for operations with logical scripts. It has default value -7. property OnNumFunction: TNumFunctionEvent; TNumFunctionEvent = function(FunctionID: Integer; TypeID: Integer; var Value1: Double; Value2, Value3: Double): Boolean of object; This event needs to be processed only if there are new functions in object of TDataEditor class. Otherwise all standard functions execute within DefaultNumFunction method. Parameter FunctionID is identifier of the processing function. Parameter TypeID is the current type identifier. This event will appear each time its necessary to execute function in script even if its just a standard function. Therefore you can change the standard functions behavior. If you want to call default function - you should set function result to True. Parameters Value2 and Value3 are functions parameters, they are valid only if current function was registered (see RegisterNumFunction method, parameters RequireValue1 and RequireValue2 above) with necessary instructions. Result of function execution should be placed to Value1 parameter. For example: ... uses ..., Math; ... type TForm1 = class(TForm) ... private FLogID: Integer; FSinhID: Integer; function NumFunction(FunctionID: Integer; TypeID: Integer; var Value1: Double; Value2, Value3: Double): Boolean; ... end; ... procedure TForm1.FormCreate(Sender: TObject); begin with DataEditor do begin // Log and Sinh functions are already exist like standard functions, // but this sample is just hypothetical RegisterNumFunction(FLogID, 'log', True, True); RegisterNumFunction(FSinhID, 'sinh', False, True); SortNumFunctionsData; end; ... end; function TForm1.NumFunction(FunctionID, TypeID: Integer; var Value1: Double; Value2, Value3: Double): Boolean; begin // Now we check if the current function is not standard, // if its so then we calculate our function and place result in // Value1 parameter and set Function result to True. // Otherwise we set function result to True to call default // function: if FunctionID = FLogID then Value1 := Value1 := LogN(Value2, Value3) else if FucntionID = FSinhID then Value1 := SinH(Value3) else begin Result := True; Exit; end; Result := False; end; ... property OnBoolFunction: TBoolFunctionEvent; TBoolFunctionEvent = function(FunctionID: Integer; TypeID: Integer; var Value1: Boolean; Value2, Value3: Double): Boolean of object; It has the same sense as property OnNumFunction above. Samples GraphX For the first sample of using TDataEditor Ive made an ActiveX object which builds functions graphs. I guess it fully demonstrates parser possibilities. Now Id like to describe some methods and properties of TGraphBldr component. Ive made the TGraphBldr component to use it in ActiveX object - its descendent of TCustomControl class. So, here are some properties and methods of TGraphBldr: function XCoord(X: Double): Double; returns X coordinate on a scale of graph. Parameter X should be indicated on scale of a component. function YCoord(Y: Double): Double; returns Y coordinate on a scale of graph. Parameter Y should be indicated on scale of a component. function Coordinates(X, Y: Double): TCoord; TCoord = record X, Y: Double; end; return coordinates on a scale of graph. Parameters X and Y should be indicated on a scale of component. property Picture: TBitmap; contains graph image. property DetailLevel: Integer; Regulates graph detail level. This property indicates the points amount that will be calculated from the minimum to maximum values of axes X. If you increase the points amount on axis X then occurs accordingly increase the points amount on axis Y. Therefore the effect of increase this property is more noticeable on graphs with the predominated vertical lines, like tan x or sec x. So, its practically senselessly to increase detail level on such graphs like sin x or cos x. Component creates points array that will be exposed by filtration. Filtration is necessary because of there are many points with the same coordinates - its the side effect of overweening detail level. property FramePen: TPen; defines object for graph frame line drawing property GraphPen: TPen; defines object for graph line drawing property GridPen: TPen; defines object for graph grid line drawing property HorzSpacing: Double; defines horizontal cell size of a graph grid property ShowAxis: Boolean; defines if graph axes are visible property ShowFrame: Boolean; defines if graph frame is visible property ShowGrid: Boolean; defines if graph grid is visible property ShowText: Boolean; defines if formula text is visible on graph property Text: string; contains formula property TracePen: TPen; defines object for trace lines drawing property Tracing: Boolean; defines if graph is traceable property VertSpacing: Double; defines vertical cell size of a graph grid property XMaxValue: Integer; maximum value of axis X property YMaxValue: Integer; maximum value of axis Y property OnTrace: TTraceEvent; TTraceEvent = procedure(Sender: TObject; X, Y: Double) of object; This event will appear each time user moves mouse cursor on a component if Tracing is True. X and Y parameters are mouse cursor coordinates on a scale of graph. There is another one sample, more powerful than GraphX. It names Graph and its an application. SpeedTest This program is for speed testing of TDataEditor component. You can regulate amount of formula execution. Like youll see, there is some difference between time of execution simple and complex formulas in program. Also program demonstrates script length and structure. Structure is displayed by four byte numbers and by one byte number. Generator This program is intended for testing the execution of huge formulas. Generator program generates and executes a long length mathematics formula: To generate new text press F8 key, or choose Service|Generate menu item. You can write your own formulas directly in program and execute them by pressing F9 key. Elements count trackbar regulates approximate amount of elements in formula. Elements are numbers and functions. Embeddings power regulates the power of embeddings appearance in generated text. Unfortunately, its almost impossible to get a valid result of generated text if we use all registered functions (if amount of elements in formula is set to maximum). But we can use simplest functions to get a valid result with the longest formula. Its possible to choose some functions sets in program. DrawX It draws picture using formulas which are on the left side of program. These formulas are in Drawing.dat. You can write there your own formulas. X function defines X-coordinate of current pixel, Y defines Y-coordinate of the pixel and Index is index of the pixel. For picture calculation we need at least three formulas, since we use 24-bit picture. Each time the program use three random formulas. There could appear a really interesting pictures!