Mega Code Archive

 
Categories / Delphi / Algorithm Math
 

How much looks John like Jon (Updated)

Title: How much looks John like Jon? (Updated) Question: You know the problem: A simple type mismatch and the search in the database goes fail, because the strings did not match... Answer: This function compares the single characters, counts the identical characters an calculates a procentual value, how similar the both strings are. To bring in a little unsharp factor, the function checks if any identical characters is in the "near" of the actual compare position. This is calculated in a formula depending on the length of the strings (diff). Some results: 'John' and 'John' = 100% 'John' and 'Jon' = 75% 'Jim' and 'James' = 40% "Luke Skywalker" and 'Darth Vader' = 0% Extension (24. May 2000): - If parameter tolerant is true, the rules to compare single characters are relaxed. Look for function CompChar. This is not a phonetic compare, just relaxed rules on similarty chracters. The selection is straight german, so feel free to replace with your specific language characters. - Instead of a static array of boolean, I use the class TBits (unit classes in D4) to get unlimited string length. Maybe this class is not available in earlier Delphi versions. - function Max is from unit Math which is not included in D3 Standard. Use instead this: function Max (i1, i2: Integer): Integer; begin if i1 end (*Max*); function StrSimilar (s1, s2: string; tolerant: Boolean): Integer; var hit: Integer; // Number of identical chars p1, p2: Integer; // Position count l1, l2, l: Integer; // Length of strings diff: Integer; // unsharp factor hstr: string; // help var for swapping strings // Array shows if position is already tested test: Classes.TBits; function CompChar (ch1, ch2: Char): Boolean; // german "umlauts" and similar charactes begin if tolerant then begin ch1:= UpCase (ch1); // compare case insensitive ch2:= UpCase (ch2); case ch1 of '', '', 'E': Result:= ch2 in ['', 'E', '']; 'B', 'P': Result:= ch2 in ['B', 'P']; 'C', 'Z': Result:= ch2 in ['C', 'Z']; 'D', 'T': Result:= ch2 in ['D', 'T']; 'F', 'V': Result:= ch2 in ['F', 'V']; 'G', 'K': Result:= ch2 in ['G', 'K']; 'S', '': Result:= ch2 in ['S', '']; 'I', 'J', 'Y', '', '': Result:= ch2 in ['I', 'J', 'Y', '', '']; else Result:= ch1 = ch2; end; end else begin Result:= ch1 = ch2; end; end; begin l1:= Length (s1); l2:= Length (s2); if (l1 // Test Length and swap, if s1 is smaller if l1 hstr:= s2; s2:= s1; s1:= hstr; l:= l2; l2:= l1; l1:= l; end; p1:= 1; p2:= 1; hit:= 0; // calc the unsharp factor depending on // the length of the strings diff:= Max (l1, l2) div 3 + ABS (l1 - l2); // init the test array test:= Classes.TBits.Create; // Calc size of TBits. Must be two bigger, because we're 0-Based // counting from 1, and we need one more then stringlength test.Size:= l1 + 2; // loop through the string repeat // position tested? if not test.Bits[p1] then begin // found a matching character? if CompChar (s1[p1], s2[p2]) and (ABS(p1-p2) test.Bits[p1]:= True; Inc (hit); // increment the hit count // next positions Inc (p1); Inc (p2); if p1 l1 then p1:= 1; end else begin // Set test array test.Bits[p1]:= False; Inc (p1); // Loop back to next test position if p1 l1 then begin while (p1 1) and not (test[p1]) do Dec (p1); Inc (p2) end; end; end else begin Inc (p1); // Loop back to next test position if p1 l1 then begin repeat Dec (p1); until (p1 = 1) or test.Bits[p1]; Inc (p2); end; end; until p2 l2; test.Free; // Release Booleanlist // calc procentual value Result:= 100 * hit DIV l1; end;