Mega Code Archive
How much looks John like Jon (Updated)
Title: How much looks John like Jon? (Updated)
Question: You know the problem: A simple type mismatch and the search in
the database goes fail, because the strings did not match...
Answer:
This function compares the single characters, counts the identical characters an calculates a procentual value, how similar the both strings are.
To bring in a little unsharp factor, the function checks if any
identical characters is in the "near" of the actual compare position. This is calculated in a formula depending on the length of the strings (diff).
Some results:
'John' and 'John' = 100%
'John' and 'Jon' = 75%
'Jim' and 'James' = 40%
"Luke Skywalker" and 'Darth Vader' = 0%
Extension (24. May 2000):
- If parameter tolerant is true, the rules to compare single
characters are relaxed. Look for function CompChar. This is
not a phonetic compare, just relaxed rules on similarty
chracters. The selection is straight german, so feel free to
replace with your specific language characters.
- Instead of a static array of boolean, I use the class TBits
(unit classes in D4) to get unlimited string length. Maybe
this class is not available in earlier Delphi versions.
- function Max is from unit Math which is not included in D3
Standard. Use instead this:
function Max (i1, i2: Integer): Integer;
begin
if i1 end (*Max*);
function StrSimilar (s1, s2: string; tolerant: Boolean): Integer;
var hit: Integer; // Number of identical chars
p1, p2: Integer; // Position count
l1, l2, l: Integer; // Length of strings
diff: Integer; // unsharp factor
hstr: string; // help var for swapping strings
// Array shows if position is already tested
test: Classes.TBits;
function CompChar (ch1, ch2: Char): Boolean;
// german "umlauts" and similar charactes
begin
if tolerant then begin
ch1:= UpCase (ch1); // compare case insensitive
ch2:= UpCase (ch2);
case ch1 of
'', '', 'E': Result:= ch2 in ['', 'E', ''];
'B', 'P': Result:= ch2 in ['B', 'P'];
'C', 'Z': Result:= ch2 in ['C', 'Z'];
'D', 'T': Result:= ch2 in ['D', 'T'];
'F', 'V': Result:= ch2 in ['F', 'V'];
'G', 'K': Result:= ch2 in ['G', 'K'];
'S', '': Result:= ch2 in ['S', ''];
'I', 'J',
'Y', '', '': Result:= ch2 in ['I', 'J', 'Y', '', ''];
else Result:= ch1 = ch2;
end;
end else begin
Result:= ch1 = ch2;
end;
end;
begin
l1:= Length (s1);
l2:= Length (s2);
if (l1 // Test Length and swap, if s1 is smaller
if l1 hstr:= s2; s2:= s1; s1:= hstr;
l:= l2; l2:= l1; l1:= l;
end;
p1:= 1; p2:= 1; hit:= 0;
// calc the unsharp factor depending on
// the length of the strings
diff:= Max (l1, l2) div 3 + ABS (l1 - l2);
// init the test array
test:= Classes.TBits.Create;
// Calc size of TBits. Must be two bigger, because we're 0-Based
// counting from 1, and we need one more then stringlength
test.Size:= l1 + 2;
// loop through the string
repeat
// position tested?
if not test.Bits[p1] then begin
// found a matching character?
if CompChar (s1[p1], s2[p2]) and
(ABS(p1-p2) test.Bits[p1]:= True;
Inc (hit); // increment the hit count
// next positions
Inc (p1); Inc (p2);
if p1 l1 then p1:= 1;
end else begin
// Set test array
test.Bits[p1]:= False;
Inc (p1);
// Loop back to next test position
if p1 l1 then begin
while (p1 1) and not (test[p1]) do Dec (p1);
Inc (p2)
end;
end;
end else begin
Inc (p1);
// Loop back to next test position
if p1 l1 then begin
repeat Dec (p1); until (p1 = 1) or test.Bits[p1];
Inc (p2);
end;
end;
until p2 l2;
test.Free; // Release Booleanlist
// calc procentual value
Result:= 100 * hit DIV l1;
end;