Mega Code Archive

 

Searchtext

On 19 Jun 00, at 15:19, Tom Matthews wrote:

> Delphians-
> I am a newbie, a legacy newbie perhaps, but a newbie Delphian and am
> looking for either a recommendation for reading or shareware/freeware code
> to do word searches in word processing/text documents.
> The project I'm envisioning will scan a text doc for a block of text,
> scan that block for potential keywords, then take those keywords & scan
> multiple other documents for matches.
> Can anyone point me in the right direction?

If you are talking about the MECHANICS of text search, then Pos is quite efficient. Also, HyperString (http://www.delphi32.com/vcl/3339/) has a bunch of text search routines, including Boyer-Moore.

Also, http://softlab.od.ua has a 'pattern string engine' which looks powerful, although I have never tried it.

I am not sure whether any of the above will help with multiple-string searches (looking for many keywords at once), which may be what you are after. Binstock and Rex, 'Practical Algorithms for Programmers', have a (quite long) implementation in C, but that might be overkill.

If all your strings are words, then a simple hashing approach might be as good as anything, i.e. put your keywords in a hash table, then when searching a new document, hash each word and attempt to look it up in the keyword table.

Or you could build a trie of the keywords. I am not sure if a trie is in there, but Robert Marsh has a very elegant set of data structures in his "maps" library (www.rmarsh.com).

Finally, on string searching: http://ei.cs.vt.edu/~cs5604/f95/cs5604cnSS/Algs.html

www.efg2.com is always worth checking out for matters algorithmic.
I think efg mentions there that Ray Lischner's book "Delphi in a Nutshell" has material on fast string searching.

The Stony Brook Algorithm Repository at http://www.cs.sunysb.edu/~algorith/ has a cornucopia of lovely stuff.

The Algorithm Archive at http://www.medsp.com/scott/alg/alg.html is also good.

More than you ever wanted to know about pattern matching: http://www.cs.purdue.edu/homes/stelo/pattern.html

IMHO, if all you are ever looking for is EXACT (not fuzzy/approximate) matches to WORDS (not patterns or substrings), then hashing should work well for you.

HTH,
John Aitchison
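A minimal sketch of the Boyer-Moore idea mentioned in the post, written in Python rather than Delphi because the thread itself contains no code. It implements the Horspool simplification (bad-character shifts only), not HyperString's routines, and the function name `horspool_find` is hypothetical. Like Python's `str.find` it returns a 0-based index or -1, whereas Delphi's Pos is 1-based and returns 0 when there is no match.

```python
def horspool_find(text, pattern):
    """Return the 0-based index of the first occurrence of pattern in text, or -1.

    Boyer-Moore-Horspool: compare the window right to left; on a mismatch, shift
    the window by an amount looked up from the text character under the
    window's last position.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Shift table: for each character in pattern[0..m-2], the distance from its
    # last occurrence to the end of the pattern. Characters not in it shift by m.
    shift = {pattern[i]: m - 1 - i for i in range(m - 1)}
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos  # whole pattern matched
        pos += shift.get(text[pos + m - 1], m)
    return -1
```

For example, `horspool_find("here is a simple example", "example")` returns 17. The shift table is what lets the search skip over stretches of text that cannot contain a match.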
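A sketch of the hashing approach recommended in the post, again in Python rather than Delphi: a Python set is a hash table, so each document word costs one average-case constant-time lookup. The word pattern, the case-insensitive matching and the function names `words` and `find_keywords` are assumptions for illustration.

```python
import re

# Assumption: a "word" is a run of letters, digits and apostrophes,
# and matching is case-insensitive.
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def words(text):
    """Yield the words of text, lower-cased."""
    for m in WORD_RE.finditer(text):
        yield m.group(0).lower()


def find_keywords(keywords, documents):
    """Map each document name to the sorted list of keywords it contains.

    keywords  -- iterable of keyword strings
    documents -- dict of document name -> document text
    """
    table = {k.lower() for k in keywords}  # the keyword hash table
    hits = {}
    for name, text in documents.items():
        # One hash lookup per word of the document.
        found = {w for w in words(text) if w in table}
        hits[name] = sorted(found)
    return hits


if __name__ == "__main__":
    docs = {
        "a.txt": "Boyer-Moore search is fast.",
        "b.txt": "A trie stores keywords by prefix.",
    }
    print(find_keywords(["search", "trie", "hash"], docs))
```

Running it prints `{'a.txt': ['search'], 'b.txt': ['trie']}`. For the project described in the question, the keyword list would come from the first document's text block and `documents` would hold the other files.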
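The trie alternative from the post, sketched with nested Python dicts. This is an assumption for illustration, not Robert Marsh's maps library, and the names `build_trie`, `in_trie` and `keywords_in` are hypothetical. A lookup walks the word one character at a time and can reject a non-keyword as soon as its prefix falls outside the trie.

```python
import re

END = None  # key marking "a keyword ends here"; never collides with a 1-char string key


def build_trie(keywords):
    """Build a trie (nested dicts) from an iterable of lower-case keywords."""
    root = {}
    for word in keywords:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def in_trie(trie, word):
    """Return True if word is exactly one of the keywords stored in trie."""
    node = trie
    for ch in word:
        node = node.get(ch)
        if node is None:  # prefix not in the trie: stop early
            return False
    return END in node


def keywords_in(trie, text):
    """Return, in order of appearance, the words of text that are keywords."""
    # Assumption: words are runs of letters, digits and apostrophes, matched lower-case.
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if in_trie(trie, w)]
```

For example, `keywords_in(build_trie(["pos", "position", "trie"]), "Pos() returns a position; a trie helps.")` returns `['pos', 'position', 'trie']`. For exact whole-word matching a plain hash table is simpler, as the post suggests; a trie becomes more useful when prefix queries are also needed.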