- Boyer–Moore string search algorithm
The Boyer–Moore string search algorithm is a particularly efficient
string searching algorithm , and it has been the standard benchmark for the practical string search literature. [Hume and Sunday (1991) " [Fast String Searching] " SOFTWARE—PRACTICE AND EXPERIENCE, VOL. 21(11), 1221–1248 (NOVEMBER 1991)] It was developed byBob Boyer andJ Strother Moore in 1977. Thealgorithm preprocesses the target string (key) that is being searched for, but not the string being searched (unlike some algorithms that preprocess the string to be searched and can then amortize the expense of the preprocessing by searching repeatedly). The execution time of the Boyer-Moore algorithm can be sub-linear: it doesn't need to check every character of the string to be searched, but rather skips over some of them. Generally the algorithm gets faster as the key being searched for becomes longer. Its efficiency derives from the fact that with each unsuccessful attempt to find a match between the search string and the text it's searching, it uses the information gained from that attempt to rule out as many positions of the text as possible where the string cannot match.How the algorithm works
The second table is slightly more difficult to calculate: for each value of "i" less than the length of the search string, we must first calculate the pattern consisting of the last "i" characters of the search string, preceded by a "mis"-match for the character before it; then weinitially line it up with the search pattern and determine the least number of characters the partial pattern must be shifted left before the two patterns match. For instance, for the search string ANPANMAN, the table would be as follows: (
Nsignifies any character that is not N)The amount of shift calculated by the second table is sometimes called the "good suffix shift" [http://www.movsd.com/bm.htm] or "(strong) good suffix rule". The original published Boyer-Moore algorithm [.
Uses
Wikipedia uses a patched version of PHP which uses an implementation of Boyer-Moore to speed up page loading. Wikipedia database developer Domas Mituzas discussed Wikipedia's use of the Boyer-Moore algorithm in a 2007 presentation:
"Fast String Search" is a replacement module for PHP’s strtr() function, which uses a Commentz-Walter–style algorithm for multiple search terms, or the Boyer-Moore algorithm for single search terms. License collisions (GPL code was used for it) do not allow its inclusion in PHP. Using a proper algorithm instead of foreach loops is an incredible boost for some applications. [D. Mituzas, "Wikipedia: Site internals, configuration, code examples and management issues", http://dammit.lt/uc/workbook2007.pdf]
ee also
*
Knuth–Morris–Pratt algorithm References
External links
* [http://www.cs.utexas.edu/~moore/publications/fstrpos.pdf Original article]
* [http://www-sr.informatik.uni-tuebingen.de/~buehler/BM/BM1.html Animation] of the Boyer-Moore algorithm
* [http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/fstrpos-example.html An example of the Boyer-Moore algorithm] from the homepage ofJ Strother Moore , co-inventor of the algorithm
* [http://www-igm.univ-mlv.fr/%7Elecroq/string/node14.html An explanation of the algorithm (with sample C code)]
* [http://www.cs.nyu.edu/cs/faculty/cole/papers/CHPZ95.ps Cole et al, Tighter lower bounds on the exact complexity of string matching]
Wikimedia Foundation. 2010.