- Boyer–Moore–Horspool algorithm
In
computer science , the Boyer–Moore–Horspool algorithm or Horspool's algorithm is analgorithm for findingsubstring s in strings.It is a simplification of the
Boyer-Moore algorithm which is related to theKnuth-Morris-Pratt algorithm . The algorithm trades space for time in order to obtain anaverage-case complexity of O(N) on random text, although it has O(MN) in theworst case . The length of the pattern is M and the length of the search string is N.Example implementation
Here is an example implementation of the Boyer-Moore-Horspool algorithm, written in C.
Performance
The algorithm performs best with long needle strings, when it consistently hits a non-matching character at or near the final byte of the current position in the haystack and the final byte of the needle doesn't occur elsewhere within the needle. For instance a 32 byte needle ending in "z" searching through a 255 byte haystack which doesn't have a 'z' byte in it would take up to 224 byte comparisons.
The best case is the same as for the
Boyer-Moore algorithm inbig O notation , although the constant overhead of initialization and for each loop is less.The worst case behavior happens when the bad character skip is consistently low (with the lower limit of 1 byte movement) and a large portion of the needle matches the haystack. The bad character skip is only low, on a partial match, when the final character of the needle also occurs elsewhere within the needle. With 1 byte movement happening when the same byte is in both of the last two positions.
The canonical degenerate case similar to the above "best" case is a needle of an 'a' byte followed by 31 'z' bytes in a haystack consisting of 255 'z' bytes. This will do 31 successful byte comparisons, a 1 byte comparison that fails and then move forward 1 byte. This process will repeat 223 more times (255 - 32), bringing the total byte comparisons to 7,168 (32 * 224).
The worst case is significantly higher than for the
Boyer-Moore algorithm , although obviously this is hard to achieve in normal use cases. It is also worth noting that this worst case is also the worst case for the naive (but usual) memmem() algorithm, although the implementation of that tends to be significantly optimized (and is more cache friendly).ee also
*
Boyer–Moore string search algorithm References
* [http://www-igm.univ-mlv.fr/%7Elecroq/string/node18.html Description of the algorithm]
External links
*
FASMLIB contains implementation in x86 32bit assembly (procedure "mem.find")
* [http://movsd.com/bm.htm "BMH.ASM"] is an implementation of the Boyer–Moore–Horspool algorithm in x86 32bit assembly. ("BM.ASM" implements theBoyer–Moore string search algorithm ).
* [http://johannburkard.de/software/stringsearch/ StringSearch – high-performance pattern matching algorithms in Java] – Implementations of many String-Matching-Algorithms in Java (BNDM, Boyer-Moore-Horspool, Boyer-Moore-Horspool-Raita, Shift-Or)
* [http://angusj.com/delphi/ Boyer-Moore-Horspool implementation for Delphi] - Non-visual components (TSearch and TFileSearch) which enable very fast data searches using the Boyer-Moore-Horspool search algorithm.
Wikimedia Foundation. 2010.