- Bitap algorithm
The bitap algorithm (also known as the shift-or, shift-and or Baeza-Yates-Gonnet algorithm) is a
fuzzy string searching algorithm developed byUdi Manber and Sun Wu in 1991ref|Manber91ref|Manber92 based on work done byRicardo Baeza-Yates andGaston Gonnet .ref|BYG89 The algorithm tells whether a given text contains a substring which is "approximately equal" to a given pattern, where approximate equality is defined in terms ofLevenshtein distance — if the substring and pattern are within a given distance "k" of each other, then the algorithm considers them equal. The algorithm begins by precomputing a set ofbitmask s containing one bit for each element of the pattern. Then it is able to do most of the work withbitwise operation s, which are extremely fast.The bitap algorithm is perhaps best known as one of the underlying algorithms of the
Unix utilityagrep , written by Manber, Wu, andBurra Gopal . Manber and Wu's original paper gives extensions of the algorithm to deal with fuzzy matching of generalregular expression s.Due to the data structures required by the algorithm, it performs best on patterns less than a constant length (typically the
word length of the machine in question), and also prefers inputs over a small alphabet. Once it has been implemented for a given alphabet and word length "m", however, itsrunning time is completely predictable — it runs in O("mn") operations, no matter the structure of the text or the pattern.This algorithm was improved by Baeza-Yates and Navarro in 1996 and later by
Gene Myers for long patterns in 1998.Exact searching
The bitap algorithm for exact string searching, in full generality, looks like this when implemented in C:
#include
#include typedef char BIT; const char *bitap_search(const char *text, const char *pattern) { const char *result = NULL; int m = strlen(pattern); BIT *R; int i, k; if (pattern [0] = '
Wikimedia Foundation.
2010.