Bitap algorithm

Bitap algorithm

The bitap algorithm (also known as the shift-or, shift-and or Baeza-Yates-Gonnet algorithm) is a fuzzy string searching algorithm developed by Udi Manber and Sun Wu in 1991ref|Manber91ref|Manber92 based on work done by Ricardo Baeza-Yates and Gaston Gonnet.ref|BYG89 The algorithm tells whether a given text contains a substring which is "approximately equal" to a given pattern, where approximate equality is defined in terms of Levenshtein distance — if the substring and pattern are within a given distance "k" of each other, then the algorithm considers them equal. The algorithm begins by precomputing a set of bitmasks containing one bit for each element of the pattern. Then it is able to do most of the work with bitwise operations, which are extremely fast.

The bitap algorithm is perhaps best known as one of the underlying algorithms of the Unix utility agrep, written by Manber, Wu, and Burra Gopal. Manber and Wu's original paper gives extensions of the algorithm to deal with fuzzy matching of general regular expressions.

Due to the data structures required by the algorithm, it performs best on patterns less than a constant length (typically the word length of the machine in question), and also prefers inputs over a small alphabet. Once it has been implemented for a given alphabet and word length "m", however, its running time is completely predictable — it runs in O("mn") operations, no matter the structure of the text or the pattern.

This algorithm was improved by Baeza-Yates and Navarro in 1996 and later by Gene Myers for long patterns in 1998.

Exact searching

The bitap algorithm for exact string searching, in full generality, looks like this when implemented in C: #include #include typedef char BIT; const char *bitap_search(const char *text, const char *pattern) { const char *result = NULL; int m = strlen(pattern); BIT *R; int i, k; if (pattern [0] = '


Wikimedia Foundation. 2010.

Игры ⚽ Нужно сделать НИР?

Look at other dictionaries:

  • String searching algorithm — String searching algorithms, sometimes called string matching algorithms, are an important class of string algorithms that try to find a place where one or several strings (also called patterns) are found within a larger string or text. Let Σ be… …   Wikipedia

  • Fuzzy string searching — Approximate string search is the name that is used for a category of techniques for finding strings that approximately match some given pattern string. It may also be known as approximate or inexact matching. Approximate string searching has two… …   Wikipedia

  • List of algorithms — The following is a list of the algorithms described in Wikipedia. See also the list of data structures, list of algorithm general topics and list of terms relating to algorithms and data structures.If you intend to describe a new algorithm,… …   Wikipedia

  • Двоичный алгоритм поиска подстроки — (также bitap algorithm, shift or algorithm)  алгоритм поиска подстроки, использующий тот факт, что в современных компьютерах битовый сдвиг и побитовое ИЛИ являются атомарными операциями. По сути, это примитивный алгоритм поиска с небольшой… …   Википедия

  • Список алгоритмов — Эта страница информационный список. Основная статья: Алгоритм Ниже приводится список алгоритмов, группированный по категориям. Более детальные сведения приводятся в списке структур данных и …   Википедия

  • Levenshtein distance — In information theory and computer science, the Levenshtein distance is a string metric for measuring the amount of difference between two sequences. The term edit distance is often used to refer specifically to Levenshtein distance. The… …   Wikipedia

  • Damerau–Levenshtein distance — In information theory and computer science, the Damerau–Levenshtein distance (named after Frederick J. Damerau and Vladimir I. Levenshtein) is a distance (string metric) between two strings, i.e., finite sequence of symbols, given by counting the …   Wikipedia

  • Agrep — (approximate grep) is a fuzzy string searching program, developed by Udi Manber and Sun Wu between 1988 and 1991, for use with the Unix operating system. It was later ported to OS/2, DOS, and Windows.It selects the best suited algorithm for the… …   Wikipedia

  • Ricardo Baeza-Yates — (born March 21, 1961) is a Chilean computer scientist and director of the [http://research.yahoo.com/ Yahoo! Research] labs at Barcelona, Spain and Santiago, Chile. His Ph.D. from the University of Waterloo was entitled Efficient Text Searching …   Wikipedia

  • Pattern search — may refer to: * Pattern recognition * Pattern mining * String searching algorithm * Fuzzy string searching * Bitap algorithm * K optimal pattern discovery * Nearest neighbor search (Nearest neighbor) * Eyeball search …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”