Rolling hash

Rolling hash

A rolling hash is a hash function where the input is hashed in a window that moves through the input.

A few hash functions allow a rolling hash to be computed very quickly -- the new hash value is rapidly calculated given only the old hash value, the old value removed from the window, and the new value added to the window -- similar to the way a moving average function can be computed much more quickly than other low-pass filters.

One of the main applications is the Rabin-Karp string search algorithm, which uses the rolling hash described below.

Another popular application is rsync program which uses Mark Adler's adler-32 checksum as its rolling hash.

Example of a rolling hash

Rabin-Karp use a very simple rolling hash function that only uses multiplications and additions. a is a randomly chosen large number.

H = c1 * ak-1 + c2 * ak-2 + c3 * ak-3 ... + ck * a0

In order to avoid manipulating huge H values, all math is done modulo n. The choice of a and n is critical to get good hashing.

Removing and adding chars simply involves adding or subtracting the first or last term. Shifting all chars by one position to the left requires multiplying the entire sum H by a. Shifting all chars by one position to the right requires dividing the entire sum H by a. Note that in modulo arithmetic, a has a multiplicative inverse a* by which H can be multiplied to get the result of the division without actually performing a division.

Good values for a/a* and n can be found in a paper by P. L'Ecuyer (see external links below).

External links

* P. L'Ecuyer, "Tables of Linear Congruential Generators of Different Sizes and Good Lattice Structure", [http://www.ams.org/mcom/ "Mathematics of Computation"] 68:225, 249–260 (1999). [http://citeseer.ist.psu.edu/132363.html Available online at Citeseer] .


Wikimedia Foundation. 2010.

Игры ⚽ Нужно решить контрольную?

Look at other dictionaries:

  • Hash function — A hash function is any well defined procedure or mathematical function for turning some kind of data into a relatively small integer, that may serve as an index into an array. The values returned by a hash function are called hash values, hash… …   Wikipedia

  • Hash Pipe — Infobox Single Artist = Weezer Name = Hash Pipe from Album = Weezer Released = April 2001 (US) June 2 2001 (UK) Format = CD, Vinyl Recorded = 2001 Length = 3:07 Genre = Rock Label = DGC Records Writer = Rivers Cuomo Producer = Ric Ocasek Chart… …   Wikipedia

  • The Rolling Stones — Rolling Stones redirects here. For other uses, see Rolling Stones (disambiguation). The Rolling Stones Mick Jagger, Keith Richards, Ronnie Wood, Charlie Watts …   Wikipedia

  • Rabin-Karp string search algorithm — The Rabin Karp algorithm is a string searching algorithm created by Michael O. Rabin and Richard M. Karp in 1987 that uses hashing to find a substring in a text. It is used for multiple pattern matching rather than single pattern matching. For… …   Wikipedia

  • Karp-Rabin-Algorithmus — Der Rabin Karp Algorithmus ist ein Suchalgorithmus für Texte, der von Michael O. Rabin und Richard M. Karp entwickelt wurde. Der Algorithmus sucht nach einem Muster, z. B. einer Zeichenkette, innerhalb einer anderen Zeichenkette mit Hilfe von… …   Deutsch Wikipedia

  • Rabin-Karp-Algorithmus — Der Rabin Karp Algorithmus ist ein Suchalgorithmus für Texte, der von Michael O. Rabin und Richard M. Karp entwickelt wurde. Der Algorithmus sucht nach einem Muster, z. B. einer Zeichenkette, innerhalb einer anderen Zeichenkette mit Hilfe von… …   Deutsch Wikipedia

  • Кольцевой хэш — (англ. rolling hash)  хэш функция, обрабатывающая вход в рамках некоторого окна. Получение значения хэш функции для сдвинутого окна (window ) в таких функциях является дешевой операцией. Для пересчета значения требуется знать лишь… …   Википедия

  • Comparaison de fichiers — En informatique, la comparaison de fichiers consiste à comparer leur contenu, en isolant leurs différences de leur contenu commun. Le résultat de la comparaison peut être affiché en environnement graphique, en mode texte, ou comme partie de… …   Wikipédia en Français

  • Michael O. Rabin — Michael Oser Rabin Born September 1, 1931 (1931 09 01) (age 80) Breslau …   Wikipedia

  • File comparison — in computing is the automatic comparing of data between files on a file system. The result of comparisons are typically displayed to the user, but can also be used to accomplish tasks in networks, file systems and revision control.Examples of… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”