Dynamic perfect hashing

Dynamic perfect hashing: In computer science, dynamic perfect hashing is a programming technique for resolving collisions in a hash table data structure.^[1]^[2]^[3] This technique is useful for situations where fast queries, insertions, and deletions must be made on a large set, S, of elements.

Details

In this method, the entries that hash to the same slot of the table are organized as separate second-level hash table. If there are k entries in this set S, the second-level table is allocated with k² slots, and its hash function is selected at random from a universal hash function set so that it is collision-free (i.e. a perfect hash function). Therefore, the look-up cost is guaranteed to be O(1) in the worst-case.^[2]

function Locate(x) is j = h(x); if (position h_j(x) of subtable T_j contains x (not deleted)) return (x is in S); end if else return (x is not in S); end else end

Although each second-level table requires quadratic space, if the keys inserted into the first-level hash table are uniformly distributed, the structure as a whole occupies expected O(n) space, since bucket sizes are small with high probability.^[1]

During the insertion of a new entry x at j, the global operations counter, count, is incremented. If x exists at j but is marked as deleted then the mark is removed. If x exists at j, or at the subtable T_j, but is not marked as deleted then a collision is said to occur and the j^th bucket's second-level table T_j is rebuilt with a different randomly-selected hash function h_j. Because the load factor of the second-level table is kept low (1/k), rebuilding is infrequent, and the amortized cost of insertions is O(1).^[2]

function Insert(x) is count = count + 1; if (count > M) FullRehash(x); end if else j = h(x); if (Position h_j(x) of subtable T_j contains x) if (x is marked deleted) remove the delete marker; end if end if else b_j = b_j + 1; if (b_j <= m_j) if position h_j(x) of T_j is empty store x in position h_j(x) of T_j; end if else Put all unmarked elements of T_j in list L_j; Append x to list L_j; b_j = length of L_j; repeat h_j = randomly chosen function in H_sj; until h_j is injective on the elements of L_j; for all y on list L_j store y in position h_j(y) of T_j; end for end else end if else m_j = 2 * max{1, m_j}; s_j = 2 * m_j * (m_j - 1); if the sum total of all s_j ≤ 32 * M² / s(M) + 4 * M Allocate s_j cells for T_j; Put all unmarked elements of T_j in list L_j; Append x to list L_j; b_j = length of L_j; repeat h_j = randomly chosen function in H_sj; until h_j is injective on the elements of L_j; for all y on list L_j store y in position h_j(y) of T_j; end for end if else FullRehash(x); end else end else end else end else end

Deletion of x simply flags x as deleted without removal and increments count. In the case of both insertions and deletions, if count reaches a threshold M the entire table is rebuilt, where M is some constant multiple of the size of S at the start of a new phase. Here phase refers to the time between full rebuilds. The amortized cost of delete is O(1).^[2] Note that here the -1 in "Delete(x)" is a representation of an element which is not in the set of all possible elements U.

function Delete(x) is count = count + 1; j = h(x); if position h(x) of subtable Tj contains x mark x as deleted; end if else return (x is not a member of S); end else if (count >= M) FullRehash(-1); end if end

A full rebuild of the table of S first starts by removing all elements marked as deleted and then setting the next threshold value M to some constant multiple of the size of S. A hash function, which partitions S into s(M) subsets, where the size of subset j is s_j, is repeatedly randomly chosen until:

$\sum_{0\le j\le s(M)} s_j \le \frac{32M^2}{s(M)} + 4M.$

Finally, for each subtable T_j a hash function h_j is repeatedly randomly chosen from H_sj until h_j is injective on the elements of T_j. The expected time for a full rebuild of the table of S with size n is O(n).^[2]

function FullRehash(x) is Put all unmarked elements of T in list L; if (x is in U) append x to L; end if count = length of list L; M = (1 + c) * max{count, 4}; repeat h = randomly chosen function in H_s(M); for all j < s(M) form a list L_j for h(x) = j; b_j = length of L_j; m_j = 2 * b_j; s_j = 2 * m_j * (m_j - 1); end for until the sum total of all s_j ≤ 32 * M² / s(M) + 4 * M for all j < s(M) Allocate space s_j for subtable T_j; repeat h_j = randomly chosen function in H_sj; until h_j is injective on the elements of list L_j; end for for all x on list L_j store x in position h_j(x) of T_j; end for end

References

^ ^a ^b Fredman, M. L., Komlós, J., and Szemerédi, E. 1984. Storing a Sparse Table with 0(1) Worst Case Access Time. J. ACM 31, 3 (Jun. 1984), 538-544 http://portal.acm.org/citation.cfm?id=1884#

^ ^a ^b ^c ^d ^e Dietzfelbinger, M., Karlin, A., Mehlhorn, K., Meyer auf der Heide, F., Rohnert, H., and Tarjan, R. E. 1994. Dynamic Perfect Hashing: Upper and Lower Bounds. SIAM J. Comput. 23, 4 (Aug. 1994), 738-761. http://portal.acm.org/citation.cfm?id=182370#

^ Erik Demaine, Jeff Lind. 6.897: Advanced Data Structures. MIT Computer Science and Artificial Intelligence Laboratory. Spring 2003. http://courses.csail.mit.edu/6.897/spring03/scribe_notes/L2/lecture2.pdf

Categories:
Hashing
Search algorithms
Computer programming stubs

Игры ⚽ Нужна курсовая?

Look at other dictionaries:

Perfect hash function — A perfect hash function for a set S is a hash function that maps distinct elements in S to a set of integers, with no collisions. A perfect hash function has many of the same applications as other hash functions, but with the advantage that no… … Wikipedia
Hash table — Not to be confused with Hash list or Hash tree. Unordered map redirects here. For the proposed C++ class, see unordered map (C++). Hash Table Type unsorted dictionary Invented 1953 Time complexity in big O notation Average Worst case Space … Wikipedia
Kurt Mehlhorn — (born August 29, 1949 in Ingolstadt, Germany) is a German computer scientist, a vice president of the Max Planck Society and director of the Max Planck Institute for Computer Science.Mehlhorn graduated in 1971 from the Technical University of… … Wikipedia
List of terms relating to algorithms and data structures — The [http://www.nist.gov/dads/ NIST Dictionary of Algorithms and Data Structures] is a reference work maintained by the U.S. National Institute of Standards and Technology. It defines a large number of terms relating to algorithms and data… … Wikipedia
Список терминов, относящихся к алгоритмам и структурам данных — Это служебный список статей, созданный для координации работ по развитию темы. Данное предупреждение не устанавливается на информационные списки и глоссарии … Википедия
Список терминов — Список терминов, относящихся к алгоритмам и структурам данных Это сл … Википедия
Bloom filter — The Bloom filter, conceived by Burton H. Bloom in 1970, is a space efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. Elements can be… … Wikipedia
Algorithmic efficiency — In computer science, efficiency is used to describe properties of an algorithm relating to how much of various types of resources it consumes. Algorithmic efficiency can be thought of as analogous to engineering productivity for a repeating or… … Wikipedia
Endgame tablebase — A typical interface for querying a tablebase An endgame tablebase is a computerized database that co … Wikipedia
Index (search engine) — Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, physics, and… … Wikipedia

Academic Dictionaries and Encyclopedias

Dynamic perfect hashing

Details

References

Look at other dictionaries:

Share the article and excerpts

Academic Dictionaries and Encyclopedias

Wikipedia

Dynamic perfect hashing

Details

References

Look at other dictionaries:

Share the article and excerpts

Direct link