Primary clustering

Primary clustering

Primary clustering is the tendency for certain open-addressing hash tables collision resolution schemes to create long sequences of filled slots. It is most commonly referred to in the context of problems with linear probing.

These long chains degrade the hash-table performance closer to O(n) performance instead of closer to O(1).

Consider the linear probing hash function h(k, i) : (h`(k) + i)mod N. With k being the key, i the probing-iteration, N being the number of slots in the hash-table and h`(k) being the secondary-hash function.

Thus the probing sequence for key k is: {h`(k), h`(k) + 1, h`(k) + 2, ..., h`(k) + n}. It is easy to see that the probing sequences for two different keys may overlap and create clustering.

Also since the probability of each slot being already full is equal to the load-factor of the hash-table, as the factor approaches 1.0 so does the chance that the next slot in the probing sequence will be full, this degrades performance of the hash-table to linear time.

Note: clusters of filled slots can grow exponentially in some cases. For example, imagine two large clusters of filled slots separated by one empty slot. Placing an entry into this slot will double the cluster size.

comp-sci-stub


Wikimedia Foundation. 2010.

Игры ⚽ Нужно сделать НИР?

Look at other dictionaries:

  • Primary biliary cirrhosis — Classification and external resources Micrograph of primary biliary cirrhosis showing bile duct inflammation and injury. H E stain …   Wikipedia

  • Primary care physician — A primary care physician, or PCP, is a physician/medical doctor who provides both the first contact for a person with an undiagnosed health concern as well as continuing care of varied medical conditions, not limited by cause, organ system, or… …   Wikipedia

  • Primary structure — In biochemistry, the primary structure of a biological molecule is the exact specification of its atomic composition and the chemical bonds connecting those atoms (including stereochemistry). For a typical unbranched, un crosslinked biopolymer… …   Wikipedia

  • List of terms relating to algorithms and data structures — The [http://www.nist.gov/dads/ NIST Dictionary of Algorithms and Data Structures] is a reference work maintained by the U.S. National Institute of Standards and Technology. It defines a large number of terms relating to algorithms and data… …   Wikipedia

  • Список терминов, относящихся к алгоритмам и структурам данных —   Это служебный список статей, созданный для координации работ по развитию темы.   Данное предупреждение не устанавливается на информационные списки и глоссарии …   Википедия

  • Список терминов — Список терминов, относящихся к алгоритмам и структурам данных   Это сл …   Википедия

  • CrimeStat — Primary File page of CrimeStat CrimeStat is a Windows based spatial statistic software program that conducts spatial and statistical analysis and is designed to interface with a Geographic Information System. The program is developed by Ned… …   Wikipedia

  • cosmos — /koz meuhs, mohs/, n., pl. cosmos, cosmoses for 2, 4. 1. the world or universe regarded as an orderly, harmonious system. 2. a complete, orderly, harmonious system. 3. order; harmony. 4. any composite plant of the genus Cosmos, of tropical… …   Universalium

  • Human genetic variation — is the natural variation in gene frequencies observed between the genomes of individuals or groups of humans. Variation can be measured at both the individual level (differences between individual people) and at the population level, i.e.… …   Wikipedia

  • Computer cluster — Not to be confused with data cluster. A computer cluster is a group of linked computers, working together closely thus in many respects forming a single computer. The components of a cluster are commonly, but not always, connected to each other… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”