Consistent hashing

Consistent hashing is a special kind of hashing in which a change in the number of slots causes only a small fraction of the keys to be remapped. In most traditional hash tables, by contrast, a change in the number of array slots causes nearly all keys to be remapped. With consistent hashing, only K/n keys need to be remapped on average, where K is the number of keys and n is the number of slots. Consistent hashing has come to play an increasingly important role as internet use has grown and distributed systems have become more prevalent.


History

Consistent hashing was originally devised by Karger et al. at MIT for use in distributed caching; the idea has since been applied in many other areas. The 1997 academic paper that introduced the term "consistent hashing" presented it as a way of distributing requests among a changing population of Web servers: each slot is represented by a node in a distributed system, and the addition (joins) or removal (leaves/failures) of nodes requires only K/n items to be re-shuffled when the number of slots/nodes changes.[1]

The same concept had, however, already appeared in 1996 in the Super Proxy Script technique created by SHARP for optimizing the use of multiple caching HTTP proxies by web browsers.[2]

Consistent hashing has also been used to reduce the impact of partial system failures in large Web applications, allowing for robust caches without incurring the system-wide fallout of a failure.[3]

The consistent hashing concept also applies to the design of distributed hash tables (DHTs). DHTs use consistent hashing to partition a keyspace among a distributed set of nodes, and additionally provide an overlay network that connects nodes such that the node responsible for any key can be efficiently located.

Need for consistent hashing

Running a collection of caching machines brings some limitations. A common way of load balancing across n cache machines is to put object o in cache machine number hash(o) mod n. But this fails when a cache machine is added or removed, because n changes and nearly every object is hashed to a new location. This can be disastrous, since the originating content servers are suddenly flooded with requests from the cache machines. Consistent hashing is therefore needed to avoid this swamping of the servers.
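To illustrate, the following Python sketch (a toy example; the key names and cache counts are arbitrary) assigns 10,000 keys to 5 caches with hash(o) mod n, then adds a sixth cache and counts how many keys move:

    import hashlib

    def naive_bucket(key, n):
        # Place a key in one of n caches using hash(key) mod n.
        # MD5 is used here only as a convenient, well-mixed hash.
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

    keys = [f"object-{i}" for i in range(10000)]
    before = {k: naive_bucket(k, 5) for k in keys}  # 5 caches
    after = {k: naive_bucket(k, 6) for k in keys}   # a 6th cache joins
    moved = sum(1 for k in keys if before[k] != after[k])
    print(f"{moved / len(keys):.0%} of keys moved")

With a uniform hash, a key keeps its cache only when its hash gives the same remainder modulo 5 and modulo 6, which happens for about one key in six; roughly 83% of the keys move, far more than the ideal K/n = 1/6.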

Consistent hashing maps objects to the same cache machine as far as possible. When a cache machine is added, it takes its share of objects from all the other cache machines, and when it is removed, its objects are shared among the remaining machines.

The main idea behind the consistent hashing algorithm is to hash both objects and caches using the same hash function, so that each cache maps to an interval containing a number of object hashes. If a cache is removed, its interval is taken over by a cache with an adjacent interval; all the other caches remain unchanged.

Technique

Like most hashing schemes, consistent hashing assigns a set of items to buckets so that each bucket receives roughly the same number of items. But unlike standard hashing schemes, a small change in the set of buckets does not induce a total remapping of items to buckets.

Consistent hashing is based on mapping each item to a real angle (or, equivalently, a point on the edge of a circle). Each of the available machines (or other storage buckets) is also pseudo-randomly mapped onto a series of angles around the same circle. The bucket for each item is then chosen by selecting the bucket at the next highest angle after the item's angle. The result is that each bucket contains the items whose angles fall between its own angle and the next smallest bucket angle.

If a bucket becomes unavailable (for example, because the computer it resides on is unreachable), the angles it maps to are removed. Requests for resources that would have mapped to each of those points now map to the next highest point. Since each bucket is associated with many pseudo-randomly distributed points, the resources that were held by that bucket now map to many different buckets: the items that mapped to the lost bucket are redistributed among the remaining ones, while items mapping to other buckets keep their assignments and do not need to be moved.

A similar process occurs when a bucket is added. Adding an angle makes the resources between it and the next smallest angle map to the new bucket. These resources are no longer associated with the previous bucket, and any value previously stored there will not be found by the selection method described above.

The proportion of the keys associated with each bucket can be altered by changing the number of angles that the bucket maps to.
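The following Python sketch implements the ring described above, with multiple pseudo-random points per bucket; the class and parameter names are illustrative rather than taken from the original paper, and a hash value stands in for the angle:

    import bisect
    import hashlib

    class HashRing:
        # A minimal consistent-hash ring. Each bucket is placed at
        # `points` pseudo-random positions (its "angles") on the circle.
        def __init__(self, points=100):
            self.points = points
            self.ring = []  # sorted list of (position, bucket) pairs

        @staticmethod
        def _hash(key):
            # Any well-mixed hash works; MD5 is used for portability.
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        def add(self, bucket):
            for i in range(self.points):
                bisect.insort(self.ring, (self._hash(f"{bucket}#{i}"), bucket))

        def remove(self, bucket):
            self.ring = [(p, b) for p, b in self.ring if b != bucket]

        def get(self, key):
            # Walk clockwise from the key's position to the next bucket
            # point, wrapping around at the top of the circle.
            i = bisect.bisect(self.ring, (self._hash(key),))
            return self.ring[i % len(self.ring)][1]

Adding a fourth machine to a three-machine ring then moves only the keys that now fall on the new machine's points, about a quarter of the total:

    ring = HashRing()
    for cache in ("cache-a", "cache-b", "cache-c"):
        ring.add(cache)
    keys = [f"object-{i}" for i in range(10000)]
    before = {k: ring.get(k) for k in keys}
    ring.add("cache-d")  # a join
    moved = sum(1 for k in keys if ring.get(k) != before[k])
    print(f"{moved / len(keys):.0%} of keys moved")  # roughly 25%, i.e. K/n for n = 4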

Monotonic keys

If it is known that key values will always increase monotonically, an alternative approach to consistent hashing is possible.

Properties

Some properties of consistent hashing distinguish it from, and improve on, other standard hashing schemes:

  1. The 'Spread' property implies that even in the presence of inconsistent views of the world, references for a given object are directed to only a small set of caches. Thus, all clients can access the data without the system storing many copies of each object.
  2. The 'Load' property implies that no particular cache is assigned an unreasonable number of objects.
  3. The 'Smoothness' property implies that smooth changes in the set of caching machines are matched by a smooth change in the location of the cached objects.
  4. The 'Balance' property implies that items are distributed among the caches uniformly at random.
  5. The 'Monotonic' property implies that when a bucket is added, only the items that belong to the new bucket are reassigned.

References

  1. ^ Karger, D.; Lehman, E.; Leighton, T.; Panigrahy, R.; Levine, M.; Lewin, D. (1997). "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web". Proceedings of the Twenty-ninth Annual ACM Symposium on Theory of Computing. ACM Press New York, NY, USA. pp. 654–663. doi:10.1145/258533.258660. http://portal.acm.org/citation.cfm?id=258660. Retrieved 2008-06-17. 
  2. ^ Doi, Katsuo. "Super Proxy Script - How to make distributed proxy servers by URL hashing". http://www.pacfiles.com/images/SSP.htm. Retrieved 2011-08-04. 
  3. ^ Karger, D.; Sherman, A.; Berkheimer, A.; Bogstad, B.; Dhanidina, R.; Iwamoto, K.; Kim, B.; Matkins, L.; Yerushalmi, Y. (1999). "Web Caching with Consistent Hashing". Computer Networks 31 (11): 1203–1213. doi:10.1016/S1389-1286(99)00055-9. http://www8.org/w8-papers/2a-webserver/caching/paper2.html. Retrieved 2008-06-17. 
