Universal hashing

Universal hashing

Universal hashing is a randomized algorithm for selecting a hash function "F" with the following property: for any two distinct inputs "x" and "y", the probability that "F(x)=F(y)" (i.e., that there is a hash collision between "x" and "y") is the same as if "F" was a random function. Thus, if "F" has function values in a range of size "r", the probability of any particular hash collision should be at most "1/r". There are universal hashing methods that give a function "F" that can be evaluated in a handful of computer instructions.

Introduction

Hashing has been used to associate with an input, usually a string, a small value that originally was used as an index to look up something about that input in a table. Since then hashing has found other uses. For example, two inputs might be compared by checking to see if their hash values are the same. Thus, we can see that hash functions are many-to-one mappings. The use of the word "hash" is mnemonic because the intent of a hash function is to take as many of the inputs usually encountered and assign different values to them, by scrambling them or making a hash of the inputs, using the meaning of hash from domains such as cooking. If for any given input there are too many collisions that is viewed as undesirable.

Universal Hashing

Because a hash function is a many-to-one mapping, there must exist some set of elements that will collide under the hash function, due to the pigeon-hole principle. One wants to design the hash function such that for the input sets, it is unlikely that elements collide. Proving in a mathematical sense that you are unlikely to encounter a particular set of inputs would appear to be an impossible task.

Randomized algorithms present a way of proving that you are unlikely to encounter a bad set of inputs. We can construct a "Universal Class" of hash functions with the property that for any given set of inputs they will scatter the inputs among the range of the function well -- essentially as well as choosing random values for those inputs. Thus, simply choosing a random function from the class allows a proof that the probabilistic expectation for any set of inputs is that they will be distributed randomly.

In fact, we are in many cases interested in only pairwise collisions. That is to say, the odds that any two inputs x and y collide will be approximately the same as the reciprocal of the size of the range. It might be that for any given universal class of hash functions there exist x, y and z such that if x and y collide then so does z. While some work has been done on the set issue, universal hashing only makes statements about pairwise collisions.

Definition

The formal definition involves a set of keys, "K", a set of values, "V", and a family of hash functions, "H", mapping keys to values. Let |"V"| denote size of "V", the number of possible values. Then for all pairs of distinct keys "x" and "y" in "K", "H" is a 2-universal family of hash functions if

: Pr_{h in H} [h(x) = h(y)] le frac{1} .

More stringently, "H" is a strongly 2-universal family if for all pairs of distinct keys "x" and "y" in "K", and for all values "x"′ and "y"′ in "V",

: Pr_{h in H} [h(x) = x' mbox{ and } h(y) = y'] = frac{1}{|V|^2} .

Without qualification, the latter definition is probably intended.

Examples

A simple universal class of hash function is all functions "h" of the form

: h(x)= f(g(x)) ,! and

: g(x)=((ax + b)mod p) ,!

"p" is a prime guaranteed larger than any possible input and each combination of a and b forming a different function in the class. "f" then becomes a mapping function to map elements from a domain which is 0 to "p"-1 to a range of say 0 to "n"-1. "f" then can simply be taking the result of "g" mod "n". There is only one "f" for all the functions in this class. To see why this class is universal, observe that for any two inputs and any two outputs, there are approximately "p"/"n" elements that can map to any output and for any of pair of those "p"/"n" elements you can solve the simultaneous equations in the field mod "p", so for any pair of inputs there is a unique pair of "a" and "b" that will take it to those elements.

Uses

Universal hashing has numerous uses in computer science, for example in cryptography and in implementations of hash tables. Since the function is randomly chosen, an adversary hoping to create many hash collisions is unlikely to succeed.

Universal hashing has been generalized in many ways, most notably to the notion of k-wise independent hash functions, where the function is required to act like a random function on any set of "k" inputs.

ee also

* Hash function.
* Universal One Way Hash Functions, UOWHF.

References


* cite book
last = Knuth
first = Donald Ervin
authorlink = Donald Ervin Knuth
title = [The Art of Computer Programming] , Vol. III: Sorting and Searching
edition = 2e
year = 1998
publisher = Addison-Wesley
location = Reading, Mass ; London
notes = v. 1. Fundamental algorithms -- v. 2. Seminumerical algorithms -- v. 3. Sorting and searching.
id = ISBN 0-201-89685-0

External links

J. Lawrence Carter and Mark N. Wegman's original paper [http://delivery.acm.org/10.1145/810000/803400/p106-carter.pdf?key1=803400&key2=8808704411&coll=GUIDE&dl=ACM&CFID=68671000&CFTOKEN=29201954 "Universal Classes of Hash Functions"]


Wikimedia Foundation. 2010.

Игры ⚽ Поможем сделать НИР

Look at other dictionaries:

  • Dynamic perfect hashing — In computer science, dynamic perfect hashing is a programming technique for resolving collisions in a hash table data structure.[1][2][3] This technique is useful for situations where fast queries, insertions, and deletions must be made on a… …   Wikipedia

  • List of terms relating to algorithms and data structures — The [http://www.nist.gov/dads/ NIST Dictionary of Algorithms and Data Structures] is a reference work maintained by the U.S. National Institute of Standards and Technology. It defines a large number of terms relating to algorithms and data… …   Wikipedia

  • Список терминов, относящихся к алгоритмам и структурам данных —   Это служебный список статей, созданный для координации работ по развитию темы.   Данное предупреждение не устанавливается на информационные списки и глоссарии …   Википедия

  • Список терминов — Список терминов, относящихся к алгоритмам и структурам данных   Это сл …   Википедия

  • Hash table — Not to be confused with Hash list or Hash tree. Unordered map redirects here. For the proposed C++ class, see unordered map (C++). Hash Table Type unsorted dictionary Invented 1953 Time complexity in big O notation Average Worst case Space …   Wikipedia

  • Хеширование — Хеш функция, отображающая множество имён в множество натуральныых чисел Хеширование (иногда «хэширование», англ. hashing)  преобразование по детерменированному алгоритму входного массива данных прои …   Википедия

  • UMAC — In cryptography, a message authentication code based on universal hashing, or UMAC, is a type of message authentication code (MAC) calculated choosing a hash function from a class of hash functions according to some secret (random) process and… …   Wikipedia

  • Hash function — A hash function is any well defined procedure or mathematical function for turning some kind of data into a relatively small integer, that may serve as an index into an array. The values returned by a hash function are called hash values, hash… …   Wikipedia

  • UMAC — (англ. message authentication code based on universal hashing, universal MAC, код аутентификации сообщения на основе универсального хеширования)  один из видов кода аутентичности сообщений (MAC). Содержание 1 История 2 Описание …   Википедия

  • MinHash — In computer science, MinHash (or the min wise independent permutations locality sensitive hashing scheme) is a technique for quickly estimating how similar two sets are. The scheme was invented by Andrei Broder (1997),[1] and initially used… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”