Van Emde Boas tree

Van Emde Boas tree

A van Emde Boas tree (or van Emde Boas priority queue), also known as a vEB tree, is a tree data structure which implements an associative array with "m"-bit integer keys. It performs all operations in O(log "m") time. Notice that "m" is the "size" of the keys — therefore O(log "m") is O(log log "n") in a full tree, exponentially better than a self-balancing binary search tree. They also have good space efficiency when they contain a large number of elements, as discussed below. They were invented by a team led by Peter van Emde Boas in 1977 [#ref_1| [1] ] .

upported operations

The operations supported by a vEB tree are those of an "ordered associative array", which includes the usual associative array operations along with two more "order" operations, "FindNext" and "FindPrevious" [#ref_2| [2] ] :
*"Insert": insert a key/value pair with an "m"-bit key
*"Delete": remove the key/value pair with a given key
*"Lookup": find the value associated with a given key
*"FindNext": find the key/value pair with the smallest key at least a given "k"
*"FindPrev": find the key/value pair with the largest key at most a given "k"

How it works

Because of the constrained key set, the time boundaries depend on the representation of integers. The idea is to take the "m"-bit key and divide it into its "m"/2 most significant bits ("a") and its "m"/2 least significant bits ("b"). "a" is used to index into an array of 2"m"/2 vEB trees, each capable of holding "m"/2-bit numbers, and searching recursively for "b" in the "a"th one. The effect is to reduce the number of bits in the key by half for each recursive call.

In addition to their speed, the trees can be quite compact when they contain many elements, because no subtrees are created until something needs to be added to them. Initially, each element added creates about log("m") new trees containing about "m/2" pointers all together. As the tree grows, more and more subtrees are reused, especially the larger ones. In a full tree of 2"m" elements, only O(2"m") space is used. Moreover, unlike a binary search tree, most of this space is being used to store data: even for billions of elements, the pointers in a full vEB tree number in the thousands.

However, for small trees the overhead associated with vEB trees is enormous: on the order of 2"m"/2. This is one reason why they are not popular in practice. One way of addressing this limitation is to use only a fixed number of bits per level, which results in a trie.

The order operations are slightly more complicated. If the following information is added to each tree, including all subtrees:
*a flag to tell whether it is empty,
*a field giving the maximum value in the tree,
*a field giving the minimum value in the tree,then "FindNext" can be performed as follows: let "a" be the top half and "b" the bottom half of the bits of "k", the argument to "FindNext". If "b" lies below the maximum value of subtree "a", then the result is in that subtree, so "FindNext" is invoked on it recursively with "b". Otherwise, the first nonempty subtree is found with index > "a" and returning its minimum value.

This usually works, except for one small problem: the search could require as long as "m"/2 time. To speed it up, instead of storing flags, one more vEB tree able to hold numbers up to 2"m"/2 called "top" is added, which contains the indexes of all nonempty trees in the array. "FindNext" can then be invoked recursively on "top" to identify the first index > "a" with a nonempty tree, and its minimum element. "FindPrev" is similar.

Unfortunately, this makes things difficult, because now the "top" tree has to be maintained properly. Doing this the naive way, by adding and removing when trees become empty and nonempty, results in a double recursion that could take O("m") time. To fix this, first a "size" field is added. Next, instead of storing the minimum element in the tree itself it is stored in the "minimum" field. Now, adding an element to an empty tree is constant time, so there is time left to make a recursion on "top" to add the index. Likewise, removing the last element from a tree is constant time, leaving time to remove the tree's index from "top". All operations are, finally, O(log "m").

In practical implementations, especially on machines with "shift-by-k" and "find first zero" instructions, performance can further be improved by switching to a bit array once "m" equal to the word size (or a small multiple thereof) is reached. Since all operations on a single word are constant time, this does not affect the asymptotic performance, but it does avoid the majority of the pointer storage and several pointer dereferences, achieving a significant practical savings in time and space with this trick.

References

*Ref 1: Peter van Emde Boas, R. Kaas, and E. Zijlstra: "Design and Implementation of an Efficient Priority Queue" ("Mathematical Systems Theory" 10: 99-127, 1977)
*Ref 2: Gudmund Skovbjerg Frandsen: " [http://www.daimi.au.dk/~gudmund/dynamicF04/vEB.pdf Dynamic algorithms: Course notes on van Emde Boas trees (PDF)] " (University of Aarhus, Department of Computer Science)
* Erik Demaine, Shantonu Sen, and Jeff Lindy. Massachusetts Institute of Technology. 6.897: Advanced Data Structures (Spring 2003). [http://theory.csail.mit.edu/classes/6.897/spring03/scribe_notes/L1/lecture1.pdf Lecture 1 notes: Fixed-universe successor problem, van Emde Boas] . [http://theory.csail.mit.edu/classes/6.897/spring03/scribe_notes/L2/lecture2.pdf Lecture 2 notes: More van Emde Boas, ...] .


Wikimedia Foundation. 2010.

Игры ⚽ Поможем сделать НИР

Look at other dictionaries:

  • Tree (data structure) — A simple unordered tree; in this diagram, the node labeled 7 has two children, labeled 2 and 6, and one parent, labeled 2. The root node, at the top, has no parent. In computer science, a tree is a widely used data structure that emulates a… …   Wikipedia

  • Binary tree — Not to be confused with B tree. A simple binary tree of size 9 and height 3, with a root node whose value is 2. The above tree is unbalanced and not sorted. In computer science, a binary tree is a tree data structure in which each node has at… …   Wikipedia

  • Hash tree — A binary hash tree In cryptography and computer science Hash trees or Merkle trees are a type of data structure[citation needed] which contains a tree of summary information about a larger piece of da …   Wikipedia

  • Radix tree — In computer science, a radix tree (also patricia trie or radix trie) is a space optimized trie data structure where each node with only one child is merged with its child. The result is that every internal node has at least two children. Unlike… …   Wikipedia

  • Dancing tree — For the film Dancing tree, see Dancing tree (film) In computer science, a dancing tree is a tree data structure, which is similar to B+ tree. Invented by Hans Reiser, for use by the Reiser4 file system. As opposed to self balancing binary search… …   Wikipedia

  • Metric tree — This article is about the data structure. For the type of metric space, see Real tree. A metric tree is any tree data structure specialized to index data in metric spaces. Metric trees exploit properties of metric spaces such as the triangle… …   Wikipedia

  • Cover tree — The cover tree is a type of data structure in computer science that is specifically designed to facilitate the speed up of a nearest neighbor search. It is a refinement of the Navigating Net data structure, and related to a variety of other data… …   Wikipedia

  • Doubly chained tree — In computer science, a Doubly chained tree is a tree data structure in which each node has two pointers. Typically, each node has one pointer pointing to its child and one pointing at the node to its right. Then the following node will have one… …   Wikipedia

  • Priority queue — A priority queue is an abstract data type in computer programming. It is exactly like a regular queue or stack data structure, but additionally, each element is associated with a priority . stack: elements are pulled in last in first out order (e …   Wikipedia

  • Heap (data structure) — This article is about the programming data structure. For the dynamic memory area, see Dynamic memory allocation. Example of a complete binary max heap In computer science, a heap is a specialized tree based data structure that satisfies the heap …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”