- Red-black tree
A red-black tree is a type of
self-balancing binary search tree, a data structureused in computer science, typically used to implement associative arrays. The original structure was invented in 1972by Rudolf Bayerwho called them "symmetric binary B-trees", but acquired its modern name in a paper in 1978by Leonidas J. Guibasand Robert Sedgewick. It is complex, but has good worst-case running time for its operations and is efficient in practice: it can search, insert, and delete in O(log "n") time, where "n" is total number of elements in the tree.
A red-black tree is a special type of
binary tree, used in computer scienceto organize pieces of comparable data, such as numbers.
In red-black trees, the leaf nodes are not relevant and do not contain data. These leaves need not be explicit in computer memory — a null child pointer can encode the fact that this child is a leaf — but it simplifies some algorithms for operating on red-black trees if the leaves really are explicit nodes. To save memory, sometimes a single "sentinel" node performs the role of all leaf nodes; all references from internal nodes to leaf nodes then point to the sentinel node.
Red-black trees, like all
binary search trees, allow efficient in-order traversalof elements provided that there is a way to locate the parent of any node. The search-time results from the traversal from root to leaf, and therefore a balanced tree, having the least possible tree height, results in O(log "n") search time.
A red-black tree is a
binary search treewhere each node has a "color" attribute, the value of which is either "red" or "black". In addition to the ordinary requirements imposed on binary search trees, the following additional requirements of any valid red-black tree apply:
# A node is either red or black.
# The root is black. (This rule is used in some definitions and not others. Since the root can always be changed from red to black but not necessarily vice-versa this rule has little effect on analysis.)
# All leaves are black, even when the parent is black (The leaves are the "null" children.)
# Both children of every red node are black.
# Every simple path from a node to a descendant leaf contains the same number of black nodes, either counting or not counting the null black nodes. (Counting or not counting the null black nodes does not affect the structure as long as the choice is used consistently.)
These constraints enforce a critical property of red-black trees: that the longest path from the root to a leaf is no more than twice as long as the shortest path from the root to a leaf in that tree. The result is that the tree is roughly balanced. Since operations such as inserting, deleting, and finding values requires worst-case time proportional to the height of the tree, this theoretical upper bound on the height allows red-black trees to be efficient in the worst-case, unlike ordinary
binary search trees.
To see why these properties guarantee this, it suffices to note that no path can have two red nodes in a row, due to property 4. The shortest possible path has all black nodes, and the longest possible path alternates between red and black nodes. Since all maximal paths have the same number of black nodes, by property 5, this shows that no path is more than twice as long as any other path.
In many presentations of tree data structures, it is possible for a node to have only one child, and leaf nodes contain data. It is possible to present red-black trees in this paradigm, but it changes several of the properties and complicates the algorithms. For this reason, this article uses "null leaves", which contain no data and merely serve to indicate where the tree ends, as shown above. These nodes are often omitted in drawings, resulting in a tree which seems to contradict the above principles, but which in fact does not. A consequence of this is that all internal (non-leaf) nodes have two children, although one or both of those children may be null leaves.
Some explain a red-black tree as a binary search tree whose edges, instead of nodes, are colored in red or black, but this does not make any difference. The color of a node in this article's terminology corresponds to the color of the edge connecting the node to its parent, except that the root node is always black (property 2) whereas the corresponding edge does not exist.
Analogy to B-trees of order 4
A red-black tree is similar in structure to a
B-treeof order 4, where each node can contain between 1 to 3 values and (accordingly) between 2 to 4 child pointers. In such B-tree, each node will contain only one value matching the value in a black node of the red-black tree, with an optional value before and/or after it in the same node, both matching an equivalent red node of the red-black tree.
One way to see this equivalence is to "move up" the red nodes in a graphical representation of the red-black tree, so that they align horizontally with their parent black node, by creating together a horizontal cluster. In the B-tree, or in the modified graphical representation of the red-black tree, all leaf nodes are at the same depth.
The red-black tree is then structurally equivalent to a B-tree of order 4, with a minimum fill factor of 33% of values per cluster with a maximum capacity of 3 values.
However, the conversion of the structurally equivalent B-tree back to a red-black tree can produce multiple results. Effectively, if a B-tree cluster contains only 1 value, the minimum (and has two child pointers), then this value will be black in the red-black tree. If the cluster contains 3 values, then the central value will be black and each value stored on its sides will be red in the red-black tree. However if the cluster contains two values, each one can become the black node in the red-black tree (and the other one will be red).
So the order-4 B-tree does not maintain which of the values contained in each cluster is the root black tree for the whole cluster and the parent of the other values in the same cluster. Despite this, the operations on red-black trees are more economical because you don't have to maintain the vector of values (it may be costly if values are stored directly in each node rather than being stored by simple reference. But B-tree nodes are more economical in space, because you don't need to store the colors attribute for each node (instead you have to know which slot in the vector of 3 values of the cluster is used; if values are stored by references, for example they are objects, you just need to be able to make a distinction with null references, and so the cluster is just represented by a vector containing 3 slots for value pointers plus 4 slots for child references in the tree). In that case, the B-tree can be more compact in memory, it will increase the data locality.
The same analogy can be made with B-trees with larger orders that can be structurally equivalent to a colored binary tree: you just need more colors. Suppose that you add blue, then the blue-red-black tree defined like red-black trees but with the additional constraint that no two successive nodes in the hierarchy will be blue and all blue nodes will be children of a red node, then it becomes equivalent to a B-tree whose clusters will have at most 7 values in the following colors: blue, red, blue, black, blue, red, blue (For each cluster, there will be at most 1 black node, 2 red nodes, and 4 blue nodes).
For moderate volumes of values, insertions and deletions in a colored binary tree are faster compared to B-trees because colored trees don't attempt to maximize the fill factor of each horizontal cluster of nodes (only the minimum fill factor is guaranteed in colored binary trees, limiting the number of splits or junctions of clusters). B-trees will be faster for performing rotations (because rotations will frequently occur within the same cluster rather than with multiple separate nodes in a colored binary tree). However for storing large volumes, B-trees will be much faster as they will be more compact by grouping several children in the same cluster where they can be accessed locally.
All optimizations possible in B-trees to increase the average fill factors of clusters are possible in the equivalent multicolored binary tree. Notably, maximizing the average fill factor in a structurally equivalent B-tree is the same as reducing the total height of the multicolored tree, by increasing the number of non-black nodes. The worst case occurs when all nodes in a colored binary tree are black, the best case occurs when only a third of them are black (and the other two thirds are red nodes).
Applications and related data structures
Red-black trees offer worst-case guarantees for insertion time, deletion time, and search time. Not only does this make them valuable in time-sensitive applications such as real-time applications, but it makes them valuable building blocks in other data structures which provide worst-case guarantees; for example, many data structures used in
computational geometrycan be based on red-black trees.
AVL treeis another structure supporting O(log "n") search, insertion, and removal. It is more rigidly balanced than Red-Black trees, leading to slower insertion and removal but faster retrieval.
Red-black trees are also particularly valuable in
functional programming, where they are one of the most common persistent data structures, used to construct associative arrays and sets which can retain previous versions after mutations. The persistent version of red-black trees requires O(log "n") space for each insertion or deletion, in addition to time.
Red-black trees are an
isometryof 2-3-4 trees. In other words, for every 2-3-4 tree, there is a corresponding red-black tree with data elements in the same order. The insertion and deletion operations on 2-3-4 trees are also equivalent to color-flipping and rotations in red-black trees. This makes 2-3-4 trees an important tool for understanding the logic behind red-black trees, and this is why many introductory algorithm texts introduce 2-3-4 trees just before red-black trees, even though 2-3-4 trees are not often used in practice.
2008, Sedgewick introduced a simpler version of red-black trees called [http://www.cs.princeton.edu/~rs/talks/LLRB/RedBlack.pdf Left-Leaning Red-Black Trees] by eliminating a previously unspecified degree of freedom in the implementation. This page, however, describes the 1978 version.
Read-only operations on a red-black tree require no modification from those used for
binary search trees, because every red-black tree is a special case of a simple binary search tree. However, the immediate result of an insertion or removal may violate the properties of a red-black tree. Restoring the red-black properties requires a small number (O(log "n") or amortized O(1)) of color changes (which are very quick in practice) and no more than three tree rotations (two for insertion). Although insert and delete operations are complicated, their times remain O(log "n").
Insertion begins by adding the node as we would in a simple binary search tree and coloring it red. What happens next depends on the color of other nearby nodes. The term "uncle node" will be used to refer to the sibling of a node's parent, as in human family trees. Note that:
* Property 3 (All leaves, including "null"s, are black) always holds.
* Property 4 (Both children of every red node are black) is threatened only by adding a red node, repainting a black node red, or a rotation.
* Property 5 (All paths from any given node to its leaf nodes contain the same number of black nodes) is threatened only by adding a black node, repainting a red node black, or a rotation.
: "Note": The label N will be used to denote the node being inserted; P will denote N's parent node, G will denote N's grandparent, and U will denote N's uncle. Note that in between some cases, the roles and labels of the nodes are exchanged, but in each case, every label continues to represent the same node it represented at the beginning of the case. Any color shown in the diagram is either assumed in its case or implied by these assumptions.Each case will be demonstrated with example C code. The uncle and grandparent nodes can be found by these functions:
Case 1: The new node N is at the root of the tree. In this case, it is repainted black to satisfy Property 2 (The root is black). Since this adds one black node to every path at once, Property 5 (All paths from any given node to its leaf nodes contain the same number of black nodes) is not violated.
Case 2: The new node's parent P is black, so Property 4 (Both children of every red node are black) is not invalidated. In this case, the tree is still valid. Property 5 (All paths from any given node to its leaf nodes contain the same number of black nodes) is not threatened, because the new node N has two black leaf children, but because N is red, the paths through each of its children have the same number of black nodes as the path through the leaf it replaced, which was black, and so this property remains satisfied.
: "Note:" In the following cases it can be assumed that N has a grandparent node G, because its parent P is red, and if it were the root, it would be black. Thus, N also has an uncle node U, although it may be a leaf in cases 4 and 5.
Note that inserting is actually in-place, since all the calls above use
In a normal binary search tree, when deleting a node with two non-leaf children, we find either the maximum element in its left subtree or the minimum element in its right subtree, and move its value into the node being deleted (as shown here). We then delete the node we copied the value from, which must have less than two non-leaf children. Because merely copying a value does not violate any red-black properties, this reduces the problem of deleting to the problem of deleting a node with at most one non-leaf child. It does not matter whether this node is the node we originally wanted to delete or the node we copied the value from.
For the remainder of this discussion we can assume we are deleting a node with at most one non-leaf child, which we will call its child (if it has only leaf children, let one of them be its child).If we are deleting a red node, we can simply replace it with its child, which must be black. All paths through the deleted node will simply pass through one less red node, and both the deleted node's parent and child must be black, so properties 3 (All leaves, including "null"s, are black) and 4 (Both children of every red node are black) still hold.Another simple case is when the deleted node is black and its child is red. Simply removing a black node could break Properties 4 (Both children of every red node are black) and 5 (All paths from any given node to its leaf nodes contain the same number of black nodes), but if we repaint its child black, both of these properties are preserved.
The complex case is when both the node to be deleted and its child are black. We begin by replacing the node to be deleted with its child. We will call (or "label") this child (in its new position) N, and its sibling (its new parent's other child) S.In the diagrams below, we will also use P for N's new parent, SL for S's left child, and SR for S's right child (it can be shown that S cannot be a leaf).
: "Note": In between some cases, we exchange the roles and labels of the nodes, but in each case, every label continues to represent the same node it represented at the beginning of the case. Any color shown in the diagram is either assumed in its case or implied by these assumptions. White represents an unknown color (either red or black).
We will find the sibling using this function:
: "Note": In order that the tree remains well-defined, we need that every null leaf remains a leaf after all transformations (that it will not have any children). If the node we are deleting has a non-leaf (non-null) child N, it is easy to see that the property is satisfied. If, on the other hand, N would be a null leaf, it can be verified from the diagrams (or code) for all the cases that the property is satisfied as well.
We can perform the steps outlined above with the following code, where the function
n's place in the tree. For convenience, code in this section will assume that null leaves are represented by actual node objects rather than NULL (the code in the "Insertion" section works with either representation).
: "Note": If N is a null leaf and we do not want to represent null leaves as actual node objects, we can modify the algorithm by first calling delete_case1() on its parent (the node that we delete,
nin the code above) and deleting it afterwards. We can do this because the parent is black, so it behaves in the same way as a null leaf (and is sometimes called a 'phantom' leaf). And we can safely delete it at the end as
nwill remain a leaf after all operations, as shown above.
If both N and its original parent are black, then deleting this original parent causes paths which proceed through N to have one fewer black node than paths that do not. As this violates Property 5 (All paths from any given node to its leaf nodes contain the same number of black nodes), the tree must be rebalanced. There are several cases to consider:
Case 1: N is the new root. In this case, we are done. We removed one black node from every path, and the new root is black, so the properties are preserved.
: "Note": In cases 2, 5, and 6, we assume N is the left child of its parent P. If it is the right child, "left" and "right" should be reversed throughout these three cases. Again, the code examples take both cases into account.
Again, the function calls all use
tail recursion, so the algorithm is in-place.In the algorithm above, all cases are chained in order, except in delete case 3 where it can recurse to case 1 back to the parent node: this is the only case where an in-place implementation will effectively loop (after only one rotation in case 3).
Additionally, no tail recursion ever occurs on a child node, so the tail recursion loop can only move from a child back to its successive ancestors. No more than O(log "n") loops back to case 1 will occur (where "n" is the total number of nodes in the tree before deletion). If a rotation occurs in case 2 (which is the only possibility of rotation within the loop of cases 1–3), then the parent of the node N becomes red after the rotation and we will exit the loop. Therefore at most one rotation will occur within this loop. Since no more than two additional rotations will occur after exiting the loop, at most three rotations occur in total.
A sample development of the tail recursion loop optimization is shown below.
Proof of asymptotic bounds
A red black tree which contains "n" internal nodes has a height of O(log(n)).
*h("v") = height of subtree rooted at node "v"
*bh("v") = the number of black nodes (not counting "v" if it is black) from "v" to any leaf in the subtree (called the black-height).
Lemma: A subtree rooted at node "v" has at least internal nodes.
Proof of Lemma (by induction height):
Basis: h("v") = 0
If "v" has a height of zero then it must be "null", therefore bh("v") = 0. So:
Inductive Step: "v" such that h("v") = k, has at least internal nodes implies that such that h() = k+1 has at least internal nodes.
Since has h() > 0 it is an internal node. As such it has two children each of which has a black-height of either bh() or bh()-1 (depending on whether the child is red or black). By the inductive hypothesis each child has at least internal nodes, so has at least:
Using this lemma we can now show that the height of the tree is logarithmic. Since at least half of the nodes on any path from the root to a leaf are black (property 4 of a red black tree), the black-height of the root is at least h(root)/2. By the lemma we get:
Therefore the height of the root is O(log(n)).
In the tree code there is only one loop where the node of the root of the red-black property that we wish to restore, x, can be moved up the tree by one level at each iteration.
Since the original height of the tree is O(log n), there are O(log n) iterations. So overall the insert routine has O(log n) complexity.
* [http://mathworld.wolfram.com/Red-BlackTree.html Mathworld: Red-Black Tree]
* [http://www.eli.sdsu.edu/courses/fall95/cs660/notes/RedBlackTree/RedBlack.html#RTFToC2 San Diego State University: CS 660: Red-Black tree notes] , by Roger Whitney
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. " Introduction to Algorithms", Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7 . Chapter 13: Red-Black Trees, pp.273–301.
*cite web|last = Pfaff|first = Ben|title = Performance Analysis of BSTs in System Software| publisher = Stanford University|year = 2004|month = June|url = http://www.stanford.edu/~blp/papers/libavl.pdf|format = PDF
*cite web|last=Okasaki|first=Chris|title=Red-Black Trees in a Functional Setting|url=http://www.eecs.usma.edu/webs/people/okasaki/jfp99.ps|format=PS
* [http://people.ksp.sk/~kuko/bak/index.html Red Black Tree Applet] , a demo of red-black and many more search trees by Kubo Kovac
* [http://webpages.ull.es/users/jriera/Docencia/AVL/AVL%20tree%20applet.htm Red Black Tree Applet] , a demo of red-black trees, avl, rotation and many more.
* [http://www.ececs.uc.edu/~franco/C321/html/RedBlack/redblack.html Red/Black Tree Demonstration] , an interactive demo of insertion and removal with Java source code
* [http://www.ibr.cs.tu-bs.de/lehre/ss98/audii/applets/BST/RedBlackTree-Example.html Red-Black Tree Animation] , a demo of worst-case insertion
* [http://www.aisee.com/anim/maple.htm aiSee: Visualization of a Sorting Algorithm] , an animated GIF showing insertion (200KB)
* [http://geocities.com/dmh2000/articles/code/red-blacktree.html Red-Black Tree Demonstration] , a demo of insertion with Java source code by
David M. Howard
* [http://webpages.ull.es/users/jriera/Docencia/AVL/AVL%20tree%20applet.htm AVL Tree Applet] , an interactive demo of insertion to and removal from AVL, splay and red-black trees
* In the
C++ Standard Template Library, the containers
std::map<Key,Value>are often based on red-black trees
* [http://eternallyconfuzzled.com/tuts/datastructures/jsw_tut_rbtree.aspx Efficient implementation of Red-Black Trees]
* [http://efsa.sourceforge.net/archive/durian/red_black_tree.htm RBT: A SmallEiffel Red-Black Tree Library]
* [http://www.stanford.edu/~blp/avl/libavl.html/Red_002dBlack-Trees.html Red-Black Tree in GNU libavl C library by Ben Pfaff]
* [http://libredblack.sourceforge.net/ libredblack: A C Red-Black Tree Library]
* [http://web.mit.edu/~emin/www/source_code/red_black_tree/index.html Red-Black Tree C Code]
* [http://www.javaresearch.org/source/jdk142/java/util/TreeMap.java.html Red-Black Tree Java implementation in java.util.TreeMap]
* [http://dragonflybsd.org DragonFlyBSD VM subsystems utilize Red-Black trees]
* [http://scctoolkit.atspace.com NATURAL/ADABAS implementation by Paul Macgowan]
* [http://fxr.watson.org/fxr/source//sys/tree.h FreeBSD's single header file implementation]
* [http://www.ibm.com/developerworks/linux/library/l-cfs/?ca=dgr-lnxw04CFC4Linux The default scheduler of Linux 2.6.23, called Completely Fair Scheduler, is implemented via a Red-Black tree]
* [http://datadraw.svn.sourceforge.net/viewvc/datadraw/trunk/examples DataDraw has benchmarks of Red-Black trees vs Treaps]
Wikimedia Foundation. 2010.