- Kd-tree
In
computer science , a "k"d-tree (short for "k-dimensional tree") is a space-partitioningdata structure for organizing points in a "k"-dimensional space. "k"d-trees are a useful data structure for several applications, such as searches involving a multidimensional search key (e.g. range searches andnearest neighbour search es). "k"d-trees are a special case ofBSP tree s.Operations on "k"d-trees
Constructing a "k"d-tree
Since there are many possible ways to choose axis-aligned splitting planes, there are many different ways to construct "k"d-trees. The canonical method of "k"d-tree construction has the following constraints:
* As one moves down the tree, one cycles through the axes used to select the splitting planes. (For example, the root would have an "x"-aligned plane, the root's children would both have "y"-aligned planes, the root's grandchildren would all have "z"-aligned planes, and so on.)
* At each step, the point selected to create the splitting plane is themedian of the points being put into the "k"d-tree, with respect to their coordinates in the axis being used. (Note the assumption that we feed the entire set of points into the algorithm up-front.)This method leads to a balanced "k"d-tree, in which each leaf node is about the same distance from the root. However, balanced trees are not necessarily optimal for all applications.Note also that it is not "required" to select the median point. In that case, the result is simply that there is no guarantee that the tree will be balanced. A simple heuristic to avoid coding a complex linear-time median-finding algorithm nor using an O("n" log "n") sort is to use sort to find the median of a "fixed" number of "randomly" selected points to serve as the cut line. Practically this technique results in nicely balanced trees.
Given a list of "n" points, the following
algorithm will construct a balanced "k"d-tree containing those points.function kdtree ("list of points" pointList, "int" depth) { if pointList is empty return nil; else { "// Select axis based on depth so that axis cycles through all valid values" var "int" axis := depth mod k; "// Sort point list and choose median as pivot element" select median from pointList; "// Create node and construct subtrees" var "tree_node" node; node.location := median; node.leftChild := kdtree(points in pointList before median, depth+1); node.rightChild := kdtree(points in pointList after median, depth+1); return node; } }It is common that points "after" the median include only ones that are greater than or equal to the median. Another approach is to define a "superkey" function that compares the points in other dimensions. Lastly, it may be acceptable to let points equal to the median lie on either side.
This algorithm implemented in the Python programming language is as follows:
def kdtree(pointList, depth=0): if not pointList: return
# Select axis based on depth so that axis cycles through all valid values k = len(pointList [0] ) # assumes all points have the same dimension axis = depth % k
# Sort point list and choose median as pivot element pointList.sort(key=lambda point: point [axis] ) median = len(pointList)/2 # choose median
# Create node and construct subtrees node = Node() node.location = pointList [median] node.leftChild = kdtree(pointList [0:median] , depth+1) node.rightChild = kdtree(pointList [median+1:] , depth+1) return node
Example usage would be:
pointList = [(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)] tree = kdtree(pointList)
The tree generated is shown on the right.
This algorithm creates the invariant that for any node, all the nodes in the left
subtree are on one side of a splitting plane, and all the nodes in the right subtree are on the other side. Points that lie on the splitting plane may appear on either side. The splitting plane of a node goes through the point associated with that node (referred to in the code as "node.location").Adding elements to a "k"d-tree
One adds a new point to a "k"d-tree in the same way as one adds an element to any other tree. First, traverse the tree, starting from the root and moving to either the left or the right child depending on whether the point to be inserted is on the "left" or "right" side of the splitting plane. Once you get to a leaf node, add the new point as either the left or right child of the leaf node, again depending on which side of the node's splitting plane contains the new point.
Removing elements from a "k"d-tree
Remove a point from an existing "k"d-tree, without breaking the invariant.Easiest way is to get all nodes/leaves, and recreate that part of tree, you have to save axis/pivot information in node.
Balancing a "k"d-tree
Balancing a "k"d-tree requires care. Because "k"d-trees are sorted in multiple dimensions, the
tree rotation technique cannot be used to balance them — this may break the invariant.Nearest neighbor in a "k"d-tree
The nearest neighbor (NN) algorithm, to find the NN to a given target point not in the tree, relies on the ability to discard large portions of the tree by performing a simple test. To perform the NN calculation, the tree is searched in a depth-first fashion and at each stage it makes an approximation to the nearest distance. When the algorithm decides that there cannot possibly be a closer point it terminates, giving the nearest neighbour.
First the root node is examined with an initial assumption that the smallest distance to the next point is infinite. The subdomain (right or left), which is a
hyperrectangle (in 3D space this is a rectangular prism), containing the target point is searched. This is done recursively until a final minimum region containing the node is found. The algorithm then (through recursion) examines each parent node, seeing if it is possible for the other domain to contain a point that is closer. This is performed by testing for the possibility of intersection between the hyperrectangle and thehypersphere (a plain sphere in 3D) formed by target node and distance to the current best NN estimate. If the rectangle that has not been recursively examined yet does not intersect this sphere, then there is no way that the rectangle can contain a point that is a better nearest neighbour. This is repeated until all domains are either searched or discarded, thus leaving the nearest neighbour as the final result. In addition the algorithm not only provides the NN, but also the square of the distance to the NN.Finding the nearest point is an O(log N) operation. Analyses of binary search trees has found that the worst case search time for an k-dimensional KD tree containing M nodes is given by the following equationcitation
last1 = Lee | first1 = D. T.
last2 = Wong | first2 = C. K.
year = 1977
title = Worst-case analysis for region and partial region searches in multidimensional binary search trees and balanced quad trees
journal = Acta Informatica
volume = 9
issue = 1
pages = 23–29
doi = 10.1007/BF00263763] .t_{worst} = O(k cdot M^{1-frac{1}{k)
The algorithm can be extended in several ways by simple modifications. For example, approximate nearest neighbour searching can be achieved by simply setting an upper bound on the number points to examine in the tree, or by interrupting the search process based upon a real time clock (which may be more appropriate in hardware implementations). Nearest neighbour for points that are in the tree already can be achieved by not updating the refinement for nodes that give zero distance as the result, this has the downside of discarding points that are not unique, but are co-located with the original search point.
Approximate nearest neighbor is useful in real time applications such as robotics due to the significant speed increased gained by not searching for the best point exhaustively. One of its implementations is
Best Bin First .High-Dimensional Data
"k"d-trees are not suitable for efficiently finding the nearest neighbour in high dimensional spaces. As a general rule, if the dimensionality is "D", then number of points in the data, "N", should be "N >> 2D". Otherwise, when "k"d-trees are used with high-dimensional data, most of the points in the tree will be evaluated and the efficiency is no better than exhaustive search. The problem of finding NN in high-dimensional data is thought to be
NP-hard . [Piotr Indyk. Nearest neighbors in high-dimensional spaces. Handbook of Discrete and Computational Geometry, chapter 39. Editors: Jacob E. Goodman and Joseph O'Rourke, CRC Press, 2nd edition, 2004.] , and approximate nearest-neighbour methods are used instead.Complexity
* Building a static "k"d-tree from "n" points takes O("n" log 2 "n") time if an O("n" log "n") sort is used to compute the median at each level. The complexity is O("n" log "n") if a linear median-finding algorithm such as the one described in Cormen "et al" [Introduction to Algorithms Chapter 10.] is used.
* Inserting a new point into a balanced "k"d-tree takes O(log "n") time.
* Removing a point from a balanced "k"d-tree takes O(log "n") time.
* Querying an axis-parallel range in a balanced "k"d-tree takes O("n"1-1/d +"k") time, where k is the number of the reported points, and d the dimension of the "k"d-tree.Variations
Instead of points
Instead of points, a "k"d-tree can also contain
rectangle s or hyperrectangles. A 2D rectangle is considered a 4D object (xlow, xhigh, ylow, yhigh). Thus range search becomes the problem of returning all rectangles intersecting the search rectangle. The tree is constructed the usual way with all the rectangles at the leaves. In anorthogonal range search , the "opposite" coordinate is used when comparing against the median. For example, if the current level is split along xhigh, we check the xlow coordinate of the search rectangle. If the median is less than the xlow coordinate of the search rectangle, then no rectangle in the left branch can ever intersect with the search rectangle and so can be pruned. Otherwise both branches should be traversed. See alsointerval tree , which is a 1-dimensional special case.ee also
* implicit "k"d-tree
* min/max "k"d-tree
*Octree
*Bounding Interval Hierarchy
*Nearest neighbor search
*Klee's measure problem
* [http://libkdtree.alioth.debian.org libkdtree++] , an open-source STL-like implementation of "k"d-trees in C++.References
* Bentley, J. L. 1975. [http://portal.acm.org/citation.cfm?id=361007 Multidimensional binary search trees used for associative searching] . Commun. ACM 18, 9 (Sep. 1975), 509–517.
* Bentley, J. L. 1990. [http://doi.acm.org/10.1145/98524.98564 K-d Trees for Semidynamic Point Sets] . SCG '90: Proc. 6th Annual Symposium on Computational Geometry (1990), 187–197
* H. Samet, "The Design and Analysis of Spatial Data Structures", Addison-Wesley, Reading, MA, 1990.
* Section 5.2: Kd-Trees, pp.99–105.
Wikimedia Foundation. 2010.