Canopy clustering algorithm
The canopy clustering algorithm is an unsupervised clustering algorithm related to the K-means algorithm.
It is intended to speed up clustering operations on large data sets, where applying a more expensive clustering algorithm directly may be impractical because of the size of the data set.
The algorithm proceeds as follows:
* Cheaply divide the data into overlapping subsets, called 'canopies', using an inexpensive approximate distance measure
* Perform more expensive clustering, but only within these canopies
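The first step can be sketched in Python. This is a minimal sketch of canopy construction along the lines of the McCallum, Nigam and Ungar paper: a random remaining point becomes a canopy center, every point closer than a loose threshold T1 joins that canopy, and every point closer than a tight threshold T2 (with T2 < T1) is no longer eligible to become a center. The function names `make_canopies` and `cheap_distance`, and the use of Manhattan distance as the "cheap" metric, are hypothetical choices made for illustration only.

```python
import random
from typing import Callable, List, Sequence, Set

Point = Sequence[float]


def cheap_distance(a: Point, b: Point) -> float:
    # Hypothetical stand-in for the "cheap" metric: Manhattan (L1) distance.
    return sum(abs(x - y) for x, y in zip(a, b))


def make_canopies(
    points: Sequence[Point],
    t1: float,
    t2: float,
    distance: Callable[[Point, Point], float] = cheap_distance,
    seed: int = 0,
) -> List[Set[int]]:
    """Group point indices into (possibly overlapping) canopies.

    t1 is the loose threshold and t2 the tight one; 0 < t2 < t1 is assumed.
    """
    if not 0 < t2 < t1:
        raise ValueError("expected 0 < t2 < t1")
    rng = random.Random(seed)
    remaining = set(range(len(points)))  # points still eligible as centers
    canopies: List[Set[int]] = []
    while remaining:
        center = rng.choice(sorted(remaining))
        canopy: Set[int] = set()
        # Compare the center against ALL points, so canopies may overlap.
        for i, p in enumerate(points):
            d = distance(points[center], p)
            if d < t1:  # loosely close: joins this canopy
                canopy.add(i)
            if d < t2:  # tightly close: can no longer start a canopy
                remaining.discard(i)  # the center itself (d == 0) is removed here
        canopies.append(canopy)
    return canopies


pts = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0)]
canopies = make_canopies(pts, t1=2.0, t2=1.0)
print(sorted(sorted(c) for c in canopies))  # [[0, 1], [2, 3]]
```

Points whose distance from a center falls between T2 and T1 join that canopy but stay eligible as future centers, so they can end up in more than one canopy; this is what makes the canopies overlap. The more expensive clustering of the second step would then be run only among points that share a canopy.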
Benefits
* The number of instances of training data that must be compared at each step is reduced
* There is some evidence that the resulting clusters are improved
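The reduction in comparisons can be illustrated with a small count. This sketch assumes a pairwise clustering method in which two points are compared with the expensive metric only if they share at least one canopy; the helper names and the example canopies are hypothetical, and a k-means variant would instead restrict point-to-centroid comparisons in the same spirit.

```python
from itertools import combinations
from typing import Iterable, Set, Tuple


def canopy_pairs(canopies: Iterable[Set[int]]) -> Set[Tuple[int, int]]:
    # Only points that share at least one canopy are compared with the
    # expensive metric; a pair shared by several canopies is counted once.
    pairs: Set[Tuple[int, int]] = set()
    for canopy in canopies:
        pairs.update(combinations(sorted(canopy), 2))
    return pairs


def all_pairs(n: int) -> int:
    # Number of comparisons for exhaustive pairwise clustering of n points.
    return n * (n - 1) // 2


# Hypothetical canopies over 7 points; point 2 lies in two canopies.
canopies = [{0, 1, 2}, {2, 3}, {4, 5, 6}]
print(len(canopy_pairs(canopies)), "of", all_pairs(7))  # 7 of 21
```

How large the saving is depends on how small and how little overlapping the canopies are, which in turn depends on the chosen thresholds.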
References
[http://www.kamalnigam.com/papers/canopy-kdd00.pdf McCallum, Nigam and Ungar: "Efficient Clustering of High Dimensional Data Sets with Application to Reference Matching"]
External links
* [http://www.youtube.com/watch?v=1ZDybXl212Q Cluster Computing and MapReduce Lecture 4] from Google
See also
* Data clustering
* K-means algorithm
* Linde-Buzo-Gray algorithm
Wikimedia Foundation.
2010.