- Cluster assumption
-
The cluster assumption is a type of data modeling used in machine learning specifically in Supervised learning and Semi-supervised learning. It states that if points are in the same cluster, they are likely to be of the same class.[1] There may be multiple clusters forming a single class.
Introduction
The cluster assumption is implicitly assumed in many machine learning algorithms such as the K-nearest neighbor classification algorithm and the K-means clustering algorithm. As the word "likely" appears in the definition, there is no clear border differentiating whether the assumption does hold or does not hold. In contrast the amount of adherence of data to this assumption can be quantitatively measured.
Properties
The cluster assumption is equivalent to the Low density separation assumption which states that the decision boundary should lie on a low-density region. To prove this, suppose the decision boundary crosses one of the clusters. Then this cluster will contain points from two different classes, therefore it is violated on this cluster.
Notes
- ^ O. Chapelle and B. Schölkopf and A. Zien, Semi-Supervised Learning, MIT Press, 2006
Categories:
Wikimedia Foundation. 2010.