- VC dimension
In
computational learning theory , the VC dimension (for Vapnik-Chervonenkis dimension) is a measure of thecapacity of astatistical classification algorithm , defined as thecardinality of the largest set of points that the algorithm can shatter. It is a core concept inVapnik-Chervonenkis theory , and was originally defined byVladimir Vapnik andAlexey Chervonenkis .Informally, the capacity of a classification model is related to how complicated it can be. For example, consider the thresholding of a high-degree
polynomial : if the polynomial evaluates above zero, that point is classified as positive, otherwise as negative. A high-degree polynomial can be wiggly, so it can fit a given set of training points well. But one can expect that the classifier will make errors on other points, because it is too wiggly. Such a polynomial has a high capacity. A much simpler alternative is to threshold a linear function. This polynomial may not fit the training set well, because it has a low capacity. We make this notion of capacity more rigorous below.Shattering
A classification model with some parameter vector is said to "shatter" a set of data points () if, for all assignments of labels to those points, there exists a such that the model makes no errors when evaluating that set of data points.
VC dimension of a model is where is the maximum such that some data point set of
cardinality can be shattered by .For example, consider a straight line as the classification model: the model used by a
perceptron . The line should separate positive data points from negative data points. When there are 3 points that are not collinear, the line can shatter them. However, the line cannot shatter four points. Thus, the VC dimension of this particular classifier is 3. It is important to remember that one can choose the arrangement of points, but then cannot change it as the labels on the points are permuted. Note, only 3 of the 8 possible permutations are shown for the 3 points.Uses
The VC dimension has utility in statistical learning theory, because it can predict a
probabilistic upper bound on the test error of a classification model.The bound on the test error of a classification model (on data that is drawn i.i.d. from the same distribution as the training set) is given by
:Training error +
with probability , where is the VC dimension of the classification model, and is the size of the training set (restriction: this formula is valid when the VC dimension is small
Wikimedia Foundation. 2010.