- Shattering
The concept of shattering of a set of points plays an important role in
Vapnik-Chervonenkis theory , also known as VC-theory. Shattering and VC-theory are used in the study ofempirical process es as well as in statisticalcomputational learning theory .Definition
Suppose we have a class "C" of sets and a given set "A". "C" is said to "shatter" "A" if, for each subset "T" of "A", there is some element "U" of "C" such that
:
Equivalently, "C" shatters "A" when the
power set "P"("A") is the set { "U" ∩ "A" | "U" ∈ "C" }.For example, the class "C" of all discs in the plane (two-dimensional space) cannot shatter every set "A" of four points, yet the class of all
convex set s in the plane shatters every finite set on the (unit) circle. (For the collection of all convex sets, connect the dots! )We employ the letter "C" to refer to a "class" or "collection" of sets, as in a Vapnik-Chervonenkis class (VC-class). The set "A" is often assumed to be finite because, in empirical processes, we are interested in the shattering of finite sets of data points.
hatter coefficient
To quantify the richness of a collection of sets, we use the concept of "shattering coefficients" (also known as "shatter coefficients" or the "growth function"). For a collection of sets Ω, Ω being any space, often a
probability space , we definethe "n"th "shattering coefficient" of as:
where "card" denotes the
cardinality , that is the number of elements of a set.is equal to the largest number of subsets of any set of "n" points that can be formed by intersecting with the sets in collection .
It is obvious that :1. for all "n". :2. If , that means there is a set of cardinality "n", which can be shattered by "C". :3. If for some then for all The third property means that if "C" cannot shatter any set of cardinality "N" then it cannot shatter sets of larger cardinalities.
Vapnik-Chervonenkis class
The
VC dimension of a class "C" is defined as:or, alternatively, as:Note that .
If for any "n" there is a set of cardinality "n" which can be shattered by "C", then for all "n" and the VC dimension of this class "C" is infinite.
A class with finite VC dimension is called a "Vapnik-Chervonenkis class" or "VC class". A class "C" is uniformly Glivenko-Cantelli if and only if it is a VC class.
References
*Wencour, R.S., R.M. Dudley (1981), "Some special Vapnik-Chervonenkis classes", Discrete Math, 33, 1981, 313-318.
*Steele, J.M. (1975), "Combinatorial Entropy and Uniform Limit Laws", Stanford University Ph.d Thesis (Mathematics)
*Steele, J.M. (1978), "Empirical discrepancies and subadditive processes", Annals of Probability, 6, 118--227.
Wikimedia Foundation. 2010.