- Empirical measure
In probability theory, an empirical measure is a random measure arising from a particular realization of a (usually finite) sequence of random variables. The precise definition is found below. Empirical measures are relevant to mathematical statistics.
The motivation for studying empirical measures is that it is often impossible to know the true underlying probability measure $P$. We collect observations $X_1, X_2, \dots, X_n$ and compute relative frequencies. We can estimate $P$, or a related distribution function $F$, by means of the empirical measure or the empirical distribution function, respectively. These are uniformly good estimates under certain conditions. Theorems in the area of empirical processes provide rates of this convergence.
Definition
Let $X_1, X_2, \dots$ be a sequence of independent identically distributed random variables with values in the state space $S$ with probability measure $P$.
Definition: The "empirical measure" $P_n$ is defined for measurable subsets of $S$ and given by
:$P_n(A) = \frac{1}{n}\sum_{i=1}^n I_A(X_i) = \frac{1}{n}\sum_{i=1}^n \delta_{X_i}(A)$
where $I_A$ is the indicator function and $\delta_X$ is the Dirac measure.
For a fixed measurable set $A$, $nP_n(A)$ is a binomial random variable with mean $nP(A)$ and variance $nP(A)(1-P(A))$. In particular, $P_n(A)$ is an unbiased estimator of $P(A)$.
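As a minimal sketch of the definition, the following Python computes $P_n(A)$ as the fraction of observations that land in $A$. The function name `empirical_measure`, the representation of a set by a membership predicate, and the sample values are assumptions made for illustration; they do not come from the source.

```python
from typing import Callable, Sequence


def empirical_measure(sample: Sequence[float], in_set: Callable[[float], bool]) -> float:
    """Return P_n(A) = (1/n) * sum_i I_A(X_i) for the set A described by in_set."""
    n = len(sample)
    if n == 0:
        raise ValueError("the empirical measure needs at least one observation")
    # Each term is the indicator I_A(X_i): 1 if X_i lies in A, else 0.
    hits = sum(1 for x in sample if in_set(x))
    return hits / n


# Hypothetical observations and the set A = (0, 1].
observations = [0.2, 1.5, 0.7, -0.3, 0.9]
p_n = empirical_measure(observations, lambda x: 0 < x <= 1)
print(p_n)  # 3 of the 5 observations lie in (0, 1]
```

The count `hits` is the quantity $nP_n(A)$ that the text describes as binomial.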
Definition: $\bigl(P_n(c)\bigr)_{c\in\mathcal{C}}$ is the "empirical measure" indexed by $\mathcal{C}$, a collection of measurable subsets of $S$.
To generalize this notion further, observe that the empirical measure $P_n$ maps measurable functions $f:S\to\mathbb{R}$ to their "empirical mean",
:$f\mapsto P_n f=\int_S f\,dP_n=\frac{1}{n}\sum_{i=1}^n f(X_i)$
In particular, the empirical measure of $A$ is simply the empirical mean of the indicator function, $P_n(A)=P_n I_A$.
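The empirical mean can be sketched in a few lines of Python. The helper names `empirical_mean` and `indicator_A`, the test function $f(x)=x^2$, and the sample values are hypothetical choices for illustration only. The last lines check that plugging in an indicator function reproduces the empirical measure of the set.

```python
from typing import Callable, Sequence


def empirical_mean(sample: Sequence[float], f: Callable[[float], float]) -> float:
    """Return P_n f = (1/n) * sum_i f(X_i)."""
    if not sample:
        raise ValueError("the empirical mean needs at least one observation")
    return sum(f(x) for x in sample) / len(sample)


def indicator_A(x: float) -> float:
    """Indicator of the hypothetical set A = (0, 1]."""
    return 1.0 if 0 < x <= 1 else 0.0


observations = [0.2, 1.5, 0.7, -0.3, 0.9]

# Empirical mean of f(x) = x^2.
second_moment = empirical_mean(observations, lambda x: x * x)

# Taking f = I_A gives the fraction of observations in A, that is, P_n(A).
p_n_A = empirical_mean(observations, indicator_A)
print(second_moment, p_n_A)
```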
For a fixed measurable function $f$, $P_nf$ is a random variable with mean $\mathbb{E}f$ and variance $\frac{1}{n}\mathbb{E}(f-\mathbb{E}f)^2$.
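The mean and variance follow from linearity of expectation and from independence of the observations. A short derivation, assuming $\mathbb{E}f^2<\infty$ (an assumption added here so that the variance is finite; the source does not state it):

```latex
\begin{align*}
\mathbb{E}[P_n f]
  &= \frac{1}{n}\sum_{i=1}^n \mathbb{E} f(X_i) = \mathbb{E} f, \\
\operatorname{Var}(P_n f)
  &= \frac{1}{n^2}\sum_{i=1}^n \operatorname{Var}\bigl(f(X_i)\bigr)
   = \frac{1}{n}\,\mathbb{E}(f - \mathbb{E} f)^2 .
\end{align*}
```

Taking $f=I_A$ gives mean $P(A)$ and variance $P(A)(1-P(A))/n$ for $P_n(A)$, which agrees with the binomial statement for $nP_n(A)$.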
By the strong law of large numbers, $P_n(A)$ converges to $P(A)$ almost surely for fixed $A$. Similarly, $P_nf$ converges to $\mathbb{E}f$ almost surely for a fixed measurable function $f$. The problem of uniform convergence of $P_n$ to $P$ was open until Vapnik and Chervonenkis solved it in 1968.
If the class $\mathcal{C}$ (or $\mathcal{F}$) is Glivenko-Cantelli with respect to $P$, then $P_n$ converges to $P$ uniformly over $c\in\mathcal{C}$ (or $f\in\mathcal{F}$). In other words, with probability 1 we have
:$\|P_n-P\|_{\mathcal{C}}=\sup_{c\in\mathcal{C}}|P_n(c)-P(c)|\to 0,$
:$\|P_n-P\|_{\mathcal{F}}=\sup_{f\in\mathcal{F}}|P_nf-\mathbb{E}f|\to 0.$
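To make the supremum concrete, the sketch below evaluates $\sup_{c\in\mathcal{C}}|P_n(c)-P(c)|$ for a finite, hypothetical class of intervals $(a,b]$ with endpoints on a grid, taking $P$ to be the uniform distribution on $[0,1]$ so that $P((a,b])=b-a$. The class, the choice of $P$, and the name `sup_deviation` are assumptions chosen for illustration; the source does not specify them. A simulation of this kind only illustrates the convergence and proves nothing.

```python
import random
from itertools import combinations


def sup_deviation(sample, grid):
    """Max over intervals (a, b], a < b in grid, of |P_n((a, b]) - (b - a)|.

    Assumes P is uniform on [0, 1] and that grid is sorted with values in [0, 1].
    """
    n = len(sample)
    worst = 0.0
    for a, b in combinations(grid, 2):  # sorted grid, so every pair has a < b
        p_n = sum(1 for x in sample if a < x <= b) / n
        worst = max(worst, abs(p_n - (b - a)))
    return worst


rng = random.Random(0)  # fixed seed so the run is reproducible
grid = [k / 10 for k in range(11)]  # endpoints 0.0, 0.1, ..., 1.0

for n in (100, 1_000, 10_000):
    sample = [rng.random() for _ in range(n)]
    print(n, sup_deviation(sample, grid))
```

The printed deviations should tend to shrink as $n$ grows, although any single run can fluctuate.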
Empirical distribution function
The "empirical distribution function" provides an example of empirical measures. For real-valued iid random variables $X_1,\dots,X_n$ it is given by
:$F_n(x)=P_n((-\infty,x])=P_nI_{(-\infty,x]}.$
In this case, empirical measures are indexed by the class $\mathcal{C}=\{(-\infty,x]:x\in\mathbb{R}\}$. It has been shown that $\mathcal{C}$ is a uniform Glivenko-Cantelli class; in particular,
:$\sup_F\|F_n(x)-F(x)\|_\infty\to 0$
with probability 1.
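A minimal sketch of $F_n$ and of the sup-norm distance $\sup_x|F_n(x)-F(x)|$ for a single continuous $F$ follows. Because $F_n$ is a step function that jumps at the ordered observations, the supremum is approached either at a jump or just before it, so the code checks both sides of each jump. The names `ecdf`, `sup_distance` and `uniform_cdf`, the choice of the uniform distribution on $[0,1]$ as $F$, and the sample values are assumptions for illustration.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the function x -> F_n(x) = (1/n) * #{i : X_i <= x}."""
    ordered = sorted(sample)
    n = len(ordered)

    def F_n(x):
        # bisect_right counts the observations that are <= x.
        return bisect_right(ordered, x) / n

    return F_n


def sup_distance(sample, cdf):
    """Return sup_x |F_n(x) - cdf(x)| for a continuous cdf."""
    ordered = sorted(sample)
    n = len(ordered)
    worst = 0.0
    for i, x in enumerate(ordered):
        # For distinct observations, F_n is i/n just before x and (i+1)/n at x.
        # Ties only add in-between values, which cannot exceed the true maximum.
        worst = max(worst, abs((i + 1) / n - cdf(x)), abs(i / n - cdf(x)))
    return worst


def uniform_cdf(x):
    """Distribution function of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)


observations = [0.1, 0.4, 0.5, 0.8]
F_n = ecdf(observations)
print(F_n(0.4))  # two of the four observations are <= 0.4
print(sup_distance(observations, uniform_cdf))
```

Note that `F_n(x)` is the empirical measure of the half-line $(-\infty,x]$, in line with the formula for $F_n$.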
See also
* Empirical process
* Poisson random measure
References
* P. Billingsley, Probability and Measure, John Wiley and Sons, New York, third edition, 1995.
* M.D. Donsker, Justification and extension of Doob's heuristic approach to the Kolmogorov-Smirnov theorems, Annals of Mathematical Statistics, 23:277–281, 1952.
* R.M. Dudley, Central limit theorems for empirical measures, Annals of Probability, 6(6):899–929, 1978.
* R.M. Dudley, Uniform Central Limit Theorems, Cambridge Studies in Advanced Mathematics, 63, Cambridge University Press, Cambridge, UK, 1999.
* J. Wolfowitz, Generalization of the theorem of Glivenko-Cantelli, Annals of Mathematical Statistics, 25:131–138, 1954.
Wikimedia Foundation. 2010.