Sample mean and sample covariance
Sample mean and sample covariance are statistics computed from a collection of data thought of as being random.

Sample mean and covariance
Given a random sample $\mathbf{x}_{1},\ldots,\mathbf{x}_{N}$ from an $n$-dimensional random variable $\mathbf{X}$ (i.e., realizations of $N$ independent random variables with the same distribution as $\mathbf{X}$), the sample mean is
:$$\bar{\mathbf{x}}=\frac{1}{N}\sum_{k=1}^{N}\mathbf{x}_{k}.$$
In coordinates, writing the vectors as columns,
:$$\mathbf{x}_{k}=\left[\begin{array}{c}x_{1k}\\ \vdots\\ x_{nk}\end{array}\right],\quad\bar{\mathbf{x}}=\left[\begin{array}{c}\bar{x}_{1}\\ \vdots\\ \bar{x}_{n}\end{array}\right],$$
the entries of the sample mean are
:$$\bar{x}_{i}=\frac{1}{N}\sum_{k=1}^{N}x_{ik},\quad i=1,\ldots,n.$$
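As a minimal sketch, assuming each observation $\mathbf{x}_k$ is stored as a plain Python tuple of $n$ numbers, the coordinate formula for $\bar{x}_i$ can be computed directly. The function name `sample_mean` and the data are hypothetical, chosen only for illustration.

```python
def sample_mean(xs):
    """Return the sample mean of N observations, each a tuple of n numbers.

    Entry i is (1/N) * sum over k of x_{ik}: the average of the i-th coordinates.
    """
    N = len(xs)
    n = len(xs[0])
    return [sum(x[i] for x in xs) / N for i in range(n)]


# Three hypothetical 2-dimensional observations x_1, x_2, x_3.
data = [(1.0, 2.0), (3.0, 6.0), (5.0, 4.0)]
print(sample_mean(data))  # [3.0, 4.0]
```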
The sample covariance of $\mathbf{x}_{1},\ldots,\mathbf{x}_{N}$ is the $n$-by-$n$ matrix $\mathbf{Q}=\left[q_{ij}\right]$ with entries given by
:$$q_{ij}=\frac{1}{N-1}\sum_{k=1}^{N}\left(x_{ik}-\bar{x}_{i}\right)\left(x_{jk}-\bar{x}_{j}\right).$$
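The sketch below, again assuming tuples of $n$ numbers, computes $\mathbf{Q}$ entry by entry. The names `sample_covariance` and `ddof` are hypothetical: `ddof` sets the divisor to $N-$`ddof`, so `ddof=1` gives the formula for $q_{ij}$ and `ddof=0` divides by $N$. The second part uses exact `fractions.Fraction` arithmetic to average both versions over all nine equally likely samples of size $N=2$ from a hypothetical $X$ uniform on $\{0,1,2\}$ (so $\operatorname{E}(X)=1$ and $\operatorname{Var}(X)=2/3$), illustrating the effect of the $N-1$ divisor and of a known population mean.

```python
from fractions import Fraction
from itertools import product


def sample_covariance(xs, ddof=1):
    """Return the n-by-n covariance matrix of N observations (tuples of n numbers).

    Each entry is the sum of products of deviations from the sample mean,
    divided by N - ddof; ddof=1 gives the 1/(N-1) matrix Q.
    """
    N = len(xs)
    n = len(xs[0])
    mean = [sum(x[i] for x in xs) / N for i in range(n)]
    return [
        [
            sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in xs) / (N - ddof)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Three hypothetical observations; their deviations from the mean (3, 4)
# are (-2, -2), (0, 2) and (2, 0).
data = [(1.0, 2.0), (3.0, 6.0), (5.0, 4.0)]
print(sample_covariance(data))  # [[4.0, 2.0], [2.0, 4.0]]

# Exact averages over all 3**N equally likely samples of size N = 2
# from a hypothetical X uniform on {0, 1, 2}: E(X) = 1, Var(X) = 2/3.
values = [Fraction(0), Fraction(1), Fraction(2)]
N = 2
samples = [[(v,) for v in s] for s in product(values, repeat=N)]
avg_unbiased = sum(sample_covariance(s)[0][0] for s in samples) / len(samples)
avg_biased = sum(sample_covariance(s, ddof=0)[0][0] for s in samples) / len(samples)
# With the known population mean E(X) = 1, dividing by N is already unbiased.
mu = Fraction(1)
avg_known = sum(sum((x[0] - mu) ** 2 for x in s) / N for s in samples) / len(samples)
print(avg_unbiased, avg_biased, avg_known)  # 2/3 1/3 2/3
```

For comparison, NumPy's `numpy.cov` also divides by $N-1$ by default; passing `rowvar=False` makes it treat each row as one observation, which is the layout used here.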
The sample mean and the sample covariance matrix are unbiased estimates of the mean and the covariance matrix of the random variable $\mathbf{X}$. The sample covariance matrix has $N-1$ in the denominator rather than $N$ essentially because the population mean $\operatorname{E}(\mathbf{X})$ is not known and is replaced by the sample mean $\bar{\mathbf{x}}$. Since $\bar{\mathbf{x}}$ minimizes the sum of squared deviations of the sample, the deviations from $\bar{\mathbf{x}}$ are systematically too small, and dividing by $N-1$ instead of $N$ compensates for this exactly in expectation. If the population mean $\operatorname{E}(\mathbf{X})$ is known, the analogous unbiased estimate
:$$q_{ij}=\frac{1}{N}\sum_{k=1}^{N}\left(x_{ik}-\operatorname{E}(X_{i})\right)\left(x_{jk}-\operatorname{E}(X_{j})\right),$$
which uses the population mean, does have $N$ in the denominator. This is an example of why, in probability and statistics, it is essential to distinguish between upper-case letters (random variables) and lower-case letters (realizations of the random variables).

The maximum likelihood estimate of the covariance
:$$q_{ij}=\frac{1}{N}\sum_{k=1}^{N}\left(x_{ik}-\bar{x}_{i}\right)\left(x_{jk}-\bar{x}_{j}\right)$$
for the Gaussian distribution case has $N$ as well. It differs from the unbiased estimate by the factor $(N-1)/N$, so the difference becomes negligible for large $N$.

Weighted samples
In a weighted sample, each vector $\mathbf{x}_{k}$ is assigned a weight $w_{k}\geq0$. Without loss of generality, assume that the weights are normalized:
:$$\sum_{k=1}^{N}w_{k}=1.$$
(If they are not, divide the weights by their sum.) Then the weighted mean $\bar{\mathbf{x}}$ and the weighted covariance matrix $\mathbf{Q}=\left[q_{ij}\right]$ are given by
:$$\bar{\mathbf{x}}=\sum_{k=1}^{N}w_{k}\mathbf{x}_{k},$$
:$$q_{ij}=\frac{\sum_{k=1}^{N}w_{k}\left(x_{ik}-\bar{x}_{i}\right)\left(x_{jk}-\bar{x}_{j}\right)}{1-\sum_{k=1}^{N}w_{k}^{2}}.$$
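If all weights are equal, $w_{k}=1/N$, the weighted mean and covariance reduce to the sample mean and covariance above: the denominator becomes $1-N\cdot(1/N^{2})=(N-1)/N$, so that $q_{ij}=\frac{1}{N-1}\sum_{k=1}^{N}\left(x_{ik}-\bar{x}_{i}\right)\left(x_{jk}-\bar{x}_{j}\right)$.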
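A minimal sketch of the weighted formulas, assuming each observation is a tuple of numbers; the function name `weighted_mean_cov` is hypothetical. It normalizes the weights itself, uses `fractions.Fraction` so that results are exact when the inputs are integers or fractions, and rejects the degenerate case where all the weight sits on one observation, because $1-\sum_k w_k^2$ is then zero.

```python
from fractions import Fraction


def weighted_mean_cov(xs, ws):
    """Return the weighted mean and weighted covariance matrix of N vectors.

    xs: list of N observations, each a tuple of n numbers.
    ws: list of N non-negative weights (they need not sum to 1).
    """
    fw = [Fraction(wk) for wk in ws]
    total = sum(fw)
    # Normalize the weights so that they sum to 1.
    w = [wk / total for wk in fw]
    n = len(xs[0])
    mean = [sum(wk * x[i] for wk, x in zip(w, xs)) for i in range(n)]
    denom = 1 - sum(wk * wk for wk in w)
    if denom == 0:
        raise ValueError("all weight is on a single observation")
    cov = [
        [
            sum(wk * (x[i] - mean[i]) * (x[j] - mean[j]) for wk, x in zip(w, xs))
            / denom
            for j in range(n)
        ]
        for i in range(n)
    ]
    return mean, cov


# Equal weights reproduce the ordinary sample mean and the 1/(N-1) covariance.
data = [(1, 2), (3, 6), (5, 4)]
mean, cov = weighted_mean_cov(data, [1, 1, 1])
# mean == [3, 4] and cov == [[4, 2], [2, 4]]
```

With equal weights the result matches the unweighted $\mathbf{Q}$ for the same three points, as the reduction to $w_k=1/N$ predicts.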
See also
*Unbiased estimation of standard deviation
*Estimation of covariance matrices
*Scatter matrix
*Arithmetic mean
*Estimation theory
*Linear regression
*Weighted least squares
*Weighted mean
*Standard error (statistics)
Wikimedia Foundation. 2010.