**Sample mean and sample covariance**
**Sample mean** and **sample covariance** are statistics computed from a collection of data, thought of as being random.

**Sample mean and covariance**

Given a random sample $\mathbf{x}_{1},\ldots,\mathbf{x}_{N}$ from an $n$-dimensional random variable $\mathbf{X}$ (i.e., realizations of $N$ independent random variables with the same distribution as $\mathbf{X}$), the sample mean is

:$\mathbf{\bar{x}}=\frac{1}{N}\sum_{k=1}^{N}\mathbf{x}_{k}.$

In coordinates, writing the vectors as columns,

:$\mathbf{x}_{k}=\left[\begin{array}{c}x_{1k}\\ \vdots\\ x_{nk}\end{array}\right],\quad\mathbf{\bar{x}}=\left[\begin{array}{c}\bar{x}_{1}\\ \vdots\\ \bar{x}_{n}\end{array}\right],$

the entries of the sample mean are

:$\bar{x}_{i}=\frac{1}{N}\sum_{k=1}^{N}x_{ik},\quad i=1,\ldots,n.$

The sample covariance of $\mathbf{x}_{1},\ldots,\mathbf{x}_{N}$ is the $n\times n$ matrix $\mathbf{Q}=\left[q_{ij}\right]$ with entries given by

:$q_{ij}=\frac{1}{N-1}\sum_{k=1}^{N}\left(x_{ik}-\bar{x}_{i}\right)\left(x_{jk}-\bar{x}_{j}\right).$
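As a concrete illustration, the two formulas above can be computed directly. The sketch below (a minimal example of ours, not from the text) uses NumPy, stores the $N$ observations as columns of an $n \times N$ matrix, and checks the entrywise covariance formula against `numpy.cov`, which uses the same $N-1$ denominator by default.

```python
import numpy as np

# N = 4 observations of an n = 2 dimensional random variable,
# stored as columns x_k of a 2 x 4 data matrix (illustrative data).
X = np.array([[1.0, 2.0, 3.0, 6.0],
              [2.0, 1.0, 5.0, 4.0]])
n, N = X.shape

# Sample mean: x_bar_i = (1/N) * sum_k x_ik
x_bar = X.sum(axis=1) / N

# Sample covariance: q_ij = 1/(N-1) * sum_k (x_ik - x_bar_i)(x_jk - x_bar_j)
D = X - x_bar[:, None]        # deviations of each column from the mean
Q = D @ D.T / (N - 1)

# numpy.cov (rows = variables, columns = observations) uses N-1 as well
assert np.allclose(Q, np.cov(X))
```

Writing the sum over $k$ as the matrix product `D @ D.T` is the standard vectorized form of the entrywise formula.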

The sample mean and the sample covariance matrix are unbiased estimates of the mean and the covariance matrix of the random variable $\mathbf{X}$. The reason the sample covariance matrix has $N-1$ in the denominator rather than $N$ is essentially that the population mean $E(X)$ is not known and is replaced by the sample mean $\bar{x}$. If the population mean $E(X)$ is known, the analogous unbiased estimate

:$q_{ij}=\frac{1}{N}\sum_{k=1}^{N}\left(x_{ik}-E(X_{i})\right)\left(x_{jk}-E(X_{j})\right)$

with the population mean indeed does have $N$ in the denominator. This is an example of why, in probability and statistics, it is essential to distinguish between upper case letters (random variables) and lower case letters (realizations of the random variables).

The maximum likelihood estimate of the covariance

:$q_{ij}=\frac{1}{N}\sum_{k=1}^{N}\left(x_{ik}-\bar{x}_{i}\right)\left(x_{jk}-\bar{x}_{j}\right)$

for the Gaussian distribution case has $N$ as well. The difference, of course, diminishes for large $N$.

**Weighted samples**

In a weighted sample, each vector $\mathbf{x}_{k}$ is assigned a weight $w_{k}\geq 0$. Without loss of generality, assume that the weights are normalized:

:$\sum_{k=1}^{N}w_{k}=1.$

(If they are not, divide the weights by their sum.) Then the weighted mean $\mathbf{\bar{x}}$ and the weighted covariance matrix $\mathbf{Q}=\left[q_{ij}\right]$ are given by

:$\mathbf{\bar{x}}=\sum_{k=1}^{N}w_{k}\mathbf{x}_{k}$

:$q_{ij}=\frac{\sum_{k=1}^{N}w_{k}\left(x_{ik}-\bar{x}_{i}\right)\left(x_{jk}-\bar{x}_{j}\right)}{1-\sum_{k=1}^{N}w_{k}^{2}}.$

If all weights are the same, $w_{k}=1/N$, the weighted mean and covariance reduce to the sample mean and covariance above.
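The weighted formulas, and the reduction just stated, can be sketched in a few lines of NumPy (the function name and data are ours, for illustration): the denominator $1-\sum_{k}w_{k}^{2}$ equals $(N-1)/N$ when all $w_{k}=1/N$, so equal weights recover the ordinary $N-1$ sample estimates.

```python
import numpy as np

def weighted_mean_cov(X, w):
    """Weighted mean and covariance of the columns x_k of X with weights w.

    Uses the correction factor 1 - sum(w_k^2) in the denominator, which
    reduces to the usual 1/(N-1) normalization when all w_k = 1/N.
    """
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                 # normalize the weights if needed
    x_bar = X @ w                   # weighted mean: sum_k w_k x_k
    D = X - x_bar[:, None]          # deviations from the weighted mean
    Q = (D * w) @ D.T / (1.0 - np.sum(w**2))
    return x_bar, Q

# Illustrative data: columns are the observations x_1, ..., x_4.
X = np.array([[1.0, 2.0, 3.0, 6.0],
              [2.0, 1.0, 5.0, 4.0]])
N = X.shape[1]

# Equal weights w_k = 1/N reduce to the ordinary sample mean and covariance:
x_bar, Q = weighted_mean_cov(X, np.full(N, 1.0 / N))
assert np.allclose(x_bar, X.mean(axis=1))
assert np.allclose(Q, np.cov(X))    # np.cov uses the N-1 denominator
```

Here `(D * w) @ D.T` computes $\sum_{k} w_{k}(x_{ik}-\bar{x}_{i})(x_{jk}-\bar{x}_{j})$ for all $i,j$ at once by scaling each deviation column by its weight.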

**See also**

*Unbiased estimation of standard deviation

*Estimation of covariance matrices

*Scatter matrix

*Arithmetic mean

*Estimation theory

*Linear regression

*Weighted least squares

*Weighted mean

*Standard error (statistics)
