- Sample mean and sample covariance
The sample mean and sample covariance are statistics computed from a collection of data, regarded as realizations of a random variable. Given a random sample $\mathbf{x}_{1},\ldots,\mathbf{x}_{N}$ from an $n$-dimensional random variable $\mathbf{X}$ (i.e., realizations of $N$ independent random variables with the same distribution as $\mathbf{X}$), the sample mean is

:$\mathbf{\bar{x}}=\frac{1}{N}\sum_{k=1}^{N}\mathbf{x}_{k}.$

In coordinates, writing the vectors as columns,

:$\mathbf{x}_{k}=\left[\begin{array}{c}x_{1k}\\ \vdots\\ x_{nk}\end{array}\right],\quad\mathbf{\bar{x}}=\left[\begin{array}{c}\bar{x}_{1}\\ \vdots\\ \bar{x}_{n}\end{array}\right],$

the entries of the sample mean are

:$\bar{x}_{i}=\frac{1}{N}\sum_{k=1}^{N}x_{ik},\quad i=1,\ldots,n.$

The sample covariance of $\mathbf{x}_{1},\ldots,\mathbf{x}_{N}$ is the $n$ by $n$ matrix $\mathbf{Q}=\left[q_{ij}\right]$ with the entries given by

:$q_{ij}=\frac{1}{N-1}\sum_{k=1}^{N}\left(x_{ik}-\bar{x}_{i}\right)\left(x_{jk}-\bar{x}_{j}\right).$
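The two formulas above can be sketched directly in code. The following is a minimal illustration in plain Python, not a reference implementation; the function names `sample_mean` and `sample_covariance` and the example data are assumptions made for this sketch, and each observation is assumed to be a list of $n$ floats.

```python
def sample_mean(samples):
    """Return the sample mean of N vectors, each given as a list of n floats."""
    N = len(samples)
    n = len(samples[0])
    # Entry i is the average of the i-th coordinate over all N samples.
    return [sum(x[i] for x in samples) / N for i in range(n)]


def sample_covariance(samples):
    """Return the n-by-n sample covariance matrix, using the 1/(N-1) factor."""
    N = len(samples)
    if N < 2:
        raise ValueError("at least two samples are needed for the N-1 denominator")
    n = len(samples[0])
    mean = sample_mean(samples)
    # q[i][j] sums the products of deviations from the sample mean.
    return [
        [
            sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in samples) / (N - 1)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical data: N = 3 observations of an n = 2 dimensional variable.
data = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
mean = sample_mean(data)       # [3.0, 6.0]
cov = sample_covariance(data)  # [[4.0, 8.0], [8.0, 16.0]]
```

In this example the second coordinate is exactly twice the first, so the covariance matrix is singular: its second row is twice its first.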

The sample mean and the sample covariance matrix are unbiased estimates of the mean and the covariance matrix of the random variable $\mathbf{X}$. The sample covariance matrix has $N-1$ in the denominator rather than $N$ essentially because the population mean $E(\mathbf{X})$ is not known and is replaced by the sample mean $\mathbf{\bar{x}}$. If the population mean $E(\mathbf{X})$ is known, the analogous unbiased estimate

:$q_{ij}=\frac{1}{N}\sum_{k=1}^{N}\left(x_{ik}-E(X_{i})\right)\left(x_{jk}-E(X_{j})\right)$

with the population mean does have $N$ in the denominator. This is an example of why in probability and statistics it is essential to distinguish between upper case letters (random variables) and lower case letters (realizations of the random variables).

The maximum likelihood estimate of the covariance

:$q_{ij}=\frac{1}{N}\sum_{k=1}^{N}\left(x_{ik}-\bar{x}_{i}\right)\left(x_{jk}-\bar{x}_{j}\right)$

for the Gaussian distribution case has $N$ in the denominator as well. The difference between the two estimates diminishes for large $N$.

**Weighted samples**

In a weighted sample, each vector $\mathbf{x}_{k}$ is assigned a weight $w_{k}\geq 0$. Without loss of generality, assume that the weights are normalized:

:$\sum_{k=1}^{N}w_{k}=1.$

(If they are not, divide the weights by their sum.) Then the weighted mean $\mathbf{\bar{x}}$ and the weighted covariance matrix $\mathbf{Q}=\left[q_{ij}\right]$ are given by

:$\mathbf{\bar{x}}=\sum_{k=1}^{N}w_{k}\mathbf{x}_{k}$

:$q_{ij}=\frac{\sum_{k=1}^{N}w_{k}\left(x_{ik}-\bar{x}_{i}\right)\left(x_{jk}-\bar{x}_{j}\right)}{1-\sum_{k=1}^{N}w_{k}^{2}}.$

If all weights are the same, $w_{k}=1/N$, the weighted mean and covariance reduce to the sample mean and covariance above.
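The weighted formulas can be sketched in the same way. The snippet below is an illustration in plain Python under stated assumptions: the function name `weighted_mean_and_covariance` and the example data are hypothetical, the weights are normalized inside the function by dividing by their sum, and an error is raised when the denominator $1-\sum_{k}w_{k}^{2}$ vanishes (a single nonzero weight).

```python
def weighted_mean_and_covariance(samples, weights):
    """Return (mean, covariance) of weighted samples.

    samples: list of N vectors, each a list of n floats.
    weights: list of N nonnegative weights, normalized here to sum to 1.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    w = [wk / total for wk in weights]
    n = len(samples[0])

    # Weighted mean: sum over k of w_k * x_k, coordinate by coordinate.
    mean = [sum(wk * x[i] for wk, x in zip(w, samples)) for i in range(n)]

    # Denominator 1 - sum of squared normalized weights.
    denom = 1.0 - sum(wk * wk for wk in w)
    if denom <= 0:
        raise ValueError("covariance is undefined when one sample carries all the weight")

    cov = [
        [
            sum(wk * (x[i] - mean[i]) * (x[j] - mean[j]) for wk, x in zip(w, samples)) / denom
            for j in range(n)
        ]
        for i in range(n)
    ]
    return mean, cov


# Hypothetical data: N = 3 observations of an n = 2 dimensional variable.
data = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]

# Equal weights: the result should agree (up to rounding) with the
# unweighted sample mean and the 1/(N-1) sample covariance.
equal_mean, equal_cov = weighted_mean_and_covariance(data, [1.0, 1.0, 1.0])

# Unequal weights: the first observation counts twice as much as the others.
skew_mean, skew_cov = weighted_mean_and_covariance(data, [2.0, 1.0, 1.0])
```

With equal weights the denominator is $1-1/N=(N-1)/N$, which is what turns the factor $1/N$ in the numerator into $1/(N-1)$.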

**See also**

Unbiased estimation of standard deviation

Estimation of covariance matrices

Scatter matrix

Arithmetic mean

Estimation theory

Linear regression

Weighted least squares

Weighted mean

Standard error (statistics)

