Sample mean and sample covariance

The sample mean and the sample covariance are statistics computed from a collection of data that are regarded as random.

Sample mean and covariance

Given a random sample $\mathbf{x}_{1},\ldots,\mathbf{x}_{N}$ from an $n$-dimensional random variable $\mathbf{X}$ (i.e., realizations of $N$ independent random variables with the same distribution as $\mathbf{X}$), the sample mean is

: $\mathbf{\bar{x}}=\frac{1}{N}\sum_{k=1}^{N}\mathbf{x}_{k}.$

In coordinates, writing the vectors as columns,

: $\mathbf{x}_{k}=\left[ \begin{array}[c]{c}x_{1k}\\ \vdots\\ x_{nk}\end{array}\right] ,\quad\mathbf{\bar{x}}=\left[ \begin{array}[c]{c}\bar{x}_{1}\\ \vdots\\ \bar{x}_{n}\end{array}\right] ,$

the entries of the sample mean are

: $\bar{x}_{i}=\frac{1}{N}\sum_{k=1}^{N}x_{ik},\quad i=1,\ldots,n.$

The sample covariance of $\mathbf{x}_{1},\ldots,\mathbf{x}_{N}$ is the $n$ by $n$ matrix $\mathbf{Q}=\left[ q_{ij}\right]$ with the entries given by

: $q_{ij}=\frac{1}{N-1}\sum_{k=1}^{N}\left( x_{ik}-\bar{x}_{i}\right) \left( x_{jk}-\bar{x}_{j}\right) .$
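The two definitions above can be sketched in a few lines of plain Python. This is only a minimal illustration, not a reference implementation: the function names `sample_mean` and `sample_covariance` and the data values are made up for the example, and each observation is assumed to be a list of $n$ numbers.

```python
def sample_mean(samples):
    """Return the entrywise mean of a list of equal-length vectors."""
    count = len(samples)           # N, the number of observations
    dim = len(samples[0])          # n, the dimension of each observation
    return [sum(x[i] for x in samples) / count for i in range(dim)]


def sample_covariance(samples):
    """Return the n-by-n sample covariance matrix (denominator N - 1)."""
    count = len(samples)
    if count < 2:
        raise ValueError("at least two observations are needed")
    dim = len(samples[0])
    mean = sample_mean(samples)
    return [
        [
            sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in samples)
            / (count - 1)
            for j in range(dim)
        ]
        for i in range(dim)
    ]


# Three made-up 2-dimensional observations (N = 3, n = 2).
data = [[1.0, 2.0], [3.0, 6.0], [5.0, 7.0]]
print(sample_mean(data))        # [3.0, 5.0]
print(sample_covariance(data))  # [[4.0, 5.0], [5.0, 7.0]]
```

The nested list comprehension mirrors the double index $i,j$ of $q_{ij}$ directly; for large data sets a numerical library would normally be preferred.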

The sample mean and the sample covariance matrix are unbiased estimates of the mean and the covariance matrix of the random variable $\mathbf{X}$. The sample covariance matrix has $N-1$ in the denominator rather than $N$ essentially because the population mean $E(\mathbf{X})$ is not known and is replaced by the sample mean $\mathbf{\bar{x}}$. If the population mean $E(\mathbf{X})$ is known, the analogous unbiased estimate

: $q_{ij}=\frac{1}{N}\sum_{k=1}^{N}\left( x_{ik}-E(X_{i})\right) \left( x_{jk}-E(X_{j})\right)$

using the population mean does indeed have $N$ in the denominator. This is an example of why in probability and statistics it is essential to distinguish between upper case letters (random variables) and lower case letters (realizations of the random variables).
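The origin of the $N-1$ can be sketched as follows. The notation here is an assumption made for the sketch: $\mathbf{X}_{1},\ldots,\mathbf{X}_{N}$ are the independent random variables whose realizations are $\mathbf{x}_{1},\ldots,\mathbf{x}_{N}$, $\mathbf{\bar{X}}$ is their mean, $\boldsymbol{\mu}=E(\mathbf{X})$, and $\boldsymbol{\Sigma}$ is the covariance matrix of $\mathbf{X}$, assumed to be finite. Writing $\mathbf{X}_{k}-\mathbf{\bar{X}}=(\mathbf{X}_{k}-\boldsymbol{\mu})-(\mathbf{\bar{X}}-\boldsymbol{\mu})$ and expanding gives

: $\sum_{k=1}^{N}(\mathbf{X}_{k}-\mathbf{\bar{X}})(\mathbf{X}_{k}-\mathbf{\bar{X}})^{T}=\sum_{k=1}^{N}(\mathbf{X}_{k}-\boldsymbol{\mu})(\mathbf{X}_{k}-\boldsymbol{\mu})^{T}-N(\mathbf{\bar{X}}-\boldsymbol{\mu})(\mathbf{\bar{X}}-\boldsymbol{\mu})^{T}.$

Each term of the first sum on the right has expectation $\boldsymbol{\Sigma}$, and by independence the mean $\mathbf{\bar{X}}$ has covariance matrix $\boldsymbol{\Sigma}/N$, so

: $E\left[ \sum_{k=1}^{N}(\mathbf{X}_{k}-\mathbf{\bar{X}})(\mathbf{X}_{k}-\mathbf{\bar{X}})^{T}\right] =N\boldsymbol{\Sigma}-N\cdot\frac{1}{N}\boldsymbol{\Sigma}=(N-1)\boldsymbol{\Sigma}.$

Dividing the sum by $N-1$ therefore gives an estimate whose expectation is $\boldsymbol{\Sigma}$, while dividing by $N$ would underestimate it by the factor $(N-1)/N$.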

The maximum likelihood estimate of the covariance

: $q_{ij}=\frac{1}{N}\sum_{k=1}^{N}\left( x_{ik}-\bar{x}_{i}\right) \left( x_{jk}-\bar{x}_{j}\right)$

for the Gaussian distribution case has $N$ in the denominator as well. The difference between the two estimates diminishes for large $N$, since they differ only by the factor $(N-1)/N$.
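To make the relation between the two denominators concrete, the following sketch computes both estimates from the same sums of products of deviations. The function name `covariance_estimates` and the data are illustrative assumptions, not taken from any library.

```python
def covariance_estimates(samples):
    """Return (unbiased, max_likelihood) covariance matrices of the samples."""
    count = len(samples)           # N
    dim = len(samples[0])          # n
    mean = [sum(x[i] for x in samples) / count for i in range(dim)]
    # Sums of products of deviations from the sample mean, shared by both
    # estimates; only the denominator differs.
    scatter = [
        [sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in samples)
         for j in range(dim)]
        for i in range(dim)
    ]
    unbiased = [[s / (count - 1) for s in row] for row in scatter]
    max_likelihood = [[s / count for s in row] for row in scatter]
    return unbiased, max_likelihood


# Four made-up 2-dimensional observations (N = 4).
data = [[1.0, 2.0], [3.0, 6.0], [5.0, 7.0], [7.0, 9.0]]
unbiased, max_likelihood = covariance_estimates(data)
print(max_likelihood)  # [[5.0, 5.5], [5.5, 6.5]]
print(unbiased)        # each entry is larger by the factor N / (N - 1) = 4 / 3
```

For $N=4$ the two estimates differ by a quarter of the unbiased value; for $N=1000$ the factor $(N-1)/N$ is already $0.999$.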

Weighted samples

In a weighted sample, each vector $\mathbf{x}_{k}$ is assigned a weight $w_{k}\geq0$. Without loss of generality, assume that the weights are normalized:

: $\sum_{k=1}^{N}w_{k}=1.$

(If they are not, divide the weights by their sum.) Then the weighted mean $\mathbf{\bar{x}}$ and the weighted covariance matrix $\mathbf{Q}=\left[ q_{ij}\right]$ are given by

: $\mathbf{\bar{x}}=\sum_{k=1}^{N}w_{k}\mathbf{x}_{k}$

and (Mark Galassi, Jim Davies, James Theiler, Brian Gough, Gerard Jungman, Michael Booth, and Fabrice Rossi. [http://www.gnu.org/software/gsl/manual GNU Scientific Library - Reference manual, Version 1.9], 2007. [http://www.gnu.org/software/gsl/manual/html_node/Weighted-Samples.html Sec. 20.6 Weighted Samples])

: $q_{ij}=\frac{\sum_{k=1}^{N}w_{k}\left( x_{ik}-\bar{x}_{i}\right) \left( x_{jk}-\bar{x}_{j}\right) }{1-\sum_{k=1}^{N}w_{k}^{2}}.$

If all weights are the same, $w_{k}=1/N$, the weighted mean and covariance reduce to the sample mean and covariance above.
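The weighted formulas, including the normalization step, might be sketched as below. The function name `weighted_mean_and_covariance` and the data are hypothetical, and the error handling is only one possible choice.

```python
def weighted_mean_and_covariance(samples, weights):
    """Return (mean, covariance) of samples under nonnegative weights."""
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative with a positive sum")
    # Normalize so that the weights sum to one.
    w = [wk / total for wk in weights]
    dim = len(samples[0])
    mean = [sum(wk * x[i] for wk, x in zip(w, samples)) for i in range(dim)]
    # Denominator 1 - sum of squared weights; it vanishes when a single
    # observation carries all of the weight.
    denom = 1.0 - sum(wk * wk for wk in w)
    if denom <= 0:
        raise ValueError("at least two observations need nonzero weight")
    cov = [
        [
            sum(wk * (x[i] - mean[i]) * (x[j] - mean[j])
                for wk, x in zip(w, samples)) / denom
            for j in range(dim)
        ]
        for i in range(dim)
    ]
    return mean, cov


# With equal weights the result should match the unweighted sample mean
# [3, 5] and sample covariance [[4, 5], [5, 7]] up to rounding.
data = [[1.0, 2.0], [3.0, 6.0], [5.0, 7.0]]
mean, cov = weighted_mean_and_covariance(data, [1.0, 1.0, 1.0])
print(mean)
print(cov)
```

With $w_{k}=1/N$ the denominator becomes $1-1/N=(N-1)/N$, which is how the $N-1$ of the unweighted formula reappears.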

References

See also

*Unbiased estimation of standard deviation
*Estimation of covariance matrices
*Scatter matrix
*Arithmetic mean
*Estimation theory
*Linear regression
*Weighted least squares
*Weighted mean
*Standard error (statistics)


Wikimedia Foundation. 2010.
