Computational formula for the variance


In probability theory and statistics, the computational formula for the variance Var(X) of a random variable X is the formula

\operatorname{Var}(X) = \operatorname{E}(X^2) - [\operatorname{E}(X)]^2\,

where E(X) is the expected value of X.
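As a small illustration, the formula can be checked on a fair six-sided die. The sketch below is not part of the original article: the die, the helper `expectation` and the variable names are illustrative assumptions, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, each face 1..6 with probability 1/6.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def expectation(f, pmf):
    """E[f(X)] for a discrete random variable with probability mass function pmf."""
    return sum(f(x) * p for x, p in pmf.items())

mean = expectation(lambda x: x, pmf)                          # E(X)
second_moment = expectation(lambda x: x * x, pmf)             # E(X^2)
var_computational = second_moment - mean ** 2                 # E(X^2) - [E(X)]^2
var_definition = expectation(lambda x: (x - mean) ** 2, pmf)  # E[(X - E(X))^2]

print(mean, second_moment, var_computational, var_definition)
```

Both routes give 35/12 here, with E(X) = 7/2 and E(X^2) = 91/6.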

A closely related identity can be used to calculate the sample variance, which is often used as an unbiased estimate of the population variance:


\hat{\sigma}^2 := \frac{1}{N-1}\sum_{i=1}^N(x_i-\bar{x})^2 = \frac{N}{N-1}\left(\frac{1}{N}\left(\sum_{i=1}^N x_i^2\right) - \bar{x}^2\right)

The second form is sometimes used in practice to calculate the variance, but this is unwise. When the two terms being subtracted are nearly equal, most of their significant digits cancel, and the result can suffer from catastrophic cancellation[1].
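The effect can be seen in floating-point arithmetic with a small experiment. The following is a minimal sketch under stated assumptions: the function names and the data values are hypothetical, chosen only so that the mean is very large compared with the spread, and the two-pass computation is shown as one common alternative rather than as the method recommended by the cited reference.

```python
def variance_computational(data):
    """Sample variance via (mean of squares - square of mean) * N / (N - 1)."""
    n = len(data)
    mean = sum(data) / n
    mean_of_squares = sum(x * x for x in data) / n
    return (mean_of_squares - mean * mean) * n / (n - 1)

def variance_two_pass(data):
    """Sample variance from the definition: first the mean, then squared deviations."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

# Assumed data: the same four values, once as given and once shifted by 1e9.
# Shifting every value by a constant does not change the true variance (30).
small = [4.0, 7.0, 13.0, 16.0]
shifted = [x + 1e9 for x in small]

for name, data in (("small", small), ("shifted", shifted)):
    print(name, variance_computational(data), variance_two_pass(data))
```

For the unshifted data both functions agree. For the shifted data the squares are of order 10^18, where adjacent double-precision numbers are more than a hundred apart, so the difference of the two large terms no longer carries the information needed to recover a variance of 30. The two-pass version subtracts the mean before squaring and is unaffected in this example.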


Proof

The computational formula for the population variance follows from the definition of variance and the linearity of expected values, using the fact that E(X) is a constant and can therefore be taken outside an expectation:


\begin{array}{ccl}
\operatorname{Var}(X)&=&\operatorname{E}\left[(X - \operatorname{E}(X))^2\right]\\
                     &=&\operatorname{E}\left[X^2 - 2X\operatorname{E}(X) + [\operatorname{E}(X)]^2\right]\\
                     &=&\operatorname{E}(X^2) - \operatorname{E}[2X\operatorname{E}(X)] + [\operatorname{E}(X)]^2\\
                     &=&\operatorname{E}(X^2) - 2\operatorname{E}(X)\operatorname{E}(X) + [\operatorname{E}(X)]^2\\
                     &=&\operatorname{E}(X^2) - 2[\operatorname{E}(X)]^2 + [\operatorname{E}(X)]^2\\
                     &=&\operatorname{E}(X^2) - [\operatorname{E}(X)]^2
\end{array}
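The sample identity stated in the introduction follows from the same kind of expansion. A sketch, assuming \bar{x} = \frac{1}{N}\sum_{i=1}^N x_i, so that \sum_{i=1}^N x_i = N\bar{x}:

```latex
\begin{array}{ccl}
\sum_{i=1}^N (x_i-\bar{x})^2 &=& \sum_{i=1}^N x_i^2 - 2\bar{x}\sum_{i=1}^N x_i + N\bar{x}^2\\
                             &=& \sum_{i=1}^N x_i^2 - 2N\bar{x}^2 + N\bar{x}^2\\
                             &=& \sum_{i=1}^N x_i^2 - N\bar{x}^2\\
                             &=& N\left(\frac{1}{N}\sum_{i=1}^N x_i^2 - \bar{x}^2\right)
\end{array}
```

Dividing both sides by N-1 gives the two forms of the sample variance.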

Generalization to covariance

This formula can be generalized for covariance, with two random variables Xi and Xj:

\operatorname{Cov}(X_i, X_j) = \operatorname{E}(X_iX_j) -\operatorname{E}(X_i)\operatorname{E}(X_j)

as well as for the n by n covariance matrix of a random vector of length n:

\operatorname{Var}(\mathbf{X}) = \operatorname{E}(\mathbf{X}\mathbf{X}^\top) - \operatorname{E}(\mathbf{X})\operatorname{E}(\mathbf{X})^\top

and for the n by m cross-covariance matrix between two random vectors of length n and m:


\operatorname{Cov}(\mathbf{X},\mathbf{Y}) = \operatorname{E}(\mathbf{X}\mathbf{Y}^\top) - \operatorname{E}(\mathbf{X})\operatorname{E}(\mathbf{Y})^\top

where expectations are taken element-wise and \mathbf{X}=(X_1,X_2,\ldots,X_n)^\top and \mathbf{Y}=(Y_1,Y_2,\ldots,Y_m)^\top are random column vectors of respective lengths n and m.
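The matrix forms can be checked numerically on a small discrete example. The sketch below is illustrative only: the function names and the four paired outcomes are assumptions, each outcome is treated as equally likely, and exact fractions are used so that the computational form and the definition can be compared without rounding.

```python
from fractions import Fraction as F

def mean_vector(samples):
    """Element-wise expectation over equally likely outcome vectors."""
    count = len(samples)
    return [sum(column) / count for column in zip(*samples)]

def expected_outer(xs, ys):
    """E(X Y^T) over paired, equally likely outcomes: an n-by-m matrix."""
    count = len(xs)
    n, m = len(xs[0]), len(ys[0])
    return [[sum(x[i] * y[j] for x, y in zip(xs, ys)) / count
             for j in range(m)] for i in range(n)]

def cross_cov_computational(xs, ys):
    """E(X Y^T) - E(X) E(Y)^T."""
    ex, ey = mean_vector(xs), mean_vector(ys)
    exy = expected_outer(xs, ys)
    return [[exy[i][j] - ex[i] * ey[j] for j in range(len(ey))]
            for i in range(len(ex))]

def cross_cov_definition(xs, ys):
    """E[(X - E(X)) (Y - E(Y))^T]."""
    ex, ey = mean_vector(xs), mean_vector(ys)
    centered_x = [[a - mu for a, mu in zip(x, ex)] for x in xs]
    centered_y = [[b - mu for b, mu in zip(y, ey)] for y in ys]
    return expected_outer(centered_x, centered_y)

# Assumed data: four equally likely joint outcomes, with n = 2 and m = 3.
xs = [[F(1), F(2)], [F(3), F(0)], [F(2), F(5)], [F(4), F(1)]]
ys = [[F(0), F(1), F(2)], [F(1), F(1), F(0)], [F(2), F(3), F(1)], [F(1), F(0), F(4)]]

cov_xy = cross_cov_computational(xs, ys)   # 2-by-3 cross-covariance matrix
var_x = cross_cov_computational(xs, xs)    # 2-by-2 covariance matrix of X
print(cov_xy == cross_cov_definition(xs, ys))
```

Each entry of `cov_xy` is a scalar covariance Cov(X_i, Y_j), so the scalar formula is the special case n = m = 1. Calling the same function with both arguments equal to `xs` gives the covariance matrix Var(X).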

Applications

Applications of the computational formula in systolic geometry include Loewner's torus inequality.

References

  1. Donald E. Knuth (1998). The Art of Computer Programming, volume 2: Seminumerical Algorithms, 3rd edn., p. 232. Boston: Addison-Wesley.

Wikimedia Foundation. 2010.


