Computational formula for the variance

In probability theory and statistics, the computational formula for the variance Var(X) of a random variable X is the formula

$\operatorname{Var}(X) = \operatorname{E}(X^2) - [\operatorname{E}(X)]^2\,$

where E(X) is the expected value of X.

A closely related identity can be used to calculate the sample variance, which is often used as an unbiased estimate of the population variance:

$\hat{\sigma}^2 := \frac{1}{N-1}\sum_{i=1}^N(x_i-\bar{x})^2 = \frac{N}{N-1}\left(\frac{1}{N}\left(\sum_{i=1}^N x_i^2\right) - \bar{x}^2\right)$

The second expression is sometimes, unwisely, used in practice to calculate the variance. The problem is that it subtracts two nearly equal values, which can lead to catastrophic cancellation in floating-point arithmetic.[1]
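The cancellation problem can be illustrated with a minimal sketch. The function names `variance_computational` and `variance_two_pass` and the sample data are illustrative assumptions, not part of any library. The data are the values 4, 7, 13, 16 shifted by 10^9, so the exact sample variance is 30.

```python
def variance_computational(xs):
    """Sample variance via the computational formula (mean of squares minus squared mean)."""
    n = len(xs)
    mean = sum(xs) / n
    mean_of_squares = sum(x * x for x in xs) / n
    return n * (mean_of_squares - mean * mean) / (n - 1)


def variance_two_pass(xs):
    """Sample variance via the definition: first the mean, then the squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


small = [4.0, 7.0, 13.0, 16.0]
shifted = [x + 1e9 for x in small]  # same spread, large common offset

print(variance_computational(small), variance_two_pass(small))
print(variance_computational(shifted), variance_two_pass(shifted))
```

On the unshifted data both functions agree. On the shifted data the squares are of order 10^18, beyond the roughly 16 significant digits of a double, so the computational formula returns a value far from 30 (the exact wrong value depends on the summation order used by the interpreter), while the two-pass version still returns 30.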

Proof

The computational formula for the population variance follows in a straightforward manner from the linearity of expected values and the definition of variance:

$\begin{array}{ccl} \operatorname{Var}(X)&=&\operatorname{E}\left[(X - \operatorname{E}(X))^2\right]\\ &=&\operatorname{E}\left[X^2 - 2X\operatorname{E}(X) + [\operatorname{E}(X)]^2\right]\\ &=&\operatorname{E}(X^2) - \operatorname{E}[2X\operatorname{E}(X)] + [\operatorname{E}(X)]^2\\ &=&\operatorname{E}(X^2) - 2\operatorname{E}(X)\operatorname{E}(X) + [\operatorname{E}(X)]^2\\ &=&\operatorname{E}(X^2) - 2[\operatorname{E}(X)]^2 + [\operatorname{E}(X)]^2\\ &=&\operatorname{E}(X^2) - [\operatorname{E}(X)]^2 \end{array}$
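As a concrete check of the identity, the following sketch evaluates both sides for a fair six-sided die, an example chosen here for illustration. Exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction

# Fair six-sided die: each face 1..6 has probability 1/6 (illustrative choice).
faces = range(1, 7)
p = Fraction(1, 6)

mean = sum(p * x for x in faces)                          # E(X)
second_moment = sum(p * x * x for x in faces)             # E(X^2)
var_definition = sum(p * (x - mean) ** 2 for x in faces)  # E[(X - E(X))^2]
var_computational = second_moment - mean ** 2             # E(X^2) - [E(X)]^2

print(mean, second_moment, var_definition, var_computational)
# prints: 7/2 91/6 35/12 35/12
```

Here E(X) = 7/2 and E(X^2) = 91/6, so both routes give 91/6 - 49/4 = 35/12.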

Generalization to covariance

This formula can be generalized for covariance, with two random variables X_i and X_j:

$\operatorname{Cov}(X_i, X_j) = \operatorname{E}(X_iX_j) -\operatorname{E}(X_i)\operatorname{E}(X_j)$

as well as for the n by n covariance matrix of a random vector of length n:

$\operatorname{Var}(\mathbf{X}) = \operatorname{E}(\mathbf{X X^\top}) - \operatorname{E}(\mathbf{X})\operatorname{E}(\mathbf{X})^\top$

and for the n by m cross-covariance matrix between two random vectors of length n and m:

$\operatorname{Cov}(\mathbf{X},\mathbf{Y})= \operatorname{E}(\mathbf{X Y^\top}) - \operatorname{E}(\mathbf{X})\operatorname{E}(\mathbf{Y})^\top$

where expectations are taken element-wise and $\mathbf{X}=\{X_1,X_2,\ldots,X_n\}$ and $\mathbf{Y}=\{Y_1,Y_2,\ldots,Y_m\}$ are random vectors of respective lengths n and m.
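The cross-covariance identity can be checked entry by entry on a small example. The sketch below assumes a hypothetical joint distribution with four equally likely outcomes, n = 2 and m = 3; the helper `expect` and the data are illustrative only. Exact fractions are used so that the two matrices can be compared for equality.

```python
from fractions import Fraction

# Hypothetical joint distribution: four equally likely outcomes (x, y),
# where x has length n = 2 and y has length m = 3.
outcomes = [
    ((1, 0), (2, 1, 0)),
    ((0, 1), (1, 3, 1)),
    ((2, 2), (0, 1, 4)),
    ((1, 3), (5, 0, 2)),
]
p = Fraction(1, len(outcomes))
n, m = 2, 3


def expect(f):
    """Expected value of f(x, y) under the uniform distribution on outcomes."""
    return sum(p * f(x, y) for x, y in outcomes)


mean_x = [expect(lambda x, y, i=i: x[i]) for i in range(n)]
mean_y = [expect(lambda x, y, j=j: y[j]) for j in range(m)]

# Definition: entry (i, j) is E[(X_i - E(X_i)) (Y_j - E(Y_j))].
cov_definition = [
    [expect(lambda x, y, i=i, j=j: (x[i] - mean_x[i]) * (y[j] - mean_y[j]))
     for j in range(m)]
    for i in range(n)
]

# Computational form: entry (i, j) is E(X_i Y_j) - E(X_i) E(Y_j).
cov_computational = [
    [expect(lambda x, y, i=i, j=j: x[i] * y[j]) - mean_x[i] * mean_y[j]
     for j in range(m)]
    for i in range(n)
]

print(cov_definition == cov_computational)  # prints: True
```

Both constructions produce the same n by m matrix. Setting the second vector equal to the first reduces this to the covariance matrix formula for a single random vector.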

Applications

Applications of the computational formula in systolic geometry include Loewner's torus inequality.
