Covariance matrix

A bivariate Gaussian probability density function centered at (0,0), with covariance matrix [ 1.00, .50 ; .50, 1.00 ].

Sample points from a multivariate Gaussian distribution with a standard deviation of 3 in roughly the lower left-upper right direction and of 1 in the orthogonal direction. Because the x and y components co-vary, the variances of x and y do not fully describe the distribution. A 2×2 covariance matrix is needed; the directions of the arrows correspond to the eigenvectors of this covariance matrix and their lengths to the square roots of the eigenvalues.

In probability theory and statistics, a covariance matrix (also known as a dispersion matrix) is a matrix whose element in the (i, j) position is the covariance between the $i$th and $j$th elements of a random vector (that is, of a vector of random variables). Each element of the vector is a scalar random variable, either with a finite number of observed empirical values or with a finite or infinite number of potential values specified by a theoretical joint probability distribution of all the random variables.

Intuitively, the covariance matrix generalizes the notion of variance to multiple dimensions. As an example, the variation in a collection of random points in two-dimensional space cannot be characterized fully by a single number, nor would the variances in the x and y directions contain all of the necessary information; a 2×2 matrix would be necessary to fully characterize the two-dimensional variation.

Just as a Hessian matrix is needed to fully describe the concavity of a multivariate function, a covariance matrix is needed to fully describe the variation in a distribution.

Definition

Throughout this article, boldfaced unsubscripted X and Y are used to refer to random vectors, and unboldfaced subscripted X_i and Y_i are used to refer to random scalars. If the entries in the column vector

$\mathbf{X} = \begin{bmatrix}X_1 \\ \vdots \\ X_n \end{bmatrix}$

are random variables, each with finite variance, then the covariance matrix Σ is the matrix whose (i, j) entry is the covariance

$\Sigma_{ij} = \mathrm{cov}(X_i, X_j) = \mathrm{E}\begin{bmatrix} (X_i - \mu_i)(X_j - \mu_j) \end{bmatrix}$

where

$\mu_i = \mathrm{E}(X_i)\,$

is the expected value of the $i$th entry in the vector X. In other words, we have

$\Sigma = \begin{bmatrix} \mathrm{E}[(X_1 - \mu_1)(X_1 - \mu_1)] & \mathrm{E}[(X_1 - \mu_1)(X_2 - \mu_2)] & \cdots & \mathrm{E}[(X_1 - \mu_1)(X_n - \mu_n)] \\ \\ \mathrm{E}[(X_2 - \mu_2)(X_1 - \mu_1)] & \mathrm{E}[(X_2 - \mu_2)(X_2 - \mu_2)] & \cdots & \mathrm{E}[(X_2 - \mu_2)(X_n - \mu_n)] \\ \\ \vdots & \vdots & \ddots & \vdots \\ \\ \mathrm{E}[(X_n - \mu_n)(X_1 - \mu_1)] & \mathrm{E}[(X_n - \mu_n)(X_2 - \mu_2)] & \cdots & \mathrm{E}[(X_n - \mu_n)(X_n - \mu_n)] \end{bmatrix}.$
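
As a minimal illustration, the entrywise definition can be evaluated on a finite sample by replacing each expectation with an average over the observations (the divide-by-n, or population, form). The NumPy sketch below assumes samples are stored one per row; the function name `covariance_elementwise` and the simulated data are hypothetical.

```python
import numpy as np

def covariance_elementwise(samples):
    """Divide-by-n covariance matrix of the rows of `samples` (shape n x p),
    built entry by entry from Sigma_ij = E[(X_i - mu_i)(X_j - mu_j)],
    with each expectation replaced by an average over the n samples."""
    samples = np.asarray(samples, dtype=float)
    n, p = samples.shape
    mu = samples.mean(axis=0)            # mu_i = E(X_i), estimated per column
    sigma = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            sigma[i, j] = np.mean((samples[:, i] - mu[i]) * (samples[:, j] - mu[j]))
    return sigma

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 3))        # hypothetical data, one sample per row
sigma = covariance_elementwise(data)

# np.cov treats rows as variables by default; rowvar=False makes columns the
# variables and bias=True divides by n, matching the loop above.
assert np.allclose(sigma, np.cov(data, rowvar=False, bias=True))
```

The double loop mirrors the matrix above entry by entry; in practice `np.cov` computes the same matrix in a single call.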

The inverse of this matrix, $\Sigma^{-1}$, is the inverse covariance matrix, also known as the concentration matrix or precision matrix.^[1] The elements of the precision matrix have an interpretation in terms of partial correlations and partial variances.
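
The partial-correlation interpretation can be sketched numerically. Assuming Σ is invertible and writing $P = \Sigma^{-1}$, the partial correlation of $X_i$ and $X_j$ given all other entries is $-P_{ij}/\sqrt{P_{ii}P_{jj}}$, and $1/P_{ii}$ is the partial variance of $X_i$ (the residual variance after linear regression on the remaining entries). The helper name `partial_correlations` and the example matrix are illustrative assumptions.

```python
import numpy as np

def partial_correlations(sigma):
    """Partial correlation of each pair (X_i, X_j) given all other entries,
    read off the precision matrix P = inv(Sigma) as -P_ij / sqrt(P_ii * P_jj)."""
    precision = np.linalg.inv(sigma)
    d = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)         # by convention, each variable with itself
    return pcorr

# Hypothetical covariance matrix with entries 0.5 ** |i - j|.
sigma = np.array([[1.0, 0.5, 0.25],
                  [0.5, 1.0, 0.5],
                  [0.25, 0.5, 1.0]])
precision = np.linalg.inv(sigma)

# Partial variance of X_1 given X_2 and X_3.
partial_var_x1 = 1.0 / precision[0, 0]
pcorr = partial_correlations(sigma)
```

For this example the (1, 3) entry of the precision matrix is zero, so $X_1$ and $X_3$ are partially uncorrelated given $X_2$ even though their covariance is 0.25.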

Generalization of the variance

The definition above is equivalent to the matrix equality

$\Sigma=\mathrm{E} \left[ \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right) \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right)^\top \right]$

This form can be seen as a generalization of the scalar-valued variance to higher dimensions. Recall that for a scalar-valued random variable X

$\sigma^2 = \mathrm{var}(X) = \mathrm{E}[(X-\mu)^2], \,$

where

$\mu = \mathrm{E}(X).\,$
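
The outer-product form translates directly into code: average the $p \times p$ outer products of the centred observations. The sketch below (hypothetical helper `covariance_outer`, NumPy assumed) also checks that for a scalar variable the resulting $1 \times 1$ matrix is just the ordinary variance.

```python
import numpy as np

def covariance_outer(samples):
    """Sample analogue of Sigma = E[(X - E[X])(X - E[X])^T]:
    the average of the p x p outer products of the centred sample vectors."""
    samples = np.asarray(samples, dtype=float)
    if samples.ndim == 1:                # treat a 1-D array as n scalar samples
        samples = samples[:, np.newaxis]
    centred = samples - samples.mean(axis=0)
    outer = np.einsum("ni,nj->nij", centred, centred)   # one outer product per sample
    return outer.mean(axis=0)

x = np.array([1.0, 2.0, 4.0, 7.0])
# For p = 1 the 1 x 1 matrix holds the ordinary variance E[(X - mu)^2].
assert np.isclose(covariance_outer(x)[0, 0], np.var(x))
```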

Conflicting nomenclatures and notations

Nomenclatures differ. Some statisticians, following the probabilist William Feller, call this matrix the variance of the random vector $\mathbf{X}$, because it is the natural generalization to higher dimensions of the 1-dimensional variance. Others call it the covariance matrix, because it is the matrix of covariances between the scalar components of the vector $\mathbf{X}$. Thus

$\operatorname{var}(\textbf{X}) = \operatorname{cov}(\textbf{X}) = \mathrm{E} \left[ (\textbf{X} - \mathrm{E} [\textbf{X}]) (\textbf{X} - \mathrm{E} [\textbf{X}])^\top \right].$

However, the notation for the cross-covariance between two vectors is standard:

$\operatorname{cov}(\textbf{X},\textbf{Y}) = \mathrm{E} \left[ (\textbf{X} - \mathrm{E}[\textbf{X}]) (\textbf{Y} - \mathrm{E}[\textbf{Y}])^\top \right].$

The var notation is found in William Feller's two-volume book An Introduction to Probability Theory and Its Applications, but both forms are quite standard and there is no ambiguity between them.

The matrix $\Sigma$ is also often called the variance-covariance matrix, since its diagonal entries are variances.

Properties

For $\Sigma=\mathrm{E} \left[ \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right) \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right)^\top \right]$ and $\mu = \mathrm{E}(\textbf{X})$ , where X is a random p-dimensional variable and Y a random q-dimensional variable, the following basic properties apply:

$\Sigma = \mathrm{E}(\mathbf{X X^\top}) - \mathbf{\mu}\mathbf{\mu^\top}$
$\Sigma \,$ is positive-semidefinite and symmetric.
$\operatorname{cov}(\mathbf{A X} + \mathbf{a}) = \mathbf{A}\, \operatorname{cov}(\mathbf{X})\, \mathbf{A^\top}$
$\operatorname{cov}(\mathbf{X},\mathbf{Y}) = \operatorname{cov}(\mathbf{Y},\mathbf{X})^\top$
$\operatorname{cov}(\mathbf{X}_1 + \mathbf{X}_2,\mathbf{Y}) = \operatorname{cov}(\mathbf{X}_1,\mathbf{Y}) + \operatorname{cov}(\mathbf{X}_2, \mathbf{Y})$
If p = q, then $\operatorname{var}(\mathbf{X} + \mathbf{Y}) = \operatorname{var}(\mathbf{X}) + \operatorname{cov}(\mathbf{X},\mathbf{Y}) + \operatorname{cov}(\mathbf{Y}, \mathbf{X}) + \operatorname{var}(\mathbf{Y})$
$\operatorname{cov}(\mathbf{AX}, \mathbf{B}^\top\mathbf{Y}) = \mathbf{A}\, \operatorname{cov}(\mathbf{X}, \mathbf{Y}) \,\mathbf{B}$
If $\mathbf{X}$ and $\mathbf{Y}$ are independent, then $\operatorname{cov}(\mathbf{X}, \mathbf{Y}) = 0$

where $\mathbf{X}, \mathbf{X}_1$ and $\mathbf{X}_2$ are random p×1 vectors, $\mathbf{Y}$ is a random q×1 vector, $\mathbf{a}$ is a q×1 vector, and $\mathbf{A}$ and $\mathbf{B}$ are q×p matrices.
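
Several of these properties can be checked numerically. The identities $\Sigma = \mathrm{E}(\mathbf{X}\mathbf{X}^\top) - \mu\mu^\top$ and $\operatorname{cov}(\mathbf{A X} + \mathbf{a}) = \mathbf{A}\operatorname{cov}(\mathbf{X})\mathbf{A}^\top$ also hold exactly for the divide-by-n sample covariance, so the sketch below compares them up to floating-point rounding only. The data, the matrices `L` and `A`, and the vector `a` are arbitrary hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 1000, 3, 2

# Hypothetical correlated p-dimensional data with a non-zero mean (one sample per row).
L = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, 0.3, 1.0]])
X = rng.normal(size=(n, p)) @ L.T + np.array([1.0, -2.0, 0.5])

def cov(samples):
    """Divide-by-n covariance matrix of the rows of `samples`."""
    return np.cov(samples, rowvar=False, bias=True)

sigma = cov(X)
mu = X.mean(axis=0)

# Sigma = E[X X^T] - mu mu^T
second_moment = X.T @ X / n
assert np.allclose(sigma, second_moment - np.outer(mu, mu))

# Sigma is symmetric and positive-semidefinite (eigenvalues >= 0 up to rounding).
assert np.allclose(sigma, sigma.T)
assert np.linalg.eigvalsh(sigma).min() >= -1e-12

# cov(A X + a) = A cov(X) A^T, with A a q x p matrix and a a q x 1 vector.
# Samples are rows, so each transformed sample is x @ A.T + a.
A = rng.normal(size=(q, p))
a = rng.normal(size=q)
assert np.allclose(cov(X @ A.T + a), A @ sigma @ A.T)
```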

The covariance matrix is a useful tool in many different areas. From it a transformation matrix can be derived that completely decorrelates the data or, from a different point of view, finds an optimal basis for representing the data compactly (see Rayleigh quotient for a formal proof and additional properties of covariance matrices). This is called principal component analysis (PCA) or the Karhunen-Loève transform (KL transform).
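
A minimal PCA sketch, assuming NumPy and simulated two-dimensional data resembling the figure at the top of the article (standard deviations 3 and 1, rotated by 45°): the eigenvectors of the sample covariance matrix define the transformation, and the covariance of the transformed coordinates is diagonal, with the eigenvalues on its diagonal. The data and rotation angle are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical 2-D data: standard deviations 3 and 1, then rotated by 45 degrees.
X = rng.normal(size=(5000, 2)) * np.array([3.0, 1.0])
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = X @ R.T

sigma = np.cov(X, rowvar=False)
# eigh returns eigenvalues in ascending order for a symmetric matrix;
# reorder so the first principal axis carries the largest variance.
eigvals, eigvecs = np.linalg.eigh(sigma)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project the centred data onto the eigenvectors (the principal axes).
scores = (X - X.mean(axis=0)) @ eigvecs
sigma_scores = np.cov(scores, rowvar=False)

# The transformed coordinates are uncorrelated: off-diagonal entries vanish
# and the diagonal holds the eigenvalues of sigma.
assert np.allclose(sigma_scores, np.diag(eigvals))
```

The square roots of `eigvals` estimate the standard deviations along the principal axes, which correspond to the arrow lengths described in the figure caption.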

As a linear operator

Applied to one vector, the covariance matrix maps a linear combination, c, of the random variables, X, onto a vector of covariances with those variables: $\mathbf c^\top\Sigma = \operatorname{cov}(\mathbf c^\top\mathbf X,\mathbf X)$ . Treated as a 2-form, it yields the covariance between the two linear combinations: $\mathbf d^\top\Sigma\mathbf c=\operatorname{cov}(\mathbf d^\top\mathbf X,\mathbf c^\top\mathbf X)$ . The variance of a linear combination is then $\mathbf c^\top\Sigma\mathbf c$ , its covariance with itself.
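
These identities can be verified on a sample, with the divide-by-n covariance playing the role of Σ; like the properties above, they hold exactly for sample moments, not only in expectation. The coefficient vectors `c` and `d`, the data, and the helper `cross_cov` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical correlated 3-D data, one sample per row.
X = rng.normal(size=(1000, 3)) @ np.array([[2.0, 0.0, 0.0],
                                           [1.0, 1.0, 0.0],
                                           [0.5, 0.5, 1.0]])
sigma = np.cov(X, rowvar=False, bias=True)

c = np.array([1.0, -2.0, 0.5])   # hypothetical coefficient vectors
d = np.array([0.0, 1.0, 1.0])
u = X @ c                        # samples of the scalar c^T X
v = X @ d                        # samples of the scalar d^T X

def cross_cov(a, b):
    """Divide-by-n covariance of two 1-D sample arrays."""
    return np.mean((a - a.mean()) * (b - b.mean()))

# c^T Sigma = cov(c^T X, X): the covariances of c^T X with each X_k
assert np.allclose(c @ sigma, [cross_cov(u, X[:, k]) for k in range(3)])
# d^T Sigma c = cov(d^T X, c^T X)
assert np.isclose(d @ sigma @ c, cross_cov(v, u))
# c^T Sigma c = var(c^T X)
assert np.isclose(c @ sigma @ c, np.var(u))
```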

Similarly, the (pseudo-)inverse covariance matrix $\Sigma^+$ provides an inner product, $\langle c-\mu|\Sigma^+|c-\mu\rangle$, which induces the Mahalanobis distance, a measure of the "unlikelihood" of c.
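
A short sketch of the Mahalanobis distance $\sqrt{(\mathbf{x}-\mu)^\top\Sigma^+(\mathbf{x}-\mu)}$, assuming NumPy; `np.linalg.pinv` computes the Moore-Penrose pseudo-inverse, so a singular covariance matrix is also accepted. The function name and example values are illustrative.

```python
import numpy as np

def mahalanobis(x, mu, sigma):
    """sqrt((x - mu)^T Sigma^+ (x - mu)), using the Moore-Penrose pseudo-inverse
    so that a singular (rank-deficient) covariance matrix is still accepted."""
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return float(np.sqrt(diff @ np.linalg.pinv(sigma) @ diff))

sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])
mu = np.zeros(2)
d1 = mahalanobis([2.0, 0.0], mu, sigma)
d2 = mahalanobis([0.0, 1.0], mu, sigma)
```

Both test points lie at distance 1: a displacement of 2 along the axis with variance 4 is as unremarkable as a displacement of 1 along the axis with variance 1.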

Which matrices are covariance matrices?

From the identity just above (let $\mathbf{b}$ be a $(p \times 1)$ real-valued vector)

$\operatorname{var}(\mathbf{b}^\top\mathbf{X}) = \mathbf{b}^\top \operatorname{var}(\mathbf{X}) \mathbf{b},\,$

the fact that the variance of any real-valued random variable is nonnegative, and the symmetry of the covariance matrix's definition, it follows that only a symmetric positive-semidefinite matrix can be a covariance matrix. The answer to the converse question, whether every symmetric positive-semidefinite matrix is a covariance matrix, is "yes." To see this, suppose M is a p×p symmetric positive-semidefinite matrix. From the finite-dimensional case of the spectral theorem, it follows that M has a symmetric positive-semidefinite square root, which we denote $M^{1/2}$. Let $\mathbf{X}$ be any p×1 column vector-valued random variable whose covariance matrix is the p×p identity matrix. Then

$\operatorname{var}(M^{1/2}\mathbf{X}) = M^{1/2} (\operatorname{var}(\mathbf{X})) M^{1/2} = M.\,$
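
The construction in this argument can be sketched directly: compute $M^{1/2}$ from the eigendecomposition given by the spectral theorem, then transform samples with identity covariance. Standard normal samples are used here only as one convenient choice of $\mathbf{X}$; any distribution with identity covariance would do. The matrix `M` and the helper `symmetric_sqrt` are hypothetical.

```python
import numpy as np

def symmetric_sqrt(M):
    """Symmetric positive-semidefinite square root of a symmetric PSD matrix M.

    Spectral theorem: M = V diag(w) V^T with w >= 0, so M^{1/2} = V diag(sqrt(w)) V^T.
    Tiny negative eigenvalues caused by rounding are clipped to zero.
    """
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

# A hypothetical symmetric positive-semidefinite target matrix.
M = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
root = symmetric_sqrt(M)

rng = np.random.default_rng(4)
X = rng.standard_normal(size=(200_000, 3))   # rows have identity covariance
Y = X @ root                                 # each row is (M^{1/2} x)^T, since root is symmetric
sample_cov = np.cov(Y, rowvar=False)         # approximately M, up to sampling error
```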

How to find a valid covariance matrix

In some applications (e.g. building data models from only partially observed data) one wants to find the “nearest” covariance matrix to a given symmetric matrix (e.g. of observed covariances). In 2002, Higham^[2] formalized the notion of nearness using a weighted Frobenius norm and provided a method for computing the nearest correlation matrix, that is, the nearest symmetric positive-semidefinite matrix with unit diagonal.
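
Higham's weighted method is not reproduced here. As a simpler, related sketch: for a symmetric matrix, the nearest positive-semidefinite matrix in the unweighted Frobenius norm is obtained by setting its negative eigenvalues to zero. The helper `nearest_psd` and the example matrix are hypothetical.

```python
import numpy as np

def nearest_psd(A):
    """Nearest symmetric positive-semidefinite matrix to A in the unweighted
    Frobenius norm: take the symmetric part, then zero out negative eigenvalues.
    (Not Higham's weighted nearest-correlation-matrix algorithm.)"""
    A = np.asarray(A, dtype=float)
    S = (A + A.T) / 2.0
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.clip(w, 0.0, None)) @ V.T

# A hypothetical matrix assembled from pairwise estimates; it is not PSD.
A = np.array([[1.0, 0.9, -0.9],
              [0.9, 1.0, 0.9],
              [-0.9, 0.9, 1.0]])
B = nearest_psd(A)
```

This projection does not keep the diagonal fixed, so the result is a valid covariance matrix but generally not a correlation matrix; handling that constraint is what distinguishes the nearest-correlation-matrix problem.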

Complex random vectors

The variance of a complex scalar-valued random variable with expected value μ is conventionally defined using complex conjugation:

$\operatorname{var}(z) = \operatorname{E} \left[ (z-\mu)(z-\mu)^{*} \right]$

where the complex conjugate of a complex number $z$ is denoted $z^{*}$; thus the variance of a complex random variable is a real number.

If $Z$ is a column-vector of complex-valued random variables, then we take the conjugate transpose by both transposing and conjugating, getting a square matrix:

$\operatorname{E} \left[ (Z-\mu)(Z-\mu)^{H} \right]$

where $Z^{H}$ denotes the conjugate transpose, which is applicable to the scalar case since the transpose of a scalar is still a scalar. The matrix so obtained will be Hermitian positive-semidefinite,^[3] with real numbers on the main diagonal and, in general, complex numbers off the diagonal.
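
A numerical sketch of these properties, assuming NumPy and hypothetical simulated data: the sample analogue of $\operatorname{E}[(Z-\mu)(Z-\mu)^{H}]$ conjugates the second factor, and the resulting matrix is Hermitian with a real diagonal.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000
# Hypothetical complex data: two correlated complex components.
z1 = rng.normal(size=n) + 1j * rng.normal(size=n)
z2 = 0.6 * z1 + (rng.normal(size=n) + 1j * rng.normal(size=n))
Z = np.stack([z1, z2], axis=1)            # shape (n, 2), one sample per row

centred = Z - Z.mean(axis=0)
# Entry (i, j) is the average of (Z_i - mu_i) * conj(Z_j - mu_j).
sigma = centred.T @ centred.conj() / n

assert np.allclose(sigma, sigma.conj().T)           # Hermitian
assert np.allclose(sigma.diagonal().imag, 0.0)      # real variances on the diagonal
assert np.linalg.eigvalsh(sigma).min() >= -1e-12    # positive-semidefinite
```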

Estimation

The derivation of the maximum-likelihood estimator of the covariance matrix of a multivariate normal distribution is perhaps surprisingly subtle. See estimation of covariance matrices.
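
That derivation is not reproduced here, but its result for independent, identically distributed multivariate normal observations is easy to state in code: the maximum-likelihood estimate averages the outer products of deviations from the sample mean, dividing by n, while the commonly used unbiased sample covariance divides by n - 1. The sketch below, with hypothetical data and helper name, shows that the two differ only by the factor $(n-1)/n$.

```python
import numpy as np

def covariance_mle(samples):
    """Maximum-likelihood estimate of Sigma for i.i.d. multivariate normal rows:
    average of outer products of deviations from the sample mean (divide by n)."""
    samples = np.asarray(samples, dtype=float)
    centred = samples - samples.mean(axis=0)
    return centred.T @ centred / samples.shape[0]

rng = np.random.default_rng(6)
data = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[2.0, 0.3], [0.3, 1.0]], size=50)
n = data.shape[0]
mle = covariance_mle(data)
unbiased = np.cov(data, rowvar=False)     # np.cov divides by n - 1 by default
assert np.allclose(mle, unbiased * (n - 1) / n)
```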

Probability density function

If a vector of n possibly correlated random variables is jointly normally distributed, or more generally elliptically distributed, then its probability density function can be expressed in terms of the covariance matrix.
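
For the multivariate normal case with positive-definite Σ, the density at $\mathbf{x}$ is $(2\pi)^{-n/2}\det(\Sigma)^{-1/2}\exp\left(-\tfrac{1}{2}(\mathbf{x}-\mu)^\top\Sigma^{-1}(\mathbf{x}-\mu)\right)$. A minimal NumPy sketch, using a Cholesky factorization rather than an explicit inverse; the function name `mvn_pdf` is hypothetical, and general elliptical distributions are not covered.

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Density of the multivariate normal N(mu, sigma) at x, for positive-definite sigma.

    Uses sigma = L L^T (Cholesky), so that
    (x - mu)^T sigma^{-1} (x - mu) = |L^{-1}(x - mu)|^2 and log det(sigma) = 2 * sum(log diag(L)).
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n = mu.shape[0]
    chol = np.linalg.cholesky(sigma)      # raises LinAlgError if sigma is not positive definite
    z = np.linalg.solve(chol, x - mu)     # z = L^{-1}(x - mu)
    quad = z @ z
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return float(np.exp(-0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)))

# Mean and covariance of the distribution in the figure at the top of the article.
sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
peak = mvn_pdf([0.0, 0.0], [0.0, 0.0], sigma)
```

At the mean of that distribution the density equals $1/(2\pi\sqrt{0.75})$, since $\det\Sigma = 1 - 0.5^2 = 0.75$.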

References

^ Wasserman, Larry (2004). All of Statistics: A Concise Course in Statistical Inference. ISBN 0387402721.
^ Higham, Nicholas J. (2002). "Computing the nearest correlation matrix—a problem from finance". IMA Journal of Numerical Analysis 22 (3): 329–343. doi:10.1093/imanum/22.3.329.
^ Brookes, Mike. "Stochastic Matrices". The Matrix Reference Manual. http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/expect.html.
