Multivariate random variable
In mathematics, probability, and statistics, a multivariate random variable or random vector is a list of mathematical variables each of whose values is unknown, either because the value has not yet occurred or because there is imperfect knowledge of its value.
More formally, a multivariate random variable is a column vector X = (X1, ..., Xn)T (or its transpose, which is a row vector) whose components are scalar-valued random variables on the same probability space (Ω, ℱ, P), where Ω is the sample space, ℱ is the sigma-algebra (the collection of all events), and P is the probability measure (a function returning every event's probability).
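As a concrete illustration, here is a minimal sketch in Python with NumPy (the particular functions of the outcome are arbitrary illustrative choices): the components of a random vector are built as functions of the same underlying random outcome, so they live on a common probability space.

```python
import numpy as np

rng = np.random.default_rng(0)
omega = rng.uniform(size=10_000)                 # simulated outcomes from the sample space
X = np.vstack([omega, omega**2, np.sin(omega)])  # X = (X1, X2, X3)T, shape (3, 10000)
print(X[:, 0])                                   # one realization of the random vector
```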
Probability distribution
Every random vector gives rise to a probability measure on Rn with the Borel algebra as the underlying sigma-algebra. This measure is also known as the joint probability distribution, the joint distribution, or the multivariate distribution of the random vector.
The distributions of each of the component random variables Xi are called marginal distributions. The conditional probability distribution of Xi given Xj is the probability distribution of Xi when Xj is known to be a particular value.
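A sketch of these ideas for an assumed bivariate normal distribution (the mean, covariance, and conditioning value below are illustrative choices): sample the joint distribution, then inspect the marginal of X1 and the conditional of X1 given that X2 is near a fixed value.

```python
import numpy as np

rng = np.random.default_rng(1)
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])                   # joint (multivariate) distribution
sample = rng.multivariate_normal(mean, cov, size=200_000)

x1, x2 = sample[:, 0], sample[:, 1]
print(x1.mean(), x1.var())                     # marginal of X1: approximately N(0, 1)

near = np.abs(x2 - 1.0) < 0.05                 # condition on X2 close to 1.0
print(x1[near].mean())                         # ~ rho * 1.0 = 0.8
print(x1[near].var())                          # ~ 1 - rho**2 = 0.36
```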
Operations on random vectors
Random vectors can be subjected to the same kinds of algebraic operations as can non-random vectors: addition, subtraction, multiplication by a scalar, and the taking of inner products.
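For instance (a brief sketch; the sampled values are arbitrary), these operations apply realization by realization:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=3)    # one realization of a random vector X
Y = rng.normal(size=3)    # one realization of a random vector Y

print(X + Y)              # addition
print(X - Y)              # subtraction
print(2.5 * X)            # multiplication by a scalar
print(X @ Y)              # inner product (a random scalar)
```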
Expected value, covariance, and cross-covariance
The expected value or mean of a random vector X is a fixed vector E(X) whose elements are the expected values of the respective random variables.
The covariance matrix (also called the variance-covariance matrix) of an n × 1 random vector is an n × n matrix whose (i, j) element is the covariance between the ith and the jth random variables. The covariance matrix is the expected value, element by element, of the n × n matrix computed as [X − E(X)][X − E(X)]T, where the superscript T refers to the transpose of the indicated vector:

Cov(X) = E{[X − E(X)][X − E(X)]T}.
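As a sketch (assuming NumPy; the mean vector and covariance matrix below are illustrative), the mean vector and covariance matrix can be estimated from a sample exactly as the element-by-element definition suggests:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.multivariate_normal([1.0, -2.0],
                            [[2.0, 0.5],
                             [0.5, 1.0]], size=100_000).T   # shape (n, samples)

mean = X.mean(axis=1, keepdims=True)            # estimate of E(X), an n x 1 vector
centered = X - mean
cov = centered @ centered.T / (X.shape[1] - 1)  # estimate of E{[X - E(X)][X - E(X)]T}
print(mean.ravel())                             # close to [1.0, -2.0]
print(cov)                                      # close to [[2.0, 0.5], [0.5, 1.0]]
```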
By extension, the cross-covariance matrix between two random vectors X and Y (X having n elements and Y having p elements) is the n × p matrix

Cov(X, Y) = E{[X − E(X)][Y − E(Y)]T},
where again the indicated matrix expectation is taken element-by-element in the matrix. The cross-covariance matrix Cov(Y, X) is simply the transpose of the matrix Cov(X, Y).
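A similar sketch (with illustrative data) for the cross-covariance matrix, also checking numerically that Cov(Y, X) is the transpose of Cov(X, Y):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(3, 50_000))           # n = 3 components
Y = X[:2] + rng.normal(size=(2, 50_000))   # p = 2 components, correlated with X

Xc = X - X.mean(axis=1, keepdims=True)
Yc = Y - Y.mean(axis=1, keepdims=True)
cov_xy = Xc @ Yc.T / (X.shape[1] - 1)      # n x p = 3 x 2
cov_yx = Yc @ Xc.T / (X.shape[1] - 1)      # p x n = 2 x 3
print(np.allclose(cov_xy, cov_yx.T))       # True: Cov(Y, X) equals Cov(X, Y)T
```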
Further properties
One can take the expectation of the quadratic form XTAX in the random vector X, where A is an n × n matrix, as follows:[1]:pp. 170-171

E(XTAX) = E(X)TAE(X) + tr(AC),
where C is the covariance matrix of X and tr refers to the trace of a matrix — that is, to the sum of the elements on its main diagonal (from upper left to lower right). Since the quadratic form is a scalar, so is its expectation.
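A Monte Carlo check of this identity (a sketch; the particular mean vector, covariance matrix, and A are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
mu = np.array([1.0, 2.0])
C = np.array([[1.0, 0.3], [0.3, 2.0]])
A = np.array([[2.0, 1.0], [0.0, 1.0]])      # any n x n matrix works here

X = rng.multivariate_normal(mu, C, size=500_000)
empirical = np.mean(np.einsum('ij,jk,ik->i', X, A, X))  # average of xTAx over samples
theoretical = mu @ A @ mu + np.trace(A @ C)             # E(X)TAE(X) + tr(AC)
print(empirical, theoretical)                           # should be close
```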
One can take the expectation of the product of two different quadratic forms, XTAX and XTBX for symmetric n × n matrices A and B, in a zero-mean Gaussian random vector X as follows:[1]:pp. 162-176

E[(XTAX)(XTBX)] = 2tr(ACBC) + tr(AC)tr(BC),
where again C is the covariance matrix of X. Again, since both quadratic forms are scalars and hence their product is a scalar, the expectation of their product is also a scalar.
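A corresponding Monte Carlo check (again with arbitrary illustrative matrices; A and B are taken symmetric, as the identity requires):

```python
import numpy as np

rng = np.random.default_rng(6)
C = np.array([[1.0, 0.4], [0.4, 1.5]])
A = np.array([[1.0, 0.2], [0.2, 3.0]])      # symmetric
B = np.array([[2.0, -0.5], [-0.5, 1.0]])    # symmetric

X = rng.multivariate_normal([0.0, 0.0], C, size=2_000_000)  # zero-mean Gaussian
qa = np.einsum('ij,jk,ik->i', X, A, X)                      # xTAx per sample
qb = np.einsum('ij,jk,ik->i', X, B, X)                      # xTBx per sample
empirical = np.mean(qa * qb)
theoretical = 2 * np.trace(A @ C @ B @ C) + np.trace(A @ C) * np.trace(B @ C)
print(empirical, theoretical)                               # should be close
```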
Applications
Portfolio theory
In portfolio theory in finance, an objective often is to choose a portfolio of risky assets such that the distribution of the random portfolio return has desirable properties. For example, one might want to choose the portfolio return having the lowest variance for a given expected value. Here the random vector is the vector r of random returns on the individual assets, and the portfolio return p (a random scalar) is the inner product of the vector of random returns with a vector w of portfolio weights — the fractions of the portfolio placed in the respective assets. Since p = wTr, the expected value of the portfolio return is wTE(r) and the variance of the portfolio return can be shown to be wTCw, where C is the covariance matrix of r.
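A sketch of the computation (the expected returns, covariance matrix, and weights below are made-up illustrative numbers):

```python
import numpy as np

mu = np.array([0.08, 0.05, 0.12])           # assumed expected asset returns E(r)
C = np.array([[0.040, 0.006, 0.010],
              [0.006, 0.020, 0.004],
              [0.010, 0.004, 0.090]])       # assumed covariance matrix of r
w = np.array([0.5, 0.3, 0.2])               # portfolio weights (sum to 1)

expected_return = w @ mu                    # wTE(r)
variance = w @ C @ w                        # wTCw
print(expected_return, variance)
```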
Regression theory
In linear regression theory, we have data on n observations on a dependent variable y and n observations on each of k independent variables xj. The observations on the dependent variable are stacked into a column vector y; the observations on each independent variable are also stacked into column vectors, and these latter column vectors are combined into a matrix X of observations on the independent variables. Then the following regression equation is postulated as a description of the process that generated the data:

y = Xβ + e,
where β is a postulated fixed but unknown vector of k response coefficients, and e is an unknown random vector reflecting random influences on the dependent variable. By some chosen technique such as ordinary least squares, a vector β̂ is chosen as an estimate of β, and the estimate of the vector e, denoted ê, is computed as

ê = y − Xβ̂.
Then the statistician must analyze the properties of β̂ and ê, which are viewed as random vectors since a randomly different selection of n cases to observe would have resulted in different values for them.
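A sketch of the whole procedure on simulated data (assuming ordinary least squares, as mentioned above; the true coefficients and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 200, 3
X = rng.normal(size=(n, k))                # n observations on k regressors
beta = np.array([1.5, -2.0, 0.5])          # "unknown" true coefficients
y = X @ beta + rng.normal(scale=0.3, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS estimate of beta
e_hat = y - X @ beta_hat                          # estimated residual vector
print(beta_hat)                                   # close to [1.5, -2.0, 0.5]
```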
References
1. Kendrick, David (1981). Stochastic Control for Economic Models. New York: McGraw-Hill.