- Gauss–Markov theorem

:"This article is **not** about Gauss–Markov processes."

In statistics, the **Gauss–Markov theorem**, named after Carl Friedrich Gauss and Andrey Markov, states that in a linear model in which the errors have expectation zero, are uncorrelated, and have equal variances, the best linear unbiased estimator (**BLUE**) of the coefficients is given by the least-squares estimator. The errors are "not" assumed to be normally distributed, nor are they assumed to be independent (only uncorrelated, a weaker condition), nor are they assumed to be identically distributed (only to have zero mean and equal variances).

**Statement**

Suppose we have

:$Y_i = \sum_{j=1}^{K} \beta_j X_{ij} + \varepsilon_i$

for "i" = 1, . . ., "n", where "β"_{"j"} are non-random but **un**observable parameters, "X"_{"ij"} are non-random and observable (called the "explanatory variables"), "ε"_{"i"} are random, and so "Y"_{"i"} are random. The random variables "ε"_{"i"} are called the "errors" (not to be confused with "residuals"; see errors and residuals in statistics). Note that to include a constant in the model above, one can choose to set "X"_{"iK"} = 1.

The **Gauss–Markov** assumptions state that

*$\mathrm{E}\left(\varepsilon_i\right) = 0,$

*$\mathrm{Var}\left(\varepsilon_i\right) = \sigma^2$ (i.e., all errors have the same variance; that is "homoscedasticity"), and

*$\mathrm{Cov}\left(\varepsilon_i, \varepsilon_j\right) = 0$ for "i" ≠ "j"; that is "uncorrelatedness."
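As an illustrative sketch (the design matrix, coefficients, and error distribution below are assumptions for demonstration, not part of the theorem), data satisfying the three assumptions can be generated with errors that are uniform rather than normal:

```python
import numpy as np

rng = np.random.default_rng(0)

n, K = 1000, 2
beta = np.array([2.0, -1.0])        # illustrative true coefficients
X = rng.standard_normal((n, K))     # observable explanatory variables (held fixed)

# Errors: zero mean, equal variance, uncorrelated -- but deliberately NOT normal.
# Uniform(-1, 1) has mean 0 and variance 1/3, so the assumptions still hold.
eps = rng.uniform(-1.0, 1.0, size=n)

Y = X @ beta + eps                  # Y_i = sum_j beta_j X_{ij} + eps_i

print(abs(eps.mean()) < 0.1)        # sample mean of the errors is near 0
print(abs(eps.var() - 1/3) < 0.05)  # sample variance is near sigma^2 = 1/3
```

The normality of the errors plays no role here, which is exactly the point of the theorem's hypotheses.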

A **linear estimator** of "β"_{"j"} is a linear combination

:$\widehat{\beta}_j = c_{1j} Y_1 + \cdots + c_{nj} Y_n$

in which the coefficients "c"_{"ij"} are not allowed to depend on the coefficients "β", since those are not observable, but are allowed to depend on "X", since these data are observable. (The dependence of the coefficients on "X" is typically nonlinear; the estimator is linear in "Y", and hence in "ε", which is random; that is why this is "linear" regression.) The estimator is **unbiased** iff

:$\mathrm{E}(\widehat{\beta}_j) = \beta_j,$

whatever the values of "X".
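A numerical sketch of unbiasedness (the single-regressor, no-intercept model and the particular weights c_i are illustrative assumptions): the estimator below is a linear combination of the Y_i whose weights depend only on the observable X, and its average over many replications recovers β:

```python
import numpy as np

rng = np.random.default_rng(1)

n, beta = 50, 3.0                  # illustrative single-coefficient model
X = rng.uniform(1.0, 2.0, size=n)  # non-random, observable design (held fixed)
c = X / np.sum(X**2)               # weights depend on X only, never on beta

reps = 20000
eps = rng.uniform(-1.0, 1.0, size=(reps, n))  # zero-mean, homoscedastic errors
Y = beta * X + eps                 # one replication per row
estimates = Y @ c                  # linear estimator: c_1 Y_1 + ... + c_n Y_n

# Since sum_i c_i X_i = 1, E(estimate) = beta: the estimator is unbiased.
print(abs(estimates.mean() - beta) < 0.01)
```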

Now, let $\sum_{j=1}^K \lambda_j \beta_j$ be some linear combination of the coefficients. Then the mean squared error of the corresponding estimation is defined as

:$\mathrm{E}\left(\left(\sum_{j=1}^K \lambda_j (\widehat{\beta}_j - \beta_j)\right)^2\right),$

i.e., it is the expectation of the square of the difference between the estimator and the parameter to be estimated. (The mean squared error of an estimator coincides with the estimator's variance if the estimator is unbiased; for biased estimators the mean squared error is the sum of the variance and the square of the bias.) A
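The decomposition in the parenthetical, mean squared error = variance + bias squared, can be checked numerically (the deliberately biased shrinkage estimator below is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(4)

n, beta = 50, 3.0
X = rng.uniform(1.0, 2.0, size=n)
c = X / np.sum(X**2)               # unbiased least-squares weights

reps = 20000
eps = rng.uniform(-1.0, 1.0, size=(reps, n))
Y = beta * X + eps
est = 0.9 * (Y @ c)                # shrinking by 0.9 makes the estimator biased

mse = np.mean((est - beta) ** 2)
bias = est.mean() - beta

# Mean squared error = variance + bias^2 (an exact sample identity with ddof=0).
print(np.isclose(mse, est.var() + bias**2))
```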

A **best linear unbiased estimator** of "β" is the one with the smallest mean squared error for every linear combination "λ". This is equivalent to the condition that

:$\mathrm{Var}(\widetilde{\beta}) - \mathrm{Var}(\widehat{\beta})$

is a positive semi-definite matrix for every other linear unbiased estimator $\widetilde{\beta}$.
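The following sketch illustrates this ordering in a single-regressor model (the alternative weights 1/("n" "X"_{"i"}) are an illustrative assumption; they also satisfy Σ_i c_i X_i = 1, so that estimator is unbiased too, but its variance is larger):

```python
import numpy as np

rng = np.random.default_rng(2)

n, beta = 50, 3.0
X = rng.uniform(1.0, 2.0, size=n)  # fixed, observable design

c_ls = X / np.sum(X**2)            # least-squares weights
c_alt = 1.0 / (n * X)              # another unbiased choice: sum_i c_i X_i = 1

reps = 20000
eps = rng.uniform(-1.0, 1.0, size=(reps, n))
Y = beta * X + eps

est_ls = Y @ c_ls
est_alt = Y @ c_alt

# Both estimators are unbiased, but the least-squares one has the smaller
# variance, as the Gauss-Markov theorem guarantees.
print(est_ls.var() < est_alt.var())
```

(That the least-squares weights win here follows from the Cauchy–Schwarz inequality: Σ X_i² · Σ 1/X_i² ≥ n².)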

The **ordinary least squares estimator (OLS)** is the function

:$\widehat{\beta} = (X^T X)^{-1} X^T Y$

of "Y" and "X" that minimizes the **sum of squares of residuals**

:$\sum_{i=1}^n \left(Y_i - \widehat{Y}_i\right)^2 = \sum_{i=1}^n \left(Y_i - \sum_{j=1}^K \widehat{\beta}_j X_{ij}\right)^2.$

(It is easy to confuse the concept of "error" introduced earlier in this article with this concept of "residual". For an account of the differences and the relationship between them, see errors and residuals in statistics.)

The theorem now states that the OLS estimator is a BLUE. The main idea of the proof is that the least-squares estimator is uncorrelated with every linear unbiased estimator of zero, i.e., with every linear combination $a_1 Y_1 + \cdots + a_n Y_n$ whose coefficients do not depend upon the unobservable "β" but whose expected value is always zero.
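A sketch of the estimator in matrix form (the simulated design and coefficients are illustrative assumptions): the formula $(X^T X)^{-1} X^T Y$ matches a library least-squares solver, and perturbing the result can only increase the residual sum of squares:

```python
import numpy as np

rng = np.random.default_rng(3)

n, K = 200, 3
X = rng.standard_normal((n, K))
X[:, 0] = 1.0                          # constant regressor: X_{i1} = 1
beta = np.array([1.0, 2.0, -0.5])      # illustrative true coefficients
Y = X @ beta + rng.uniform(-1.0, 1.0, size=n)

# OLS estimator: beta_hat = (X^T X)^{-1} X^T Y, via the normal equations.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

def rss(b):
    """Sum of squares of residuals for candidate coefficients b."""
    return np.sum((Y - X @ b) ** 2)

# Agrees with the library least-squares solver ...
lstsq_sol = np.linalg.lstsq(X, Y, rcond=None)[0]
print(np.allclose(beta_hat, lstsq_sol))

# ... and any perturbation away from beta_hat increases the RSS.
print(all(rss(beta_hat) < rss(beta_hat + d)
          for d in 0.1 * rng.standard_normal((5, K))))
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse, which is the usual numerical practice.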

**Generalized least squares estimator**

The generalized least squares (GLS) or Aitken estimator extends the Gauss–Markov theorem to the case in which the error vector has a non-scalar covariance matrix $\Omega$; the Aitken estimator, $\widehat{\beta} = (X^T \Omega^{-1} X)^{-1} X^T \Omega^{-1} Y$, is also a BLUE. [A. C. Aitken, "On Least Squares and Linear Combinations of Observations", "Proceedings of the Royal Society of Edinburgh", 1935, vol. 55, pp. 42–48.]

**See also**

* Independent and identically-distributed random variables

* Linear regression

* Measurement uncertainty

* Best linear unbiased prediction

**References**

* Plackett, R. L. (1950), "Some Theorems in Least Squares", "Biometrika" **37**: 149–157.

**External links**

* [http://members.aol.com/jeff570/g.html Earliest Known Uses of Some of the Words of Mathematics: G] (brief history and explanation of its name)

* [http://www.xycoon.com/ols1.htm Proof of the Gauss Markov theorem for multiple linear regression] (makes use of matrix algebra)

* [http://emlab.berkeley.edu/GMTheorem/index.html A Proof of the Gauss Markov theorem using geometry]

*Wikimedia Foundation. 2010.*