- Linear model
In statistics the **linear model** is given by

:$Y = X\beta + \varepsilon$

where "Y" is an "n"×1 column vector of random variables, "X" is an "n"×"p" matrix of "known" (i.e. observable and non-random) quantities, whose rows correspond to statistical units, β is a "p"×1 vector of (unobservable) parameters, and ε is an "n"×1 vector of "errors", which are uncorrelated random variables each with expected value 0 and variance σ^{2}.

Much of the theory of linear models is associated with inferring the values of the parameters β and σ^{2}. Typically this is done using the method of maximum likelihood, which in the case of normal errors is equivalent to the method of least squares.

**Assumptions**

**Multivariate normal errors**

Often one takes the components of the vector of errors to be independent and normally distributed, giving "Y" a multivariate normal distribution with mean "X"β and covariance matrix σ^{2}"I", where "I" is the identity matrix. Having observed the values of "X" and "Y", the statistician must estimate β and σ^{2}.

**Rank of "X"**

We usually assume that "X" is of full rank "p", which allows us to invert the "p" × "p" matrix $X^{\top} X$. The essence of this assumption is that the columns of "X" are not linearly dependent upon one another; if they were, different values of β would give the same mean "X"β and the parameters could not be told apart. This also ensures the model is identifiable.
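The full-rank assumption can be checked numerically before any fitting is attempted. The following is a minimal sketch in Python with NumPy; the helper name `has_full_column_rank`, the seed and the example design matrices are illustrative assumptions, not part of the model itself.

```python
import numpy as np


def has_full_column_rank(X):
    """Return True when the n-by-p matrix X has rank p (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    return bool(np.linalg.matrix_rank(X) == X.shape[1])


rng = np.random.default_rng(0)  # illustrative seed
n = 20
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Intercept plus two unrelated covariates: rank 3 = p.
X_ok = np.column_stack([np.ones(n), x1, x2])

# Third column is an exact linear combination of the first two: rank 2 < p.
X_bad = np.column_stack([np.ones(n), x1, 2.0 * x1 - 1.0])

full_ok = has_full_column_rank(X_ok)
full_bad = has_full_column_rank(X_bad)
```

In the rank-deficient case the matrix $X^{\top} X$ is singular, so it cannot be inverted and β is not identifiable.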

**Methods of inference**

**Maximum likelihood**

**β**

The log-likelihood function (for $\varepsilon_i$ independent and normally distributed) is

:$l(\beta, \sigma^2; Y) = -\frac{n}{2} \log (2 \pi \sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n \left(Y_i - x_i^{\top} \beta\right)^2$

where $x_i^{\top}$ is the "i"th row of "X". Differentiating with respect to β_{j}, we get

:$\frac{\partial l}{\partial \beta_j} = \frac{1}{\sigma^2} \sum_{i=1}^n x_{ij} \left( Y_i - x_i^{\top} \beta \right)$

so setting this set of "p" equations to zero and solving for β gives

:$X^{\top} X \hat{\beta} = X^{\top} Y.$

Now, using the assumption that "X" has rank "p", we can invert the matrix on the left hand side to give the maximum likelihood estimate for β:

:$\hat{\beta} = (X^{\top} X)^{-1} X^{\top} Y$.
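As a rough illustration, the estimate can be computed by solving the normal equations directly rather than forming the inverse explicitly. This is a sketch under assumed simulated data; the function name `ols_beta`, the seed and the "true" coefficients are made up for the example, and the result is compared with NumPy's own least-squares routine.

```python
import numpy as np


def ols_beta(X, Y):
    """Solve the normal equations X^T X beta = X^T Y for beta."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    return np.linalg.solve(X.T @ X, X.T @ Y)


rng = np.random.default_rng(1)  # illustrative seed
n = 200
beta_true = np.array([1.0, -2.0, 0.5])  # assumed values, for simulation only
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ beta_true + rng.normal(scale=0.3, size=n)

beta_hat = ols_beta(X, Y)

# Cross-check against NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

Solving the linear system is usually preferred numerically to computing the inverse of $X^{\top} X$, although both express the same estimator.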

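We can check that this is a maximum by looking at the Hessian matrix of the log-likelihood function: the matrix of second derivatives with respect to β is $-X^{\top} X / \sigma^2$, which is negative definite when "X" has full rank.

**σ^{2}**

By setting the right hand side of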

:$\frac{\partial l}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2 \sigma^4} \sum_{i=1}^n \left(Y_i - x_i^{\top} \beta\right)^2$

to zero and solving for σ^{2} we find that

:$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n \left(Y_i - x_i^{\top} \hat{\beta}\right)^2 = \frac{1}{n} \| Y - X \hat{\beta} \|^2.$
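The variance estimate can be sketched in the same way: compute the residuals at the fitted β and divide their sum of squares by "n". The snippet below also evaluates the derivative of the log-likelihood with respect to σ^{2} to confirm that it vanishes at the estimate. The function names, seed and simulated data are assumptions made for illustration.

```python
import numpy as np


def sigma2_mle(X, Y):
    """Return (sigma^2 estimate, beta estimate) under normal errors."""
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    resid = Y - X @ beta_hat
    return resid @ resid / len(Y), beta_hat


def score_sigma2(X, Y, beta, sigma2):
    """Derivative of the normal log-likelihood with respect to sigma^2."""
    n = len(Y)
    rss = np.sum((Y - X @ beta) ** 2)
    return -n / (2.0 * sigma2) + rss / (2.0 * sigma2 ** 2)


rng = np.random.default_rng(5)  # illustrative seed
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

s2_hat, beta_hat = sigma2_mle(X, Y)
score_at_mle = score_sigma2(X, Y, beta_hat, s2_hat)
```

The derivative is positive for smaller values of σ^{2} and negative for larger ones, which is consistent with the stationary point being a maximum in σ^{2}.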

**Accuracy of maximum likelihood estimation**

Since we have that "Y" follows a multivariate normal distribution with mean "X"β and covariance matrix σ^{2}"I", we can deduce the distribution of the MLE of β:

:$\hat{\beta} = (X^{\top} X)^{-1} X^{\top} Y \sim N_p (\beta, (X^{\top}X)^{-1} \sigma^2 ).$
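The stated sampling distribution can be checked informally by simulation. The sketch below holds an assumed design matrix fixed, draws many response vectors from the model, and compares the empirical mean and covariance of the estimates with β and $(X^{\top}X)^{-1}\sigma^2$. All numerical values (sample size, number of replications, seed, coefficients) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)  # illustrative seed
n, p, sigma = 50, 3, 1.0
reps = 20000
beta = np.array([0.5, 1.0, -1.0])  # assumed true parameters

# Fixed ("known") design matrix, reused for every replication.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
XtX_inv = np.linalg.inv(X.T @ X)
theory_cov = sigma ** 2 * XtX_inv

# Each row of Y_sims is one simulated response vector Y = X beta + eps.
eps = rng.normal(scale=sigma, size=(reps, n))
Y_sims = X @ beta + eps

# Row r holds the transpose of (X^T X)^{-1} X^T Y_r; the inverse is symmetric.
beta_hats = Y_sims @ X @ XtX_inv

emp_mean = beta_hats.mean(axis=0)
emp_cov = np.cov(beta_hats, rowvar=False)
```

With enough replications `emp_mean` should be close to `beta` and `emp_cov` close to `theory_cov`, up to Monte Carlo error.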

So this estimate is unbiased for β, and we can show that this variance achieves the Cramér-Rao bound.

A more complicated argument [A. C. Davison, "Statistical Models". Cambridge University Press (2003).] shows that

:$\hat{\sigma}^2 \sim \frac{\sigma^2}{n} \chi^2_{n-p};$

since a chi-squared distribution with "n" − "p" degrees of freedom has mean "n" − "p", the expected value of $\hat{\sigma}^2$ is σ^{2}("n" − "p")/"n", so this estimator is biased for finite "n" and only asymptotically unbiased.

**Generalizations**

**Generalized least squares**

If, rather than taking the variance of ε to be σ^{2}"I", where "I" is the "n"×"n" identity matrix, one assumes the variance is σ^{2}"Ω", where "Ω" is a known matrix other than the identity matrix, then one estimates β by the method of "generalized least squares", in which, instead of minimizing the sum of squares of the residuals, one minimizes a different quadratic form in the residuals, namely the one given by the matrix "Ω"^{−1}:

::$\min_{\beta}\left(y-X\beta\right)'\Omega^{-1}\left(y-X\beta\right).$
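One way to carry out this minimization numerically, sketched here under assumptions, is to factor "Ω" by a Cholesky decomposition and transform the problem into an ordinary least-squares one. The function names and the particular "Ω" used below (an AR(1)-style correlation matrix with an arbitrary parameter) are hypothetical choices for the example.

```python
import numpy as np


def gls_by_whitening(X, y, Omega):
    """Minimise (y - X b)' Omega^{-1} (y - X b) over b.

    With Omega = L L' (Cholesky factor L), the quadratic form equals
    ||L^{-1} y - L^{-1} X b||^2, an ordinary least-squares problem.
    """
    L = np.linalg.cholesky(Omega)
    Xw = np.linalg.solve(L, X)  # L^{-1} X
    yw = np.linalg.solve(L, y)  # L^{-1} y
    b, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return b


def quadratic_form(X, y, Omega, b):
    """Value of the generalized least squares objective at b."""
    r = y - X @ b
    return r @ np.linalg.solve(Omega, r)


rng = np.random.default_rng(3)  # illustrative seed
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Assumed covariance structure: Omega[i, j] = rho ** |i - j|.
rho = 0.6
idx = np.arange(n)
Omega = rho ** np.abs(idx[:, None] - idx[None, :])

# Simulate errors with covariance Omega by colouring white noise.
L = np.linalg.cholesky(Omega)
y = X @ np.array([2.0, -1.0]) + L @ rng.normal(size=n)

b_gls = gls_by_whitening(X, y, Omega)
```

At the minimizer the gradient condition $X'\Omega^{-1}(y - Xb) = 0$ holds, which can be used as a sanity check on the result.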

This has the effect of "de-correlating" normal errors, and leads to the estimator

::$\widehat{\beta}=\left(X'\Omega^{-1}X\right)^{-1}X'\Omega^{-1}y$

which is the best linear unbiased estimator for β. If all of the off-diagonal entries in the matrix "Ω" are 0, then one normally estimates β by the method of weighted least squares, with weights proportional to the reciprocals of the diagonal entries. The GLS estimator is also known as the **Aitken estimator**, after Alexander Aitken, the New Zealand-born mathematician who pioneered it. [http://web.uvic.ca/econ/aitken.html Alexander Craig Aitken]

**Generalized linear models**

Generalized linear models are models for which, rather than

:E("Y") = "X"β,

one has

:"g"(E("Y")) = "X"β,

where "g" is the "link function". The distribution of "Y" is also not restricted to being normal.
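To make the role of the link function concrete, the following sketch fits a model with the logarithmic link "g"(μ) = log μ to count data by maximum likelihood, using Newton's method. The choice of Poisson-distributed responses, the function name `fit_log_link_poisson`, the fixed iteration count and the simulated data are all assumptions made for illustration; this is not a general-purpose fitting routine.

```python
import numpy as np


def fit_log_link_poisson(X, y, n_iter=25):
    """Newton's method for the Poisson log-likelihood with a log link."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ coef)            # inverse link: E(Y) = exp(X coef)
        score = X.T @ (y - mu)           # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])   # negative Hessian (Fisher information)
        coef = coef + np.linalg.solve(info, score)
    return coef


rng = np.random.default_rng(4)  # illustrative seed
n = 2000
x = rng.uniform(-1.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])
coef_true = np.array([0.3, 0.8])  # assumed values, for simulation only
y = rng.poisson(np.exp(X @ coef_true))

coef_hat = fit_log_link_poisson(X, y)
```

Unlike the normal linear model, there is no closed-form solution here, so the estimate is obtained iteratively; with the identity link and normal errors the same likelihood equations reduce to the normal equations.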

An example is the Poisson regression model, which states that

:"Y"_{"i"} has a Poisson distribution with expected value "e"^{γ+δ"x"_{"i"}}.

The link function is the natural logarithm function. Having observed "x"_{"i"} and "Y"_{"i"} for "i" = 1, ..., "n", one can estimate γ and δ by the method of maximum likelihood.

**General linear model**

The general linear model (or multivariate regression model) is a linear model with multiple measurements per object. Each object may be represented in a vector.

**See also**

*ANOVA, or analysis of variance, is historically a precursor to the development of linear models. Here the model parameters themselves are not computed, but "X" column contributions and their significance are identified using the ratios of within-group variances to the error variance and applying the F test.

*Linear regression

*Robust regression

*Wikimedia Foundation. 2010.*