- Linear model
statisticsthe linear model is given by
where "Y" is an "n"×1 column vector of random variables, "X" is an "n"×"p" matrix of "known" (i.e. observable and non-random) quantities, whose rows correspond to
statistical units, β is a "p"×1 vector of (unobservable) parameters, and ε is an "n"×1 vector of "errors", which are uncorrelated random variables each with expected value 0 and variance σ2.
Much of the theory of linear models is associated with inferring the values of the
parameters β and σ2. Typically this is done using the method of maximum likelihood, which in the case of normal errors is equivalent to the method of least squares.
Multivariate normal errors
Often one takes the components of the vector of errors to be independent and normally distributed, giving "Y" a
multivariate normal distributionwith mean "X"β and co-variance matrix σ2 "I", where "I" is the identity matrix. Having observed the values of "X" and "Y", the statistician must estimate β and σ2.
Rank of "X"
We usually assume that "X" is of full rank "p", which allows us to invert the "p" × "p" matrix . The essence of this assumption is that the parameters are not linearly dependent upon one another, which would make little sense in a linear model. This also ensures the model is identifiable.
Methods of inference
The log-likelihood function (for independent and normally distributed) is
where is the "i"th row of "X". Differentiating with respect to βj, we get
so setting this set of "p" equations to zero and solving for β gives
Now, using the assumption that "X" has rank "p", we can invert the matrix on the left hand side to give the maximum likelihood estimate for β:
We can check that this is a maximum by looking at the
Hessian matrixof the log-likelihood function.
By setting the right hand side of
to zero and solving for σ2 we find that
Accuracy of maximum likelihood estimation
Since we have that "Y" follows a
multivariate normal distributionwith mean "X"β and co-variance matrix σ2 "I", we can deduce the distribution of the MLE of β:
So this estimate is unbiased for β, and we can show that this variance achieves the
A more complicated argumentA.C. Davidson "Statistical Models". Cambridge University Press (2003).] shows that
chi-squared distributionwith "n" − "p" degrees of freedom has mean "n" − "p", this is only asymptotically unbiased.
Generalized least squares
If, rather than taking the variance of ε to be σ2"I", where "I" is the "n"×"n" identity matrix, one assumes the variance is σ2"Ω", where "Ω" is a known matrix other than the identity matrix, then one estimates β by the method of "generalized least squares", in which, instead of minimizing the sum of squares of the residuals, one minimizes a different quadratic form in the residuals — the quadratic form being the one given by the matrix "Ω"−1:
This has the effect of "de-correlating" normal errors, and leads to the estimator
which is the best linear unbiased estimator for . If all of the off-diagonal entries in the matrix "Ω" are 0, then one normally estimates β by the method of
weighted least squares, with weights proportional to the reciprocals of the diagonal entries. The GLS estimator is also known as the Aitken estimator, after Alexander Aitken, the Professor in the University of OtagoStatistics Department who pioneered it. [ [http://web.uvic.ca/econ/aitken.html Alexander Craig Aitken ] ]
Generalized linear models
Generalized linear models, for which rather than
: E("Y") = "X"β,
:"g"(E("Y")) = "X"β,
where "g" is the "link function". The variance is also not restricted to being normal.
An example is the
Poisson regressionmodel, which states that
:"Y""i" has a Poisson distribution with expected value "e"γ+δ"x""i".The link function is the
natural logarithmfunction.Having observed "x""i" and "Y""i" for"i" = 1, ..., "n", one can estimate γ and δ by the method of maximum likelihood.
General linear model
general linear model(or multivariate regression model) is a linear model with multiple measurements per object. Each object may be represented in a vector.
ANOVA, or analysis of variance, is historically a precursor to the development of linear models. Here the model parameters themselves are not computed, but "X" column contributions and their significance are identified using the ratios of within-group variances to the error variance and applying the F test.
Wikimedia Foundation. 2010.