Squared deviations
In probability theory and statistics, the definition of variance is either the expected value (when considering a theoretical distribution) or the average (for actual experimental data) of squared deviations from the mean. Computations for analysis of variance involve the partitioning of a sum of squared deviations. An understanding of the computations involved is greatly enhanced by a detailed study of the statistical value
:\operatorname{E}(X^2).
It is well known that for a random variable X with mean \mu and variance \sigma^2,
:\sigma^2 = \operatorname{E}(X^2) - \mu^2 [Mood & Graybill: An Introduction to the Theory of Statistics (McGraw-Hill)].
Therefore,
:\operatorname{E}(X^2) = \sigma^2 + \mu^2.
From the above, the following are readily derived:
:\operatorname{E}\left(\sum\left(X^2\right)\right) = n\sigma^2 + n\mu^2
:\operatorname{E}\left(\left(\sum X\right)^2\right) = n\sigma^2 + n^2\mu^2
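These expectations are easy to check numerically. The following Python sketch (the normal distribution, the parameter values, and the trial count are arbitrary illustrative choices) estimates both quantities by simulation and compares them with the two formulas above.
 import random

 # Illustrative Monte Carlo check of the two derived expectations
 # (mu, sigma, n and the trial count are arbitrary choices).
 mu, sigma, n, trials = 3.0, 2.0, 10, 200_000

 sum_sq = 0.0  # accumulates sum(X^2) over each sample of size n
 sq_sum = 0.0  # accumulates (sum X)^2 over each sample of size n
 for _ in range(trials):
     xs = [random.gauss(mu, sigma) for _ in range(n)]
     sum_sq += sum(x * x for x in xs)
     sq_sum += sum(xs) ** 2

 print(sum_sq / trials, n * sigma**2 + n * mu**2)     # both near 130
 print(sq_sum / trials, n * sigma**2 + n**2 * mu**2)  # both near 940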
Sample variance
The sum of squared deviations needed to calculate variance (before deciding whether to divide by "n" or "n" − 1) is most easily calculated as
:S = \sum x^2 - \left(\sum x\right)^2/n
From the two derived expectations above, the expected value of this sum is
:\operatorname{E}(S) = n\sigma^2 + n\mu^2 - (n\sigma^2 + n^2\mu^2)/n
which implies
:\operatorname{E}(S) = (n - 1)\sigma^2.
This effectively proves the use of the divisor "n" − 1 in the calculation of an unbiased sample estimate of \sigma^2.
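As a quick check, the following Python sketch (the data are arbitrary normal draws) computes S by the shortcut formula and confirms that S/(n − 1) matches the unbiased sample variance produced by the standard library.
 import random
 import statistics

 # S = sum(x^2) - (sum x)^2 / n, then divide by n - 1 for an
 # unbiased estimate of sigma^2 (data are arbitrary normal draws).
 xs = [random.gauss(5.0, 3.0) for _ in range(100)]
 n = len(xs)
 S = sum(x * x for x in xs) - sum(xs) ** 2 / n

 print(S / (n - 1))              # shortcut estimate
 print(statistics.variance(xs))  # statistics.variance also divides by n - 1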
Partition — analysis of variance
In the situation where data are available for "k" different treatment groups having sizes n_i, where "i" varies from 1 to "k", it is assumed that the expected mean of each group is
:\operatorname{E}(\mu_i) = \mu + T_i
and the variance of each treatment group is unchanged from the population variance \sigma^2.
Under the null hypothesis that the treatments have no effect, each of the T_i will be zero.
It is now possible to calculate three sums of squares:
;Individual
:I = \sum x^2
:\operatorname{E}(I) = n\sigma^2 + n\mu^2
;Treatments
:T = \sum_{i=1}^k \left(\left(\sum x\right)^2/n_i\right)
:\operatorname{E}(T) = k\sigma^2 + \sum_{i=1}^k n_i(\mu + T_i)^2
:\operatorname{E}(T) = k\sigma^2 + n\mu^2 + 2\mu\sum_{i=1}^k (n_i T_i) + \sum_{i=1}^k n_i (T_i)^2
Under the null hypothesis that the treatments cause no differences and all the T_i are zero, the expectation simplifies to
:\operatorname{E}(T) = k\sigma^2 + n\mu^2.
;Combination
:C = \left(\sum x\right)^2/n
:\operatorname{E}(C) = \sigma^2 + n\mu^2
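The three sums translate directly into code. Below is a minimal Python sketch (the helper name sums_of_squares is an arbitrary choice, not a standard function) that computes I, T, and C from a list of treatment groups.
 def sums_of_squares(groups):
     """Return (I, T, C) for a list of treatment groups (lists of numbers)."""
     obs = [x for g in groups for x in g]           # all n observations pooled
     I = sum(x * x for x in obs)                    # individual
     T = sum(sum(g) ** 2 / len(g) for g in groups)  # treatments
     C = sum(obs) ** 2 / len(obs)                   # combination
     return I, T, C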
Sums of squared deviations
Under the null hypothesis, the difference of any pair of "I", "T", and "C" does not contain any dependency on \mu, only \sigma^2.
:\operatorname{E}(I - C) = (n - 1)\sigma^2 (total squared deviations)
:\operatorname{E}(T - C) = (k - 1)\sigma^2 (treatment squared deviations)
:\operatorname{E}(I - T) = (n - k)\sigma^2 (residual squared deviations)
The constants ("n" − 1), ("k" − 1), and ("n" − "k") are normally referred to as the number of degrees of freedom.
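A small simulation makes the partition concrete. In the sketch below (group sizes, \mu, \sigma, and trial count are arbitrary illustrative choices), every group is drawn from the same distribution, so the null hypothesis holds, and the averaged differences are compared with (n − 1)\sigma^2, (k − 1)\sigma^2, and (n − k)\sigma^2.
 import random

 # Under the null hypothesis every group shares the same mean mu, so
 # E(I - C), E(T - C) and E(I - T) should depend only on sigma^2.
 mu, sigma = 3.0, 2.0
 sizes = [3, 4, 5]                 # k = 3 groups, n = 12 observations
 n, k, trials = sum(sizes), len(sizes), 100_000

 totals = [0.0, 0.0, 0.0]          # running sums of I-C, T-C, I-T
 for _ in range(trials):
     groups = [[random.gauss(mu, sigma) for _ in range(m)] for m in sizes]
     obs = [x for g in groups for x in g]
     I = sum(x * x for x in obs)
     T = sum(sum(g) ** 2 / len(g) for g in groups)
     C = sum(obs) ** 2 / n
     totals[0] += I - C
     totals[1] += T - C
     totals[2] += I - T

 print([t / trials for t in totals])  # simulated expectations
 print([(n - 1) * sigma**2, (k - 1) * sigma**2, (n - k) * sigma**2])  # [44.0, 8.0, 36.0]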
Example
In a very simple example, 5 observations arise from two treatments. The first treatment gives three values, 1, 2, and 3, and the second treatment gives two values, 4 and 6.
:I = \frac{1^2}{1} + \frac{2^2}{1} + \frac{3^2}{1} + \frac{4^2}{1} + \frac{6^2}{1} = 66
:T = \frac{(1 + 2 + 3)^2}{3} + \frac{(4 + 6)^2}{2} = 12 + 50 = 62
:C = \frac{(1 + 2 + 3 + 4 + 6)^2}{5} = 256/5 = 51.2
Giving
:Total squared deviations = 66 − 51.2 = 14.8 with 4 degrees of freedom.
:Treatment squared deviations = 62 − 51.2 = 10.8 with 1 degree of freedom.
:Residual squared deviations = 66 − 62 = 4 with 3 degrees of freedom.
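The same arithmetic can be reproduced in a few lines of Python, purely as a check of the figures above.
 groups = [[1, 2, 3], [4, 6]]  # the two treatments from the example
 obs = [x for g in groups for x in g]
 n, k = len(obs), len(groups)

 I = sum(x * x for x in obs)                    # 66
 T = sum(sum(g) ** 2 / len(g) for g in groups)  # 62
 C = sum(obs) ** 2 / n                          # 51.2

 print(I - C, n - 1)  # total:     14.8 with 4 degrees of freedom
 print(T - C, k - 1)  # treatment: 10.8 with 1 degree of freedom
 print(I - T, n - k)  # residual:   4.0 with 3 degrees of freedom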
Two-way analysis of variance
The following hypothetical example gives the yields of 15 plants subject to two different environmental variations, and three different fertilisers.
See also
*Variance decomposition
*Errors and residuals in statistics
References
*Mood, A. M. & Graybill, F. A., An Introduction to the Theory of Statistics. McGraw-Hill.