Goodness of fit
The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e.g. to test for normality of residuals, to test whether two samples are drawn from identical distributions (see Kolmogorov-Smirnov test), or to test whether outcome frequencies follow a specified distribution (see Pearson's chi-square test). In the analysis of variance, one of the components into which the variance is partitioned may be a lack-of-fit sum of squares.
Example
The chi-square statistic is a sum of differences between observed and expected outcome frequencies, each squared and divided by the expectation:
: $\chi^2 = \sum \frac{(O - E)^2}{E}$
where
: "O" = an observed frequency
: "E" = an expected (theoretical) frequency, asserted by the null hypothesis
The resulting value can be compared to the chi-square distribution to determine the goodness of fit. To determine the degrees of freedom of the chi-square distribution, one takes the number of observed frequencies (that is, the number of categories) and subtracts one. For example, if there are eight different frequencies, one would compare to a chi-square distribution with seven degrees of freedom.
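The calculation described above (sum the squared differences between observed and expected frequencies, each divided by the expectation, then compare against a chi-square distribution with one fewer degrees of freedom than there are categories) can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `pearson_chi_square` and `chi2_sf` are hypothetical, the die-roll counts are made-up data, and the tail probability is computed with a plain power series for the incomplete gamma function so that only the standard library is needed.

```python
import math


def pearson_chi_square(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma
    function. Adequate for moderate x; not tuned for extreme tails.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, 10_000):
        term *= z / (a + n)
        total += term
        if term < 1e-16 * total:
            break
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


# Hypothetical data: 60 rolls of a die, null hypothesis "the die is fair".
observed = [5, 8, 9, 8, 10, 20]
expected = [10.0] * 6

stat = pearson_chi_square(observed, expected)
df = len(observed) - 1  # number of categories minus one
p_value = chi2_sf(stat, df)

print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.4f}")
```

A small upper-tail probability indicates that a discrepancy this large would be unusual if the null hypothesis were true. Where SciPy is available, a library routine such as `scipy.stats.chisquare` is likely a better choice than the hand-rolled series used here.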
There is also a reduced chi-squared statistic, which is weighted based on measurement error:
: $\chi^2_\mathrm{red} = \frac{1}{\nu} \sum \frac{(O - E)^2}{\sigma^2}$
where $\sigma^2$ is the variance of the observation and $\nu$ is the number of degrees of freedom. [Chi-Square Data Fitting, http://www.sns.gov/workshops/sns_hfir_users/posters/Laub_Chi-Square_Data_Fitting.pdf]
Binomial case
A binomial experiment is a sequence of independent trials in which the trials can result in one of two outcomes, success or failure. There are "n" trials, each with probability of success denoted by "p". More generally, suppose each trial falls into one of "k" cells, with probability $p_i$ for cell "i" and observed cell count $N_i$ (the binomial experiment is the case "k" = 2). Provided that $np_i \gg 1$ for every "i" (where "i" = 1, 2, ..., "k"), then
: $\chi^2 = \sum_{i=1}^{k} \frac{(N_i - np_i)^2}{np_i} = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$
This has approximately a chi-squared distribution with "k" − 1 df. The fact that df = "k" − 1 is a consequence of the restriction $\sum_{i=1}^{k} N_i = n$. There are "k" observed cell counts, but once any "k" − 1 of them are known, the remaining one is uniquely determined. In other words, only "k" − 1 cell counts are freely determined, thus df = "k" − 1.
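As a rough illustration of the cell-count form of the statistic, the sketch below computes the expected counts as "n" times each hypothesised cell probability, checks that every expected count is reasonably large, and returns the statistic together with "k" − 1 degrees of freedom. The function name `cell_count_chi_square`, the example counts and probabilities, and the cutoff of 5 (a common rule of thumb standing in for the "much greater than 1" condition) are all assumptions made for this example.

```python
def cell_count_chi_square(counts, probs, min_expected=5.0):
    """Chi-square statistic for k cell counts against hypothesised cell probabilities.

    min_expected is an assumed rule-of-thumb cutoff, not part of the definition.
    Returns (statistic, degrees_of_freedom).
    """
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("cell probabilities must sum to 1")
    n = sum(counts)
    expected = [n * p for p in probs]
    if any(e < min_expected for e in expected):
        raise ValueError("expected counts too small for the approximation")
    stat = sum((c - e) ** 2 / e for c, e in zip(counts, expected))
    return stat, len(counts) - 1


# Hypothetical data: n = 100 trials spread over k = 3 cells.
counts = [45, 35, 20]
probs = [0.5, 0.3, 0.2]

stat, df = cell_count_chi_square(counts, probs)

# The counts must sum to n, so the last one is fixed by the other k - 1.
n = sum(counts)
implied_last = n - sum(counts[:-1])

print(f"chi-square = {stat:.4f}, df = {df}, implied last count = {implied_last}")
```

The `implied_last` line mirrors the degrees-of-freedom argument: because the cell counts must add up to "n", knowing the first "k" − 1 counts leaves no freedom in the last one.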
References
Wikimedia Foundation. 2010.