Bayesian information criterion

In statistics, a particular dataset may be described using either non-parametric or parametric methods. In parametric modelling there are typically several candidate models with different numbers of parameters, and the number of parameters plays an important role: increasing it increases the likelihood of the training data, but too many parameters can lead to overfitting. The Bayesian information criterion (BIC) is a statistical criterion for model selection that addresses this problem by trading off goodness of fit against model complexity.

The BIC is sometimes also called the Schwarz criterion or Schwarz information criterion (SIC), because Gideon E. Schwarz (1978) gave a Bayesian argument for adopting it.

Mathematically

The BIC is an asymptotic result derived under the assumption that the data distribution is in the exponential family. Let:
*"x" = the observed data;
*"n" = the number of data points in x, the number of observations, or equivalently, the sample size;
*"k" = the number of free parameters to be estimated. If the estimated model is a linear regression, "k" is the number of regressors, including the constant;
*"p(x|k)" = the likelihood of the observed data given the number of parameters;
*"L" = the maximized value of the likelihood function for the estimated model.

The formula for the BIC is [http://xxx.adelaide.edu.au/PS_cache/astro-ph/pdf/0701/0701113v2.pdf]:

:<math>-2 \cdot \ln p(x \mid k) \approx \mathrm{BIC} = -2 \cdot \ln L + k \ln(n).</math>

Under the assumption that the model errors or disturbances are normally distributed, this becomes (up to an additive constant, which depends only on "n" and not on the model):

:<math>\mathrm{BIC} = n \ln\left(\frac{\mathrm{RSS}}{n}\right) + k \ln(n),</math>

where "RSS" is the residual sum of squares from the estimated model. Note that the term used here for <math>-2 \cdot \ln L</math> equals the rescaled normal log-likelihood only up to an additive constant that depends only on "n".
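As a concrete illustration of the two expressions above, the following is a minimal Python sketch (an illustration, not part of the original article; the simulated data and seed are arbitrary assumptions). It fits an ordinary least-squares regression and evaluates the BIC both from the maximized Gaussian log-likelihood and from the RSS form; the two values differ only by the additive constant n(ln(2π) + 1), which does not depend on the model.

<pre>
import numpy as np

def bic_from_loglik(log_lik, k, n):
    # General form: BIC = -2*ln(L) + k*ln(n)
    return -2.0 * log_lik + k * np.log(n)

def bic_from_rss(rss, k, n):
    # Normal-error specialization: BIC = n*ln(RSS/n) + k*ln(n)
    return n * np.log(rss / n) + k * np.log(n)

# Simulated linear regression y = 2 + 3x + noise (illustrative data)
rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x])          # design matrix, constant included
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS / maximum-likelihood fit
rss = np.sum((y - X @ beta) ** 2)

k = X.shape[1]                                 # number of regressors, including the constant
sigma2 = rss / n                               # ML estimate of the error variance
log_lik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)  # maximized Gaussian log-likelihood

print(bic_from_loglik(log_lik, k, n))
print(bic_from_rss(rss, k, n))                 # differs by the constant n*(ln(2*pi) + 1)
</pre>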

Given any two estimated models, the model with the lower value of BIC is the one to be preferred. The BIC is an increasing function of "RSS" and an increasing function of "k". That is, unexplained variation in the dependent variable and the number of explanatory variables increase the value of BIC. Hence, lower BIC implies either fewer explanatory variables, better fit, or both. The BIC penalizes free parameters more strongly than does the Akaike information criterion.

It is important to keep in mind that the BIC can be used to compare estimated models only when the numerical values of the dependent variable are identical for all estimates being compared. The models being compared need not be nested, unlike the case when models are being compared using an F-test or a likelihood-ratio test.
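For example, the comparison rule above can be applied to selecting a polynomial degree, since every candidate is fitted to the same response vector and the dependent values are therefore identical across fits. The snippet below is a hedged sketch using the RSS form of the BIC; the simulated data, true quadratic model, and degree range are illustrative assumptions.

<pre>
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(-2, 2, size=n)
# True model is quadratic; all candidates are fitted to the same y
y = 1.0 + 0.5 * x - 0.8 * x ** 2 + rng.normal(scale=0.3, size=n)

def bic_poly(degree):
    # Fit a polynomial of the given degree by least squares and return the
    # RSS form of the BIC (additive constants cancel across candidates on the same y)
    X = np.vander(x, degree + 1)               # columns x^degree, ..., x, 1
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    k = degree + 1                             # coefficients, constant included
    return n * np.log(rss / n) + k * np.log(n)

scores = {d: bic_poly(d) for d in range(1, 6)}
print(scores, "-> selected degree:", min(scores, key=scores.get))  # lower BIC preferred
</pre>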

Characteristics of the Bayesian information criterion

# It is independent of the prior.
# It can measure the efficiency of the parameterized model in terms of predicting the data.
# It penalizes the complexity of the model, where complexity refers to the number of parameters in the model.
# It is exactly equal to the minimum description length criterion but with the opposite sign.
# It can be used to choose the number of clusters according to the intrinsic complexity present in a particular dataset (see the sketch following this list).
# It is closely related to other penalized likelihood criteria such as RIC and AIC.
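As an illustration of point 5, the following sketch (an assumption-laden example, requiring scikit-learn) fits Gaussian mixture models with different numbers of components to simulated two-cluster data and selects the component count with the lowest BIC; GaussianMixture.bic evaluates the same −2·ln(L) + k·ln(n) formula for the fitted mixture.

<pre>
import numpy as np
from sklearn.mixture import GaussianMixture

# Two well-separated Gaussian blobs, so the intrinsic number of clusters is 2
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=-3.0, size=(150, 2)),
               rng.normal(loc=+3.0, size=(150, 2))])

# Fit mixtures with 1..6 components and keep the count with the lowest BIC
bics = []
for n_components in range(1, 7):
    gm = GaussianMixture(n_components=n_components, random_state=0).fit(X)
    bics.append(gm.bic(X))                     # -2*ln(L) + k*ln(n) for the fitted mixture

best = int(np.argmin(bics)) + 1
print("BIC per component count:", np.round(bics, 1))
print("Selected number of clusters:", best)
</pre>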

Applications

BIC has been widely used for model identification in time series and linear regression. It can, however, be applied quite widely to any set of maximum-likelihood-based models. Note that in many applications (for example, selecting between a black-body and a power-law spectrum for an astronomical source), BIC simply reduces to maximum-likelihood selection because the number of parameters is equal for the models of interest.
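As a time-series example of this usage, the sketch below (illustrative only; the AR(2) coefficients, sample size, and maximum order are arbitrary assumptions) selects the order of an autoregressive model by conditional least squares, evaluating every candidate on the same truncated sample so that the dependent values are identical across candidates, as required above.

<pre>
import numpy as np

# Simulate an AR(2) process; the order is what the BIC is asked to recover
rng = np.random.default_rng(3)
n = 500
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal(scale=1.0)

max_p = 5                                      # largest candidate order

def ar_bic(p):
    # Conditional least-squares fit of an AR(p) model on a common sample
    # (dependent values identical across candidates), scored with the RSS form
    Y = y[max_p:]
    m = len(Y)
    X = np.column_stack([np.ones(m)] +
                        [y[max_p - i:n - i] for i in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    rss = np.sum((Y - X @ beta) ** 2)
    k = p + 1                                  # lag coefficients plus the constant
    return m * np.log(rss / m) + k * np.log(m)

scores = {p: ar_bic(p) for p in range(1, max_p + 1)}
print(scores, "-> selected order:", min(scores, key=scores.get))
</pre>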

References

* Liddle, A.R., "Information criteria for astrophysical model selection", http://xxx.adelaide.edu.au/PS_cache/astro-ph/pdf/0701/0701113v2.pdf
* McQuarrie, A. D. R., and Tsai, C.-L., 1998. "Regression and Time Series Model Selection". World Scientific.
* Schwarz, G., 1978. "Estimating the dimension of a model". "Annals of Statistics" 6(2):461-464.

See also

*Akaike information criterion
*Bayesian model comparison
*Deviance information criterion
*Jensen-Shannon divergence
*Kullback-Leibler divergence
*Model selection

External links

* [http://econ.la.psu.edu/~hbierens/INFORMATIONCRIT.PDF Information Criteria and Model Selection]

