Bayesian information criterion
In statistics, in order to describe a particular dataset, one can use non-parametric or parametric methods. In parametric methods, there may be several candidate models, each with a different number of parameters, to represent a dataset. The number of parameters in a model plays an important role: the likelihood of the training data increases as the number of parameters increases, but too many parameters may result in overfitting (overtraining). To address this problem one can use the Bayesian information criterion (BIC), a statistical criterion for model selection among parametric models. The BIC is sometimes also called the Schwarz criterion, or Schwarz information criterion (SIC), because Gideon E. Schwarz (1978) gave a Bayesian argument for adopting it.
Mathematically
The BIC is an asymptotic result derived under the assumption that the data distribution is in the exponential family. Let:
*"x" = the observed data;
*"n" = the number of data points in x, the number ofobservation s, or equivalently, the sample size;
*"k" = the number of freeparameter s to be estimated. If the estimated model is alinear regression , "k" is the number ofregressor s, including the constant;
*"p(x|k)" = thelikelihood of the observed data given the number of parameters;
*"L" = the maximized value of thelikelihood function for the estimated model.The formula for the BIC ishttp://xxx.adelaide.edu.au/PS_cache/astro-ph/pdf/0701/0701113v2.pdf] ::
Under the assumption that the model errors or disturbances are normally distributed, the BIC becomes (up to an additive constant, which depends only on "n" and not on the model):
:BIC = n·ln(RSS/n) + k·ln(n)
where "RSS" is the residual sum of squares from the estimated model. Note that the term n·ln(RSS/n) used in this specialization is equal to the rescaled normal log-likelihood, up to an additive constant that depends only on "n".
Given any two estimated models, the model with the lower value of BIC is the one to be preferred. The BIC is an increasing function of "RSS" and an increasing function of "k". That is, both unexplained variation in the dependent variable and the number of explanatory variables increase the value of BIC. Hence, a lower BIC implies either fewer explanatory variables, better fit, or both. The BIC penalizes free parameters more strongly than does the Akaike information criterion.
The BIC can be used to compare estimated models only when the numerical values of the dependent variable are identical for all estimates being compared. The models being compared need not be nested, unlike the case when models are being compared using an F-test or a likelihood ratio test.
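As an illustration of the Gaussian-error formula and of model comparison by BIC, the following Python sketch fits polynomial models of different degrees by least squares, computes BIC from the residual sum of squares, and keeps the model with the lowest value (the simulated data, the candidate degrees, and the helper name bic_from_rss are illustrative assumptions, not from the article):

```python
import numpy as np

def bic_from_rss(rss, k, n):
    # Gaussian-error specialization: BIC = n*ln(RSS/n) + k*ln(n)
    return n * np.log(rss / n) + k * np.log(n)

rng = np.random.default_rng(0)
n = 200
x = np.linspace(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=n)   # data generated from a straight line

candidates = {}
for degree in (1, 2, 3):                             # competing polynomial models
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
    k = degree + 1                                   # coefficients, including the constant
    candidates[degree] = bic_from_rss(rss, k, n)

best = min(candidates, key=candidates.get)           # lower BIC is preferred
print(candidates, "-> choose degree", best)
```

Because the data are generated from a straight line, the degree-1 fit typically attains the lowest BIC even though the higher-degree fits have smaller RSS.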
Characteristics of the Bayesian information criterion
# It is independent of the prior.
# It can measure the efficiency of the parameterized model in terms of predicting the data.
# It penalizes the complexity of the model, where complexity refers to the number of parameters in the model.
# It is exactly equal to the minimum description length criterion, but with negative sign.
# It can be used to choose the number of clusters according to the intrinsic complexity present in a particular dataset (see the sketch after this list).
# It is closely related to other penalized likelihood criteria such as the RIC and the Akaike information criterion (AIC).
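As an illustration of point 5, the number of clusters can be chosen by fitting mixture models with different numbers of components and keeping the one with the lowest BIC. A Python sketch using scikit-learn's GaussianMixture (the synthetic data and the range of candidate component counts are illustrative assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Synthetic one-dimensional data drawn from three well-separated clusters
data = np.concatenate([rng.normal(-5.0, 1.0, 100),
                       rng.normal(0.0, 1.0, 100),
                       rng.normal(5.0, 1.0, 100)]).reshape(-1, 1)

bic_scores = {}
for n_components in range(1, 7):
    gm = GaussianMixture(n_components=n_components, random_state=0).fit(data)
    bic_scores[n_components] = gm.bic(data)   # lower BIC = preferred model

print("chosen number of clusters:", min(bic_scores, key=bic_scores.get))
```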
Applications
BIC has been widely used for model identification in time series and linear regression. It can, however, be applied quite widely to any set of maximum-likelihood-based models. In many applications (for example, selecting a black-body or power-law spectrum for an astronomical source), however, BIC simply reduces to maximum-likelihood selection because the number of parameters is equal for the models of interest.
References
* Liddle, A.R., "Information criteria for astrophysical model selection", http://xxx.adelaide.edu.au/PS_cache/astro-ph/pdf/0701/0701113v2.pdf
* McQuarrie, A. D. R., and Tsai, C.-L., 1998. "Regression and Time Series Model Selection". World Scientific.
* Schwarz, G., 1978. "Estimating the dimension of a model". "Annals of Statistics" 6(2): 461-464.
See also
* Akaike information criterion
* Bayesian model comparison
* Deviance information criterion
* Jensen-Shannon divergence
* Kullback-Leibler divergence
* Model selection
External links
* [http://econ.la.psu.edu/~hbierens/INFORMATIONCRIT.PDF Information Criteria and Model Selection]