Variational Bayesian methods
Variational Bayesian methods, also called ensemble learning, are a family of techniques for approximating intractable integrals arising in Bayesian statistics and machine learning. They can be used to lower-bound the marginal likelihood (i.e. the "evidence") of several models with a view to performing model selection, and they often provide an analytical approximation to the parameter posterior which is useful for prediction. They are an alternative to Monte Carlo sampling methods for making use of a posterior distribution that is difficult to sample from directly.
Mathematical derivation
In variational inference, the posterior distribution over a set of latent variables $\mathbf{Z}$ given some data $\mathbf{X}$ is approximated by a variational distribution $Q(\mathbf{Z})$:
: $P(\mathbf{Z} \mid \mathbf{X}) \approx Q(\mathbf{Z}).$
The variational distribution $Q(\mathbf{Z})$ is restricted to belong to a family of distributions of simpler form than $P(\mathbf{Z} \mid \mathbf{X})$. This family is selected with the intention that $Q(\mathbf{Z})$ can be made very similar to the true posterior. The difference between $Q(\mathbf{Z})$ and this true posterior is measured in terms of a dissimilarity function $d(Q; P)$, and hence inference is performed by selecting the distribution $Q(\mathbf{Z})$ that minimises $d(Q; P)$. One choice of dissimilarity function for which this minimisation is tractable is the Kullback–Leibler divergence (KL divergence), defined as
: $D_{\mathrm{KL}}(Q \parallel P) = \sum_{\mathbf{Z}} Q(\mathbf{Z}) \log \frac{Q(\mathbf{Z})}{P(\mathbf{Z} \mid \mathbf{X})}.$
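As a concrete numerical illustration (not part of the original article), the short Python sketch below computes this KL divergence for a made-up discrete latent variable with three states; the vectors p_post and q are invented for the example.

 import numpy as np
 
 # Hypothetical example: a latent variable Z with three states. p_post plays
 # the role of the true posterior P(Z|X); q is a candidate variational
 # distribution Q(Z). Both are plain probability vectors summing to one.
 p_post = np.array([0.7, 0.2, 0.1])
 q = np.array([0.6, 0.3, 0.1])
 
 def kl_divergence(q, p):
     """D_KL(Q||P) = sum_Z Q(Z) log(Q(Z) / P(Z|X)); 0 log 0 taken as 0."""
     mask = q > 0
     return np.sum(q[mask] * np.log(q[mask] / p[mask]))
 
 print(kl_divergence(q, p_post))       # small positive value (about 0.029)
 print(kl_divergence(p_post, p_post))  # 0.0: divergence vanishes iff Q = P

Note that the KL divergence is asymmetric and non-negative, vanishing only when $Q$ matches the posterior exactly.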
We can write the log evidence as
: $\log P(\mathbf{X}) = D_{\mathrm{KL}}(Q \parallel P) - \sum_{\mathbf{Z}} Q(\mathbf{Z}) \log \frac{Q(\mathbf{Z})}{P(\mathbf{Z}, \mathbf{X})} = D_{\mathrm{KL}}(Q \parallel P) + \mathcal{L}(Q),$
which follows from the factorisation $P(\mathbf{Z}, \mathbf{X}) = P(\mathbf{Z} \mid \mathbf{X})\,P(\mathbf{X})$ together with $\sum_{\mathbf{Z}} Q(\mathbf{Z}) = 1$.
As the log evidence $\log P(\mathbf{X})$ is fixed with respect to $Q$, maximising the final term $\mathcal{L}(Q)$ minimises the KL divergence between $Q$ and the true posterior. By appropriate choice of $Q$, we can make $\mathcal{L}(Q)$ tractable to compute and to maximise. Hence we have both a lower bound $\mathcal{L}(Q)$ on the evidence (the "evidence lower bound") and an analytical approximation $Q$ to the posterior $P(\mathbf{Z} \mid \mathbf{X})$.
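This decomposition can be verified numerically. The sketch below (again a hypothetical Python example with an invented joint distribution) evaluates $\mathcal{L}(Q)$, which is computable from the joint $P(\mathbf{Z}, \mathbf{X})$ alone, and checks both the identity above and the lower-bound property.

 import numpy as np
 
 # Hypothetical unnormalised joint P(Z, X=x) for one observed x and a latent
 # Z with three states; in practice only the joint is available, not P(X).
 p_joint = np.array([0.28, 0.08, 0.04])
 
 log_evidence = np.log(p_joint.sum())   # log P(X)
 p_post = p_joint / p_joint.sum()       # true posterior P(Z|X)
 
 def elbo(q, p_joint):
     """L(Q) = -sum_Z Q(Z) log(Q(Z) / P(Z, X)): needs no access to P(X)."""
     mask = q > 0
     return -np.sum(q[mask] * np.log(q[mask] / p_joint[mask]))
 
 def kl(q, p):
     mask = q > 0
     return np.sum(q[mask] * np.log(q[mask] / p[mask]))
 
 q = np.array([0.5, 0.3, 0.2])          # an arbitrary candidate Q(Z)
 # Identity: log P(X) = D_KL(Q||P) + L(Q), for every Q.
 assert np.isclose(log_evidence, kl(q, p_post) + elbo(q, p_joint))
 print(elbo(q, p_joint) <= log_evidence)                 # True: lower bound
 print(np.isclose(elbo(p_post, p_joint), log_evidence))  # True: tight at Q = P

Because $\mathcal{L}(Q)$ requires only the joint distribution, variational algorithms can maximise it even when the evidence $P(\mathbf{X})$ itself is intractable; the gap between $\mathcal{L}(Q)$ and $\log P(\mathbf{X})$ is exactly the KL divergence being minimised.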
See also
* Variational message passing: a modular algorithm for variational Bayesian inference.
* Expectation-maximization algorithm: a related approach which corresponds to a special case of variational Bayesian inference.
External links
* [http://www.variational-bayes.org Variational-Bayes.org] - a repository of papers, software, and links related to the use of variational Bayesian methods.
* [http://www.inference.phy.cam.ac.uk/mackay/itila/ The on-line textbook: Information Theory, Inference, and Learning Algorithms], by David J.C. MacKay, provides an introduction to variational methods (p. 422).
* [http://www.cse.buffalo.edu/faculty/mbeal/thesis/index.html Variational Algorithms for Approximate Bayesian Inference], by M. J. Beal, includes comparisons of EM to Variational Bayesian EM and derivations of several models including Variational Bayesian HMMs.