Variational message passing

Variational message passing (VMP) is an approximate inference technique for continuous- or discrete-valued Bayesian networks with conjugate-exponential parents, developed by John Winn (now at Microsoft Research) in his PhD thesis. VMP was developed as a means of generalizing the approximate variational methods used by techniques such as Latent Dirichlet allocation, and works by updating an approximate posterior distribution at each node through messages passed within the node's Markov blanket.

Likelihood Lower Bound

Given a set of hidden variables H and observed variables V, the goal of approximate inference is to lower-bound the marginal likelihood P(V), the probability that the graphical model assigns to the observed configuration V. For any probability distribution Q over H (to be characterised below),

: \ln P(V) = \sum_H Q(H) \ln \frac{P(H,V)}{P(H|V)} = \sum_H Q(H) \left[ \ln \frac{P(H,V)}{Q(H)} - \ln \frac{P(H|V)}{Q(H)} \right] .

So, if we define our lower bound to be

: L(Q) = \sum_H Q(H) \ln \frac{P(H,V)}{Q(H)} ,

then the log marginal likelihood \ln P(V) is simply this bound plus the relative entropy (Kullback–Leibler divergence) between Q(H) and P(H|V). Because the relative entropy is non-negative, the function L defined above is indeed a lower bound on the log likelihood of the observation V. The distribution Q is chosen to have a simpler form than P, because marginalizing over P is intractable for all but the simplest graphical models. In particular, VMP uses a factorized distribution Q:

: Q(H) = \prod_i Q_i(H_i) ,

where each H_i is one block of a disjoint partition of the hidden variables in the graphical model.
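
The decomposition above can be checked numerically. The following Python sketch uses an illustrative two-variable discrete model (all probability values are arbitrary, not taken from any source) and confirms that the bound plus the relative entropy recovers the log evidence exactly:

    # Toy check of ln P(V) = L(Q) + KL(Q || P(H|V)) with one binary hidden
    # variable H and one binary observed variable V (illustrative numbers only).
    import numpy as np

    p_h = np.array([0.3, 0.7])                     # P(H)
    p_v_given_h = np.array([[0.9, 0.1],            # P(V | H=0)
                            [0.2, 0.8]])           # P(V | H=1)
    v = 0                                          # the observed value of V

    p_joint = p_h * p_v_given_h[:, v]              # P(H, V=v) for each H
    p_v = p_joint.sum()                            # marginal likelihood P(V=v)
    p_post = p_joint / p_v                         # posterior P(H | V=v)

    q = np.array([0.5, 0.5])                       # an arbitrary approximation Q(H)
    lower_bound = np.sum(q * np.log(p_joint / q))  # L(Q)
    kl = np.sum(q * np.log(q / p_post))            # KL(Q || P(H|V))

    print(np.log(p_v), lower_bound + kl)           # the two values agree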

Determining the Update Rule

We want this lower bound to be as large as possible: because L(Q) is a lower bound, increasing it brings it closer to \ln P(V) and improves the approximation of the log likelihood. Substituting in the factorized form of Q, the bound L(Q), viewed as a function of a single factor Q_j, is simply the negative relative entropy between Q_j and Q_j^* plus terms independent of Q_j, if Q_j^* is defined as

: Q_j^*(H_j) = \frac{1}{Z} e^{\mathbb{E}_{-j}[\ln P(H,V)]} ,

where \mathbb{E}_{-j}[\ln P(H,V)] denotes the expectation of \ln P(H,V) under all factors Q_i except Q_j, and Z is a normalizing constant. Thus, setting Q_j to Q_j^* maximizes the bound L with the other factors held fixed.
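
The effect of this update can be illustrated with a small Python sketch. It applies the rule Q_j^* \propto \exp(\mathbb{E}_{-j}[\ln P(H,V)]) coordinate-wise on a toy model with two binary hidden variables (all numbers are illustrative), and the printed bound is non-decreasing at every iteration:

    # Coordinate-ascent sketch of the Q_j* update on a toy model with hidden
    # binary variables H1, H2 and one binary observation V (illustrative only).
    import numpy as np

    p_h1 = np.array([0.6, 0.4])                       # P(H1)
    p_h2 = np.array([0.3, 0.7])                       # P(H2)
    p_v = np.array([[[0.9, 0.1], [0.5, 0.5]],
                    [[0.2, 0.8], [0.7, 0.3]]])        # P(V | H1, H2)
    v = 1                                             # the observed value of V

    # table of ln P(H1, H2, V=v), indexed by (h1, h2)
    log_joint = (np.log(p_h1)[:, None] + np.log(p_h2)[None, :]
                 + np.log(p_v[:, :, v]))

    def bound(q1, q2):
        q = np.outer(q1, q2)                          # factorized Q(H1, H2)
        return np.sum(q * (log_joint - np.log(q)))    # L(Q)

    q1 = np.array([0.5, 0.5])
    q2 = np.array([0.5, 0.5])
    for _ in range(10):
        q1 = np.exp(log_joint @ q2); q1 /= q1.sum()   # Q1* from E_{Q2}[ln P]
        q2 = np.exp(q1 @ log_joint); q2 /= q2.sum()   # Q2* from E_{Q1}[ln P]
        print(bound(q1, q2))                          # never decreases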

Messages in Variational Message Passing

Parents send their children the expectations of their sufficient statistics, while children send their parents their natural parameters; computing the latter also requires messages to be sent from the co-parents of the node.
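
As a concrete illustration (a minimal Python sketch under simplifying assumptions, not the general algorithm), consider Gaussian observations whose precision is known and whose mean has a Gaussian prior. Each observation sends the mean parent a natural-parameter message, and the updated parent in turn sends back its expected sufficient statistics; all variable names and numbers here are illustrative:

    # VMP-style messages around Gaussian nodes x ~ N(mu, 1/gamma), assuming the
    # precision gamma is fixed and only the mean mu is uncertain.
    import numpy as np

    m0, t0 = 0.0, 0.1            # prior on the mean: mu ~ N(m0, 1/t0)
    gamma = 2.0                  # known precision of the observations
    data = np.array([1.2, 0.7, 1.5])

    # prior for mu in natural-parameter form: coefficients of [mu, mu^2]
    prior_np = np.array([t0 * m0, -t0 / 2.0])

    # child -> parent(mu) message from each observation x: [<gamma> x, -<gamma>/2]
    child_msgs = np.array([[gamma * x, -gamma / 2.0] for x in data])

    # the posterior natural parameters are the prior plus the child messages
    post_np = prior_np + child_msgs.sum(axis=0)
    post_prec = -2.0 * post_np[1]                # back to moment parameters
    post_mean = post_np[0] / post_prec

    # parent -> child message: expected sufficient statistics [<mu>, <mu^2>]
    parent_msg = np.array([post_mean, post_mean**2 + 1.0 / post_prec])
    print(post_mean, 1.0 / post_prec, parent_msg)

This reproduces the standard conjugate update for a Gaussian mean: the posterior precision is t0 + N*gamma, and the posterior mean is the precision-weighted average of the prior mean and the data.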

Relationship to Exponential Families

Because all nodes in VMP belong to exponential families and all parents of a node are conjugate to their child nodes, the expectation of the sufficient statistics can be computed by differentiating the normalization factor (the log-partition function).
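
In exponential-family form, the expected sufficient statistics are the gradient of the log of the normalization factor with respect to the natural parameters. The Python sketch below (assuming the standard univariate-Gaussian parameterization; nothing here comes from the original paper) recovers E[x] and E[x^2] by numerically differentiating the log-partition function:

    # Univariate Gaussian in exponential-family form:
    #   natural parameters  eta = (mu*tau, -tau/2)
    #   sufficient stats    T(x) = (x, x^2)
    #   log-partition       A(eta) = -eta1^2/(4*eta2) - 0.5*ln(-2*eta2)
    import numpy as np

    def log_partition(eta):
        eta1, eta2 = eta
        return -eta1**2 / (4.0 * eta2) - 0.5 * np.log(-2.0 * eta2)

    mu, tau = 1.5, 4.0                        # illustrative moment parameters
    eta = np.array([mu * tau, -tau / 2.0])    # corresponding natural parameters

    eps = 1e-6                                # central finite differences of A
    grad = np.array([(log_partition(eta + eps * d) - log_partition(eta - eps * d))
                     / (2 * eps) for d in np.eye(2)])

    print(grad)                               # ~ [E[x], E[x^2]]
    print([mu, mu**2 + 1.0 / tau])            # = [mu, mu^2 + 1/tau]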

VMP Algorithm

The algorithm begins by initialising the expected value of the sufficient statistic vector at each node. Then, until the lower bound on the likelihood converges to a stable value (usually by fixing a small threshold and running the algorithm until the bound increases by less than that threshold per iteration), do the following at each node (a schematic sketch of this loop is given after the list):

  1. Get all messages from the node's parents
  2. Get all messages from the node's children (this might require the children to first get messages from their co-parents)
  3. Compute the expected value of the node's sufficient statistics
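
The loop above can be written schematically in Python as follows. This is only a sketch of the control flow: the node interface (message_to_child, message_to_parent, update_posterior) and the bound_fn callback are hypothetical placeholders, not the API of any existing VMP library:

    def run_vmp(nodes, bound_fn, threshold=1e-6, max_iters=1000):
        """Schematic VMP loop: iterate until the lower bound stops improving."""
        previous = float("-inf")
        for _ in range(max_iters):
            for node in nodes:
                parent_msgs = [p.message_to_child(node) for p in node.parents]
                child_msgs = [c.message_to_parent(node) for c in node.children]
                # combine the prior and the child messages in natural-parameter
                # space, then recompute this node's expected sufficient statistics
                node.update_posterior(parent_msgs, child_msgs)
            current = bound_fn(nodes)
            if current - previous < threshold:   # converged to a stable value
                break
            previous = current
        return nodes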

Constraints

Because every parent must be conjugate to its child's distribution, this limits the types of distributions that can be used in the model. For example, the parents of a Gaussian distribution must be a Gaussian distribution (corresponding to the mean) and a gamma distribution (corresponding to the precision, i.e. the reciprocal of the variance in more common parameterizations). Discrete variables can have Dirichlet parents, and Poisson and exponential nodes must have gamma parents. However, if the data can be modeled in this manner, VMP offers a generalized framework for performing inference.
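
A brief worked illustration of why these particular pairings arise (this example is not taken from the original paper, but follows directly from the definitions): viewed as a function of the precision \tau, a Gaussian likelihood satisfies

: P(x \mid \mu, \tau) \propto \tau^{1/2} e^{-\tau (x-\mu)^2 / 2} ,

which has exactly the functional form of a gamma density in \tau. A gamma prior on \tau therefore remains a gamma distribution after incorporating the evidence, whereas a prior from most other families would leave its own family, which is why VMP restricts models to such conjugate parent-child pairs.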

External links

* An [http://vibes.sourceforge.net/ implementation] of VMP with usage examples

References

* M. J. Beal (2003). [http://www.cs.toronto.edu/~beal/thesis/beal03.pdf Variational Algorithms for Approximate Bayesian Inference]. PhD Thesis, [http://www.gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit], University College London, 2003.
* J. M. Winn and C. Bishop (2005). [http://research.microsoft.com/users/Cambridge/jwinn/papers/VMP2005.pdf Variational Message Passing]. Journal of Machine Learning Research 6: 661-694.

