Markov network
A Markov network, or Markov random field, is a model of the (full) joint probability distribution of a set of random variables having the Markov property. A Markov network is similar to a Bayesian network in its representation of dependencies. It can represent certain dependencies that a Bayesian network cannot (such as cyclic dependencies); on the other hand, it cannot represent certain dependencies that a Bayesian network can (such as induced dependencies). The prototypical Markov network is the Ising model; indeed, the Markov network was introduced as the general setting for the Ising model (Kindermann and Snell 1980).

Formal Definition
Formally, a Markov network consists of:
* an undirected graph "G" = ("V", "E"), where each vertex "v" ∈ "V" represents a random variable and each edge {"u", "v"} ∈ "E" represents a dependency between the random variables "u" and "v",
* a set of functions φ_"k" (also called "factors" or "clique factors" and sometimes "features"), where each φ_"k" has as its domain some clique (or subclique) "k" in "G". Each φ_"k" is a mapping from possible joint assignments (to the elements of "k") to non-negative real numbers.

Joint Distribution Function
The joint distribution (or Gibbs measure) represented by a Markov network is given by

: P(X = x) = \frac{1}{Z} \prod_{k} \phi_k (x_{\{k\}})

where x_{\{k\}} is the state of the random variables in the "k"th clique, and the product runs over all the cliques in the graph. Here, Z is the partition function, so that

: Z = \sum_{x} \prod_{k} \phi_k (x_{\{k\}}).
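As a concrete illustration (not part of the original article), the following Python sketch enumerates a tiny Markov network by brute force: it multiplies clique factors over all joint assignments and normalizes by the partition function Z. The chain graph, the factor tables, and the variable names are hypothetical choices made only for this example.

from itertools import product

# Hypothetical 3-variable chain A - B - C with binary variables.
# The cliques are the edges {A,B} and {B,C}; each factor maps a joint
# assignment of its clique to a non-negative real number.
variables = ["A", "B", "C"]
factors = {
    ("A", "B"): {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0},
    ("B", "C"): {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0},
}

def unnormalized(assignment):
    """Product of clique factors for one joint assignment (a dict)."""
    score = 1.0
    for clique, table in factors.items():
        score *= table[tuple(assignment[v] for v in clique)]
    return score

# Partition function Z: sum of the unnormalized score over all assignments.
all_assignments = [dict(zip(variables, vals)) for vals in product([0, 1], repeat=3)]
Z = sum(unnormalized(a) for a in all_assignments)

# Joint probability P(X = x) = (1/Z) * prod_k phi_k(x_{k}).
for a in all_assignments:
    print(a, unnormalized(a) / Z)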
In practice, a Markov network is often conveniently expressed as a log-linear model, given by

: \phi_k (x_{\{k\}}) = \exp \left( w_k^{\top} f_k (x_{\{k\}}) \right)

so that

: P(X = x) = \frac{1}{Z} \exp \left( \sum_{k} w_k^{\top} f_k (x_{\{k\}}) \right)

with partition function

: Z = \sum_{x} \exp \left( \sum_{k} w_k^{\top} f_k (x_{\{k\}}) \right).
In this context, the w_k's are weights and the f_k's are potential functions mapping the clique "k" to the reals. These functions are also sometimes called Gibbs potentials; the term "potential" comes from physics, where these can be literally interpreted as the potential energy between nearest neighbors.

Log-linear models are especially convenient for their interpretation. A log-linear model can provide a much more compact representation for many distributions, especially when variables have large domains. They are also convenient because their negative log-likelihoods are convex. Unfortunately, though the likelihood of a log-linear Markov network is convex, evaluating the likelihood or its gradient requires inference in the model, which is in general computationally infeasible.
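A minimal sketch of the log-linear parameterization, continuing the hypothetical binary chain used above: each clique factor is defined as exp(w_k · f_k(x_{k})). The weights and the "agreement" indicator features below are illustrative assumptions, not values from the original text.

import math
from itertools import product

# Hypothetical binary chain A - B - C with one feature per edge:
# f_k(x_{k}) = 1 if the two endpoint values agree, else 0.
variables = ["A", "B", "C"]
cliques = [("A", "B"), ("B", "C")]
weights = {("A", "B"): 0.7, ("B", "C"): 1.1}   # illustrative weights w_k

def feature(clique, assignment):
    u, v = clique
    return 1.0 if assignment[u] == assignment[v] else 0.0

def log_score(assignment):
    # sum_k w_k * f_k(x_{k}); exponentiating gives the unnormalized probability
    return sum(weights[c] * feature(c, assignment) for c in cliques)

assignments = [dict(zip(variables, vals)) for vals in product([0, 1], repeat=3)]
Z = sum(math.exp(log_score(a)) for a in assignments)  # partition function

# P(X = x) = exp(sum_k w_k f_k(x_{k})) / Z
probabilities = {tuple(a.values()): math.exp(log_score(a)) / Z for a in assignments}
print(probabilities)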
Markov property
Markov networks have the Markov property: the probability that a vertex "u" of the graph is in the state x_u depends only on the nearest neighbors of the vertex "u"; given those neighbors, the vertex is conditionally independent of all of the other vertices in the graph. One writes this as

: P(X_u = x_u \mid X_v,\, v \neq u) = P(X_u = x_u \mid X_v,\, v \in N(u))

where "N"("u") denotes the set of neighbors of "u" in the graph.
The set of the nearest neighbors of "u" is sometimes referred to as the Markov blanket of "u".

Conversely, one may "define" a Markov network to be a collection of random variables having the Markov property; in this case, it may be proven that such a network can "always" be written in terms of the Gibbs measure, as shown above. The potential thus obtained is unique up to an additive constant; this potential is called the canonical potential. More precisely, there is at least one state of the variables "X" which has maximum probability; this state is referred to as the ground state. The canonical potential is defined so that the potential of the ground state is zero.
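The following sketch (an illustration, not from the article) checks the Markov property numerically on the hypothetical three-variable chain used above: for the end vertex A, conditioning on its single neighbor B makes A independent of C, since {B} is the entire Markov blanket of A.

from itertools import product

# Hypothetical chain A - B - C, reusing the factor tables from the first sketch.
factors = {
    ("A", "B"): {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0},
    ("B", "C"): {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0},
}

def joint(a, b, c):
    return factors[("A", "B")][(a, b)] * factors[("B", "C")][(b, c)]

Z = sum(joint(a, b, c) for a, b, c in product([0, 1], repeat=3))

def p(a=None, b=None, c=None):
    """Marginal probability of the specified values, summing out the rest."""
    total = 0.0
    for av, bv, cv in product([0, 1], repeat=3):
        if (a is None or av == a) and (b is None or bv == b) and (c is None or cv == c):
            total += joint(av, bv, cv)
    return total / Z

# Markov property: P(A = 1 | B = 0, C = 1) equals P(A = 1 | B = 0),
# because B is the Markov blanket of A in the chain A - B - C.
lhs = p(a=1, b=0, c=1) / p(b=0, c=1)
rhs = p(a=1, b=0) / p(b=0)
print(lhs, rhs)  # the two conditional probabilities coincide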
Inference
As in a Bayesian network, one may calculate the conditional distribution of a set of nodes given values for another set of nodes in the Markov network by summing over all possible assignments to the remaining nodes; this is called exact inference. However, exact inference is a #P-complete problem, and thus computationally intractable in the general case. Approximation techniques such as Markov chain Monte Carlo and loopy belief propagation are often more feasible in practice. Some particular subclasses of MRFs, such as trees, have polynomial-time inference algorithms; discovering such subclasses is an active research topic. There are also subclasses of MRFs which permit efficient MAP (most likely assignment) inference; examples of these include associative networks.
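As an illustration of the approximate-inference route mentioned above, here is a minimal Gibbs sampler (a form of Markov chain Monte Carlo) for the same hypothetical binary chain; it repeatedly resamples each variable from its conditional distribution given its Markov blanket. This is a sketch under those assumptions, not a general-purpose implementation.

import random

# Hypothetical binary chain A - B - C with the same edge factors as above.
factors = {
    ("A", "B"): {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0},
    ("B", "C"): {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0},
}
variables = ["A", "B", "C"]

def unnormalized(state):
    score = 1.0
    for (u, v), table in factors.items():
        score *= table[(state[u], state[v])]
    return score

def gibbs_sample(n_samples, burn_in=500):
    state = {v: random.randint(0, 1) for v in variables}
    samples = []
    for step in range(burn_in + n_samples):
        for v in variables:
            # Resample v from P(v | all other variables); only the factors
            # touching v matter, i.e. the conditional depends only on v's
            # Markov blanket.
            scores = []
            for value in (0, 1):
                state[v] = value
                scores.append(unnormalized(state))
            p1 = scores[1] / (scores[0] + scores[1])
            state[v] = 1 if random.random() < p1 else 0
        if step >= burn_in:
            samples.append(dict(state))
    return samples

samples = gibbs_sample(5000)
print(sum(s["A"] for s in samples) / len(samples))  # estimate of P(A = 1)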
Conditional Random Fields
One notable variant of a Markov network is a conditional random field, in which each random variable may also be conditioned upon a set of global observations. In this model, each function φ_"k" is a mapping from all assignments to both the clique "k" and the observations to the non-negative real numbers. This form of the Markov network may be more appropriate for producing discriminative classifiers, which do not model the distribution over the observations.
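For comparison with the joint distribution above, the conditional distribution represented by a conditional random field can be written (in a standard formulation, with the observations denoted here by o) as

: P(Y = y \mid o) = \frac{1}{Z(o)} \prod_{k} \phi_k (y_{\{k\}}, o), \qquad Z(o) = \sum_{y} \prod_{k} \phi_k (y_{\{k\}}, o).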
See also
* Maximum entropy method
* Hopfield network
* Graphical model
* Markov chain
* Markov logic network

References
* Ross Kindermann and J. Laurie Snell, Markov Random Fields and Their Applications, American Mathematical Society, 1980. ISBN 0-8218-5001-6. http://www.ams.org/online_bks/conm1/