Markov random field

A Markov random field, Markov network or undirected graphical model is a set of variables having a Markov property described by an undirected graph. A Markov random field is similar to a Bayesian network in its representation of dependencies. It can represent certain dependencies that a Bayesian network cannot (such as cyclic dependencies); on the other hand, it can't represent certain dependencies that a Bayesian network can (such as induced dependencies). The prototypical Markov random field is the Ising model; indeed, the Markov random field was introduced as the general setting for the Ising model.^[1] A Markov random field is used to model various low- to mid-level tasks in image processing and computer vision.^[2] For example, MRFs are used for image restoration, image completion, segmentation, texture synthesis, super-resolution and stereo matching.

1 Definition
2 Clique factorization
3 Examples
- 3.1 Log-linear models
- 3.2 Gaussian Markov random field
4 Inference
5 Conditional random fields
6 See also
7 References

Definition

Given an undirected graph G = (V, E), a set of random variables X = (X_v)_v ∈ V indexed by V form a Markov random field with respect to G if they satisfy the following equivalent Markov properties:

Pairwise Markov property: Any two non-adjacent variables are conditionally independent given all other variables:

$X_u \perp\!\!\!\perp X_v | X_{V \setminus \{u,v\}} \quad \text{if } \{u,v\} \notin E$

Local Markov property: A variable is conditionally independent of all other variables given its neighbours:

$X_v \perp\!\!\!\perp X_{V\setminus \operatorname{cl}(v)} | X_{\operatorname{ne}(v)}$

where ne(v) is the set of neighbours of v, and cl(v) = {v} ∪ ne(v) is the closed neighbourhood of v.

Global Markov property: Any two subsets of variables are conditionally independent given a separating subset:

$X_A \perp\!\!\!\perp X_B | X_S$

where every path from a node in A to a node in B passes through S.

In other words, a graph G is considered a Markov random field with respect to the joint probability distribution P(X=x) over a set of random variables X if and only if graph separation in G implies conditional independence: If two nodes corresponding to $X_i \in X$ and $X_j \in X$ are separated in G after removing from G a set of nodes Z, then P(X=x) must assert that $X i$ and $X j$ are conditionally independent given the random variables corresponding to Z. If this condition is satisfied, we say that G is an independency map (or I-Map) of the probability distribution.

Many definitions go even further and require that G be a minimal I-Map, i.e. an I-Map from which none of the edges could be removed without destroying its I-Mapness. (It is reasonable to require this as it leads to the most compact representation, which involves as few dependencies as possible; note that a complete graph is trivially an I-Map.) In the case where G is not only an I-Map (i.e. it represents no independencies that are not indicated by P(X=x)) but also represents no dependencies that are not indicated by P(X=x), G is called a perfect map of P(X=x), as it represents precisely the set of independencies indicated by P(X=x).

Clique factorization

As the Markov properties of an arbitrary probability distribution can be difficult to establish, a commonly used class of Markov random fields are those that can be factorized according to the cliques of the graph.

Given a set of random variables X = (X_v)_v ∈ V for which the joint density (with respect to a product measure) can be factorized over the cliques of G:

$p(x) = \prod_{C \in \operatorname{cl}(G)} \phi_C (x_C)$

then X forms a Markov random field with respect to G, where cl(G) is the set of cliques of G (the definition is equivalent if only maximal cliques are used). The functions φ_C are often referred to as factor potentials or clique potentials.

Although some MRFs do not factorize (a simple example can be constructed on a cycle of 4 nodes^[3]), in certain cases they can be shown to be equivalent conditions:

if the density is positive (by the Hammersley–Clifford theorem),
if the graph is chordal (by equivalence to a Bayesian network).

When such a factorization does exist, it is possible to construct a factor graph for the network.

Examples

Log-linear models

A log-linear model is a Markov random field with feature functions $f k$ such that the full-joint distribution can be written as

$P(X=x) = \frac{1}{Z} \exp \left( \sum_{k} w_k^{\top} f_k (x_{ \{ k \}}) \right) = \frac{1}{Z} \exp \left( \sum_k \sum_{i=1}^{N_k} w_{k,i} \cdot f_{k,i}(x_{\{k\}}) \right)$

with partition function

$Z = \sum_{x \isin \mathcal{X}} \exp \left(\sum_{k} w_k^{\top} f_k(x_{ \{ k \} })\right)$ .

where $\mathcal{X}$ is the set of possible assignments of values to all the network's random variables. Usually, the feature functions $f k, i$ are defined such that they are indicators of the clique's configuration, i.e. $f k, i (x {k}) = 1$ if $x {k}$ corresponds to the i-th possible configuration of the k-th clique and 0 otherwise. This model is thus equivalent to the general formulation above if $N k = | d o m (C k) |$ and the weight of a feature $f k, i$ corresponds to the logarithm of the corresponding clique potential, i.e. $w k, i = log ϕ(c k, i)$ , where $c k, i$ is the i-th possible configuration of the k-th clique, i.e. the i-th value in the domain of the clique $C k$ . Note that this is only possible if all clique potentials are non-zero, i.e. if none of the elements of $\mathcal{X}$ are assigned a probability of 0.

Log-linear models are especially convenient for their interpretation. A log-linear model can provide a much more compact representation for many distributions, especially when variables have large domains. They are convenient too because their negative log likelihoods are convex. Unfortunately, though the likelihood of a log-linear Markov network is convex, evaluating the likelihood or gradient of the likelihood of a model requires inference in the model, which is in general computationally infeasible.

Gaussian Markov random field

A multivariate normal distribution forms a Markov random field with respect to a graph G = (V, E) if the missing edges correspond to zeros on the precision matrix (the inverse covariance matrix):

$X=(X_v)_{v\in V} \sim \mathcal N (\boldsymbol \mu, \Sigma) \qquad \text{such that} \qquad (\Sigma^{-1})_{uv} =0 \quad \text{if} \quad \{u,v\} \notin E .$ ^[4]

Inference

As in a Bayesian network, one may calculate the conditional distribution of a set of nodes $V' = {v 1,..., v i}$ given values to another set of nodes $W' = {w 1,..., w j}$ in the Markov random field by summing over all possible assignments to $u \notin V',W'$ ; this is called exact inference. However, exact inference is a #P-complete problem, and thus computationally intractable in the general case. Approximation techniques such as Markov chain Monte Carlo and loopy belief propagation are often more feasible in practice. Some particular subclasses of MRFs, such as trees see Chow-Liu tree, have polynomial-time inference algorithms; discovering such subclasses is an active research topic. There are also subclasses of MRFs that permit efficient MAP, or most likely assignment, inference; examples of these include associative networks.

Conditional random fields

One notable variant of a Markov random field is a conditional random field, in which each random variable may also be conditioned upon a set of global observations $o$ . In this model, each function $φ k$ is a mapping from all assignments to both the clique k and the observations $o$ to the nonnegative real numbers. This form of the Markov network may be more appropriate for producing discriminative classifiers, which do not model the distribution over the observations.

References

^ Kindermann, Ross; Snell, J. Laurie (1980). Markov Random Fields and Their Applications. American Mathematical Society. ISBN 0-8218-5001-6. MR 0620955. http://www.ams.org/online_bks/conm1/.
^ Li, S. Z. (2009). Markov Random Field Modeling in Image Analysis. Springer.
^ Moussouris, John (1974). "Gibbs and Markov random systems with constraints". Journal of Statistical Physics 10 (1): 11–33. doi:10.1007/BF01011714. MR 0432132.
^ Rue, Håvard; Held, Leonhard (2005). Gaussian Markov random fields: theory and applications. CRC Press. ISBN 1584884320.

Categories:

Wikimedia Foundation. 2010.

Игры ⚽ Нужна курсовая?

Look at other dictionaries:

Markov Random Field — Ein Markov Random Field (kurz MRF) oder Markow Netzwerk ist ein nach dem Mathematiker A. Markow benanntes statistisches Modell, welches ungerichtete Zusammenhänge (z. B. die Ausrichtung von Elementarmagneten) in einem Feld beschreibt. Das… … Deutsch Wikipedia
Random field — A random field is a generalization of a stochastic process such that the underlying parameter need no longer be a simple real, but can instead be a multidimensional vector space or even a manifold.At its most basic, discrete case, a random field… … Wikipedia
Conditional random field — A conditional random field (CRF) is a statistical modelling method often applied in pattern recognition. More specifically it is a type of discriminative undirected probabilistic graphical model. It is used to encode known relationships between… … Wikipedia
Conditional Random Field — Ein Conditional Random Field (CRF) ist ein ungerichtetes graphisches Modell. Oft werden CRFs zum Taggen von sequentiellen Daten verwendet. Das bedeutet, das CRF erhält eine Sequenz X als Eingabe und gibt eine gleichlange Sequenz Y aus. Im… … Deutsch Wikipedia
Markov model — In probability theory, a Markov model is a stochastic model that assumes the Markov property. Generally, this assumption enables reasoning and computation with the model that would otherwise be intractable. Contents 1 Introduction 2 Markov chain… … Wikipedia
Markov network — A Markov network, or Markov random field, is a model of the (full) joint probability distribution of a set mathcal{X} of random variables having the Markov property. A Markov network is similar to a Bayesian network in its representation of… … Wikipedia
Markov property — In probability theory and statistics, the term Markov property refers to the memoryless property of a stochastic process. It was named after the Russian mathematician Andrey Markov.[1] A stochastic process has the Markov property if the… … Wikipedia
Markov-Filter — Der Markow Filter (nach Andrei Andrejewitsch Markow) ist ein Spamfilter basierend auf einem Verborgenen Markow Modell und stellt eine Weiterentwicklung des Bayes Filters dar. Während bei einem Bayes Filter die Wahrscheinlichkeit einzelner Wörter… … Deutsch Wikipedia
Markov chain geostatistics — refer to the Markov chain models, simulation algorithms and associated spatial correlation measures (e.g., transiogram) based on the Markov chain random field theory, which extends a single Markov chain into a multi dimensional field for… … Wikipedia
Markov chain — A simple two state Markov chain. A Markov chain, named for Andrey Markov, is a mathematical system that undergoes transitions from one state to another, between a finite or countable number of possible states. It is a random process characterized … Wikipedia

Academic Dictionaries and Encyclopedias

Markov random field

Contents

Definition

Clique factorization