Divergence (statistics)

In statistics and information geometry, divergence or a contrast function is a function which establishes the “distance” of one probability distribution to the other on a statistical manifold. The divergence is a weaker notion than that of the distance in mathematics, in particular the divergence need not be symmetric (that is, in general the divergence from p to q is not equal to the divergence from q to p), and need not satisfy the triangle inequality.

1 Definition
2 Geometrical properties
3 Examples
4 See also
5 References

Definition

Suppose S is a space of all probability distributions with common support. Then a divergence on S is a function D(· || ·): S×S → R satisfying ^[1]

D(p || q) ≥ 0 for all p, q ∈ S,
D(p || q) = 0 if and only if p = q,
The matrix g^(D) (see definition in the “geometrical properties” section) is strictly positive-definite everywhere on S.^[2]

The dual divergence D* is defined as

$D^*(p \parallel q) = D(q \parallel p).$

Geometrical properties

Many properties of divergences can be derived if we restrict S to be a statistical manifold, meaning that it can be parametrized with a finite-dimensional coordinate system θ, so that for a distribution p ∈ S we can write p = p(θ).

For a pair of points p, q ∈ S with coordinates θ_p and θ_q, denote the partial derivatives of D(p || q) as

$\begin{align} D((\partial_i)_p \parallel q) \ \ &\stackrel\mathrm{def}=\ \ \tfrac{\partial}{\partial\theta^i_p} D(p \parallel q), \\ D((\partial_i\partial_j)_p \parallel (\partial_k)_q) \ \ &\stackrel\mathrm{def}=\ \ \tfrac{\partial}{\partial\theta^i_p} \tfrac{\partial}{\partial\theta^j_p}\tfrac{\partial}{\partial\theta^k_q}D(p \parallel q), \ \ \mathrm{etc.} \end{align}$

Now we restrict these functions to a diagonal p = q, and denote ^[3]

$\begin{align} D[\partial_i\parallel\cdot]\ &:\ p \mapsto D((\partial_i)_p \parallel p), \\ D[\partial_i\parallel\partial_j]\ &:\ p \mapsto D((\partial_i)_p \parallel (\partial_j)_p),\ \ \mathrm{etc.} \end{align}$

By definition, the function D(p || q) is minimized at p = q, and therefore

$\begin{align} & D[\partial_i\parallel\cdot] = D[\cdot\parallel\partial_i] = 0, \\ & D[\partial_i\partial_j\parallel\cdot] = D[\cdot\parallel\partial_i\partial_j] = -D[\partial_i\parallel\partial_j] \ \equiv\ g_{ij}^{(D)}, \end{align}$

where matrix g^(D) is positive semi-definite and defines a unique Riemannian metric on the manifold S.

Divergence D(· || ·) also defines a unique torsion-free affine connection ∇^(D) with coefficients

$\Gamma_{ij,k}^{(D)} = -D[\partial_i\partial_j\parallel\partial_k],$

and the dual to this connection ∇* is generated by the dual divergence D*.

Thus, a divergence D(· || ·) generates on a statistical manifold a unique dualistic structure (g^(D), ∇^(D), ∇^(D*)). The converse is also true: every torsion-free dualistic structure on a statistical manifold is induced from some globally defined divergence function (which however need not be unique).^[4]

For example, when D is an f-divergence for some function ƒ(·), then it generates the metric g^(D_f) = c·g and the connection ∇^(D_f) = ∇^(α), where g is the canonical Fisher information metric, ∇^(α) is the α-connection, c = ƒ′′(1), and α = 3 + 2ƒ′′′(1)/ƒ′′(1).

Examples

The largest and most frequently used class of divergences form the so-called f-divergences, however other types of divergence functions are also encountered in the literature.

f-divergences

Main article: f-divergence

This family of divergences are generated through functions f(u), convex on u > 0 and such that f(1) = 0. Then an f-divergence is defined as

$D_f(p\parallel q) = \int p(x)f\bigg(\frac{q(x)}{p(x)}\bigg) dx$

Kullback-Leibler divergence:	$D_\mathrm{KL}(p \parallel q) = \int p(x)\big( \ln p(x) - \ln q(x) \big) dx$
squared Hellinger distance:	$H^2(p,\, q) = 2 \int \Big( \sqrt{p(x)} - \sqrt{q(x)}\, \Big)^2 dx$
Jeffrey’s divergence:	$D_J(p \parallel q) = \int (p(x) - q(x))\big( \ln p(x) - \ln q(x) \big) dx$
Chernoff’s α-divergence:	$D^{(\alpha)}(p \parallel q) = \frac{4}{1-\alpha^2}\bigg(1 - \int p(x)^\frac{1-\alpha}{2} q(x)^\frac{1+\alpha}{2} dx \bigg)$
exponential divergence:	$D_e(p \parallel q) = \int p(x)\big( \ln p(x) - \ln q(x) \big)^2 dx$
Kagan’s divergence:	$D_{\chi^2}(p \parallel q) = \frac12 \int \frac{(p(x) - q(x))^2}{p(x)} dx$
(α,β)-product divergence:	$D_{\alpha,\beta}(p \parallel q) = \frac{2}{(1-\alpha)(1-\beta)} \int \Big(1 - \Big(\tfrac{q(x)}{p(x)}\Big)^{\!\!\frac{1-\alpha}{2}} \Big) \Big(1 - \Big(\tfrac{q(x)}{p(x)}\Big)^{\!\!\frac{1-\beta}{2}} \Big) p(x) dx$

M-divergences

S-divergences

Amari, Shun-ichi; Nagaoka, Hiroshi (2000). Methods of information geometry. Oxford University Press. ISBN 0-8218-0531-2.
Eguchi, Shinto (1985). "A differential geometric approach to statistical inference on the basis of contrast functionals". Hiroshima mathematical journal 15 (2): 341–391. http://projecteuclid.org/euclid.hmj/1206130775.
Eguchi, Shinto (1992). "Geometry of minimum contrast". Hiroshima mathematical journal 22 (3): 631–647. http://projecteuclid.org/euclid.hmj/1206128508.
Matumoto, Takao (1993). "Any statistical manifold has a contrast function — on the C³-functions taking the minimum at the diagonal of the product manifold". Hiroshima mathematical journal 23 (2): 327–332. http://projecteuclid.org/euclid.hmj/1206128255.

Categories:

Statistical distance measures
F-divergences

Wikimedia Foundation. 2010.

Игры ⚽ Нужен реферат?

Look at other dictionaries:

Divergence (disambiguation) — Divergence can refer to: In mathematics: Divergence, a function that associates a scalar with every point of a vector field Divergence (computer science), a computation which does not terminate (or terminates in an exceptional state) Divergence… … Wikipedia
Divergence De Kullback-Leibler — En théorie des probabilités et en théorie de l information, la divergence de Kullback Leibler[1] [2] (ou divergence K L ou encore Entropie relative) est une mesure de dissimilarité entre deux distributions de probabilités P et Q. Elle doit son… … Wikipédia en Français
Divergence de kullback-leibler — En théorie des probabilités et en théorie de l information, la divergence de Kullback Leibler[1] [2] (ou divergence K L ou encore Entropie relative) est une mesure de dissimilarité entre deux distributions de probabilités P et Q. Elle doit son… … Wikipédia en Français
Divergence de Kullback-Leibler — En théorie des probabilités et en théorie de l information, la divergence de Kullback Leibler[1], [2] (ou divergence K L ou encore entropie relative) est une mesure de dissimilarité entre deux distributions de probabilités P et Q. Elle doit son… … Wikipédia en Français
Kullback–Leibler divergence — In probability theory and information theory, the Kullback–Leibler divergence[1][2][3] (also information divergence, information gain, relative entropy, or KLIC) is a non symmetric measure of the difference between two probability distributions P … Wikipedia
List of statistics topics — Please add any Wikipedia articles related to statistics that are not already on this list.The Related changes link in the margin of this page (below search) leads to a list of the most recent changes to the articles listed below. To see the most… … Wikipedia
Jensen–Shannon divergence — In probability theory and statistics, the Jensen Shannon divergence is a popular method of measuring the similarity between two probability distributions. It is also known as information radius (IRad) [cite book |author=Hinrich Schütze;… … Wikipedia
F-divergence — In probability theory, an f divergence is a function I f ( P , Q ) that measures the difference between two probability distributions P and Q . The divergence is intuitively an average of the function f of the odds ratio given by P and Q .These… … Wikipedia
Bose–Einstein statistics — In statistical mechanics, Bose Einstein statistics (or more colloquially B E statistics) determines the statistical distribution of identical indistinguishable bosons over the energy states in thermal equilibrium.ConceptBosons, unlike fermions,… … Wikipedia
Entropie relative — Divergence de Kullback Leibler En théorie des probabilités et en théorie de l information, la divergence de Kullback Leibler[1] [2] (ou divergence K L ou encore Entropie relative) est une mesure de dissimilarité entre deux distributions de… … Wikipédia en Français

Academic Dictionaries and Encyclopedias

Divergence (statistics)

Contents

Definition

Geometrical properties