# Jensen–Shannon divergence

In probability theory and statistics, the **Jensen–Shannon divergence** is a popular method of measuring the similarity between two probability distributions. It is also known as **information radius (IRad)** (Hinrich Schütze and Christopher D. Manning, *Foundations of Statistical Natural Language Processing*, MIT Press, Cambridge, Mass., 1999, p. 304, ISBN 0-262-13360-1, http://nlp.stanford.edu/fsnlp/) or **total divergence to the average** (Ido Dagan, Lillian Lee and Fernando Pereira, "Similarity-Based Methods For Word Sense Disambiguation", Proceedings of the Thirty-Fifth Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics, 1997, pp. 56–63, http://citeseer.ist.psu.edu/dagan97similaritybased.html). It is based on the Kullback–Leibler divergence, with the notable (and useful) difference that it is always a finite value.

**Definition**

Consider the set $M_+^1(A)$ of probability distributions, where A is a set provided with some σ-algebra.

The Jensen–Shannon divergence (JSD) $M_+^1(A) \times M_+^1(A) \rightarrow [0,1]$ (with the base-2 logarithm) is a symmetrized and smoothed version of the Kullback–Leibler divergence $D(P \parallel Q)$. It is defined by

$JSD(P \parallel Q) = \frac{1}{2} D(P \parallel M) + \frac{1}{2} D(Q \parallel M)$

where $M = \frac{1}{2}(P + Q)$
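
As an illustration of the definition above, here is a minimal Python sketch (not drawn from the cited sources) that computes the Jensen–Shannon divergence of two discrete distributions via the mixture M; the helper names `kl_divergence` and `js_divergence` are chosen for this example only.

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions.

    Terms where p is zero contribute nothing (0 * log 0 = 0 convention);
    the base-2 logarithm keeps the resulting JSD within [0, 1].
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL divergence to the mixture M."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)  # M = (P + Q) / 2
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Example: two discrete distributions over three outcomes.
p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
print(js_divergence(p, q))  # 0.5 -- finite, even though D(p || q) itself would be infinite
```

Because M is strictly positive wherever P or Q is, both KL terms are finite, which is why the JSD avoids the infinities that the plain Kullback–Leibler divergence can produce.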

**See also**

Kullback–Leibler divergence, for details about calculating the Jensen–Shannon divergence.

**References**

* Bent Fuglede and Flemming Topsøe. [Jensen–Shannon Divergence and Hilbert Space Embedding](http://www.math.ku.dk/~topsoe/ISIT2004JSD.pdf). University of Copenhagen, Department of Mathematics.

* J. Lin. [Divergence measures based on the Shannon entropy](http://citeseer.ist.psu.edu/context/395386/0). IEEE Transactions on Information Theory, 37(1):145–151, January 1991.

* Y. Ofran and B. Rost. [Analysing Six Types of Protein-Protein Interfaces](http://citeseer.ist.psu.edu/ofran03analysing.html). 2003.

