Jensen–Shannon divergence

In probability theory and statistics, the Jensen-Shannon divergence is a popular method of measuring the similarity between two probability distributions. It is also known as information radius (IRad) (Hinrich Schütze and Christopher D. Manning, Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, Mass., 1999, p. 304, ISBN 0-262-13360-1) or total divergence to the average (Ido Dagan, Lillian Lee and Fernando Pereira, "Similarity-Based Methods For Word Sense Disambiguation", Proceedings of the Thirty-Fifth Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics, 1997, pp. 56–63). It is based on the Kullback-Leibler divergence, with the notable (and useful) difference that it is always a finite value.


Consider the set M_+^1(A) of probability distributions on A, where A is a set equipped with some σ-algebra of measurable subsets.

The Jensen-Shannon divergence (JSD) is a map M_+^1(A) \times M_+^1(A) \rightarrow [0,1]. It is a symmetrized and smoothed version of the Kullback-Leibler divergence D(P \parallel Q). It is defined by

JSD(P \parallel Q) = \frac{1}{2} D(P \parallel M) + \frac{1}{2} D(Q \parallel M)

where M = \frac{1}{2}(P+Q). The upper bound of 1 holds when the logarithm in D is taken to base 2.
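
For discrete distributions the definition can be evaluated directly. The following is a minimal sketch, not a reference implementation: it assumes P and Q are given as equal-length sequences of probabilities over the same finite set, and it uses base-2 logarithms so that the result lies in [0,1]. The function names kl_divergence and js_divergence are illustrative choices, not taken from any library.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q) in bits (base-2 logarithm).

    Assumes p and q are equal-length sequences of probabilities.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # convention: a term with zero probability contributes 0
        if qi == 0:
            return math.inf  # P puts mass where Q has none
        total += pi * math.log2(pi / qi)
    return total


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL divergence to the mixture M."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # M = (P + Q) / 2
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Two distributions with disjoint support.
p = [1.0, 0.0]
q = [0.0, 1.0]
kl_pq = kl_divergence(p, q)   # infinite
jsd_pq = js_divergence(p, q)  # 1.0, the upper bound in bits
```

In this example the Kullback-Leibler divergence is infinite because Q assigns zero probability to an outcome that P can produce, while the Jensen-Shannon divergence stays finite: the mixture M is positive wherever P or Q is, so neither term in the sum can diverge.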

See also

Kullback-Leibler divergence for details about calculating the Jensen-Shannon divergence.


* Bent Fuglede and Flemming Topsøe. Jensen-Shannon Divergence and Hilbert space embedding. University of Copenhagen, Department of Mathematics.
* J. Lin. Divergence measures based on the Shannon entropy. IEEE Trans. on Information Theory, 37(1):145-151, January 1991.
* Y. Ofran and B. Rost. Analysing Six Types of Protein-Protein Interfaces. 2003.

