Inequalities in information theory
Inequalities are very important in the study of information theory, and they arise in a number of different contexts.
Consider a finite collection of finitely (or at most countably) supported random variables on the same probability space. For a collection of n random variables, there are 2^n − 1 non-empty subsets for which entropies can be defined. For example, when n = 2 we may consider the entropies H(X_1), H(X_2), and H(X_1, X_2), and express the following inequalities (which together characterize the range of the marginal and joint entropies of two random variables):
* H(X_1) ≥ 0
* H(X_2) ≥ 0
* H(X_1) ≤ H(X_1, X_2)
* H(X_2) ≤ H(X_1, X_2)
* H(X_1, X_2) ≤ H(X_1) + H(X_2)
In fact, these can all be expressed as special cases of a single inequality involving the conditional mutual information, namely

I(A; B | C) ≥ 0,

where A, B, and C each denote the joint distribution of some arbitrary subset of our collection of random variables. Inequalities that can be derived from this are known as Shannon-type inequalities. More formally (following the notation of Yeung), define Γ*_n to be the set of all constructible points in R^(2^n − 1), where a point is said to be constructible if and only if there is a joint, discrete distribution of n random variables such that each coordinate of that point, indexed by a non-empty subset of {1, 2, ..., n}, is equal to the joint entropy of the corresponding subset of the n random variables. The closure of Γ*_n is denoted cl(Γ*_n).
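To make the definition of Γ*_n concrete, the following sketch (variable names and the pmf are my own) constructs one constructible point for n = 3: a vector in R^7 whose coordinates, indexed by the non-empty subsets of {0, 1, 2}, are the joint entropies of the corresponding marginals of an arbitrary joint pmf:

```python
import numpy as np
from itertools import combinations

def H(p):
    """Shannon entropy (bits) of a pmf array of any shape."""
    p = p.ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# An arbitrary joint pmf over three binary random variables.
rng = np.random.default_rng(1)
p = rng.random((2, 2, 2))
p /= p.sum()

# One point of Gamma*_3: 2^3 - 1 = 7 coordinates, one per non-empty subset.
n = 3
point = {}
for r in range(1, n + 1):
    for subset in combinations(range(n), r):
        # Marginalize out the variables NOT in the subset.
        marginal = p.sum(axis=tuple(i for i in range(n) if i not in subset))
        point[subset] = H(marginal)

print(point)
```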
The cone in R^(2^n − 1) characterized by all Shannon-type inequalities among n random variables is denoted Γ_n. Software has been developed to automate the task of proving such inequalities. [ [http://user-www.ie.cuhk.edu.hk/~ITIP/ ITIP - Information Theoretic Inequality Prover] ] [ [http://xitip.epfl.ch/ Xitip - Information Theoretic Inequalities Prover] ] Given an inequality, such software is able to determine whether the half-space it defines contains the cone Γ_n, in which case the inequality can be verified, since Γ*_n ⊆ Γ_n.
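A minimal numeric sketch of the master inequality (my own example, not the linear-programming machinery that tools like ITIP implement): compute I(A; B | C) from a joint pmf via joint entropies and confirm it is non-negative:

```python
import numpy as np

def H(p):
    """Shannon entropy (bits) of a pmf array of any shape."""
    p = p.ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# A random joint pmf over (A, B, C), each binary.
rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum()

# I(A; B | C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)
i_ab_given_c = (H(p.sum(axis=1))          # H(A, C)
                + H(p.sum(axis=0))        # H(B, C)
                - H(p)                    # H(A, B, C)
                - H(p.sum(axis=(0, 1))))  # H(C)

assert i_ab_given_c >= -1e-12   # the basic Shannon-type inequality
print(i_ab_given_c)
```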
Other, less trivial inequalities have been discovered among the entropies and joint entropies of four or more random variables, which cannot be derived from Shannon's basic inequalities. [Z. Zhang, R.W. Yeung, "A non-Shannon-type conditional inequality of information quantities." IEEE Transactions on Information Theory. New York: Nov 1997. Vol. 43, Iss. 6; p. 1982] [K. Makarychev et al. "A new class of non-Shannon-type inequalities for entropies." Communications in Information and Systems, Vol. 2, No. 2, pp. 147-166, December 2002 [http://www.cs.princeton.edu/~ymakaryc/papers/nonshann.pdf PDF] ] These are known as non-Shannon-type inequalities.
It was shown that

Γ*_n ⊆ cl(Γ*_n) ⊆ Γ_n,

where the first inclusion is proper for n ≥ 3 and the second for n ≥ 4. Both cl(Γ*_n) and Γ_n are, in fact, convex cones.
Gibbs' inequality
This fundamental inequality states that the Kullback–Leibler divergence is non-negative:

D_KL(p ‖ q) = Σ_x p(x) log ( p(x) / q(x) ) ≥ 0,

with equality if and only if p = q.
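A small sketch of this non-negativity in code (the function name and example distributions are mine):

```python
import numpy as np

def kl_divergence(p, q):
    """D(p || q) in bits; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
d = kl_divergence(p, q)

assert d >= 0                      # Gibbs' inequality
assert kl_divergence(p, p) == 0    # equality when the distributions coincide
print(d)
```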
Another inequality concerning the Kullback–Leibler divergence is known as Kullback's inequality. [Aimé Fuchs and Giorgio Letta, "L'inégalité de Kullback. Application à la théorie de l'estimation." Séminaire de probabilités (Strasbourg), vol. 4, pp. 108-131, 1970. http://www.numdam.org/item?id=SPS_1970__4__108_0] If P and Q are equivalent (i.e. absolutely continuous with respect to one another) probability distributions on the same measurable space whose first moments exist, then

D_KL(P ‖ Q) ≥ Ψ*_Q(μ′_1(P)),

where Ψ*_Q is the rate function, i.e. the convex conjugate of the cumulant-generating function, of Q, and μ′_1(P) is the first moment of P.
The Cramér–Rao bound is an immediate consequence of this inequality.
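As a worked example of Kullback's inequality (my own choice of distributions): take Q = N(0, 1), whose cumulant-generating function is Ψ_Q(t) = t²/2, so the rate function is Ψ*_Q(μ) = μ²/2. For P = N(μ, σ²) the divergence has a closed form, and the bound can be checked directly:

```python
import math

def rate_function_std_normal(mu):
    """Convex conjugate of Psi_Q(t) = t^2/2 for Q = N(0, 1)."""
    return mu**2 / 2

def kl_gauss_vs_std_normal(mu, sigma2):
    """D(N(mu, sigma2) || N(0, 1)) in nats, closed form."""
    return 0.5 * (sigma2 + mu**2 - 1 - math.log(sigma2))

for mu, sigma2 in [(0.5, 1.0), (1.0, 2.0), (-2.0, 0.5)]:
    d = kl_gauss_vs_std_normal(mu, sigma2)
    assert d >= rate_function_std_normal(mu) - 1e-12  # Kullback's inequality
    # Equality holds when sigma2 == 1, i.e. when P is an exponential tilt of Q.
```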
In 1957, [I.I. Hirschman, "A Note on Entropy", American Journal of Mathematics, 1957] Hirschman showed that for a (reasonably well-behaved) function f : R → C such that ∫ |f(x)|² dx = 1 and its Fourier transform g(y) = ∫ f(x) e^(−2πixy) dx, the sum of the differential entropies of |f|² and |g|² is non-negative, i.e.

− ∫ |f(x)|² log |f(x)|² dx − ∫ |g(y)|² log |g(y)|² dy ≥ 0.

Hirschman conjectured, and it was later proved, [W. Beckner. "Inequalities in Fourier Analysis." Annals of Mathematics, vol. 102, no. 6, pp. 159-182. 1975] that a sharper bound of log(e/2), which is attained in the case of a Gaussian distribution, could replace the right-hand side of this inequality. This is especially significant since it implies, and is stronger than, Weyl's formulation of Heisenberg's uncertainty principle.
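The Gaussian equality case can be checked numerically (a sketch under my conventions: the e^(−2πixy) Fourier kernel, entropies in nats). The function f(x) = 2^(1/4) e^(−πx²) is its own Fourier transform under this convention, so the entropy sum is twice the entropy of |f|², and it comes out to log(e/2):

```python
import numpy as np

# f(x) = 2^(1/4) * exp(-pi x^2) is self-dual under the e^(-2*pi*i*x*y)
# Fourier convention, so g = f and the entropy sum is 2 * h(|f|^2).
x = np.linspace(-6.0, 6.0, 20001)
dx = x[1] - x[0]
p = np.sqrt(2.0) * np.exp(-2.0 * np.pi * x**2)   # |f(x)|^2, integrates to 1
h = -np.sum(p * np.log(p)) * dx                  # differential entropy (nats)
total = 2.0 * h
print(total, np.log(np.e / 2.0))                 # the bound log(e/2) is attained
```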
This inequality can be generalized, and somewhat more easily proven, [S. Zozor, C. Vignat. "On classes of non-Gaussian asymptotic minimizers in entropic uncertainty principles." [http://arxiv.org/abs/math/0605510v1 arXiv:math/0605510v1] ] via the concept of Rényi entropy. [Iwo Bialynicki-Birula. "Formulation of the uncertainty relations in terms of the Renyi entropies." [http://arxiv.org/abs/quant-ph/0608116v2 arXiv:quant-ph/0608116v2] ] Namely, if 1/α + 1/β = 2 and 1/2 < α < 1, it can be shown that

H_α(|f|²) + H_β(|g|²) ≥ (1/2) ( log α / (α − 1) + log β / (β − 1) ) − log 2,

from which the lower bound log(e/2) for the sum of the Shannon entropies can be obtained by taking the limit as α → 1 (and hence β → 1).
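A quick numeric check (my own sketch, working in nats) that the right-hand side of the Rényi bound recovers the Shannon bound log(e/2) in the limit α → 1:

```python
import math

def renyi_bound(alpha):
    """RHS of the Rényi entropic uncertainty bound, with 1/alpha + 1/beta = 2."""
    beta = 1 / (2 - 1 / alpha)
    term = lambda t: math.log(t) / (t - 1)   # each term tends to 1 as t -> 1
    return 0.5 * (term(alpha) + term(beta)) - math.log(2)

shannon_bound = math.log(math.e / 2)
for alpha in (0.9, 0.99, 0.999):
    print(alpha, renyi_bound(alpha))   # approaches log(e/2) ~ 0.3069
```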
Given discrete random variables X, Y, and Y′ such that X takes values only in the interval [−1, 1] and Y′ is determined by Y (so that H(Y′|Y) = 0), we have [ T. Tao, "Szemeredi's regularity lemma revisited", Contrib. Discrete Math., 1 (2006), 8–28. Preprint: [http://arxiv.org/abs/math/0504472v2 arXiv:math/0504472v2] ] [Rudolf Ahlswede, "The final form of Tao's inequality relating conditional expectation and conditional mutual information." Advances in Mathematics of Communications. Volume 1, No. 2, 2007, pp. 239–242 [http://www.aimsciences.org/journals/pdfs.jsp?paperID=2565&mode=full PDF] ]

E( |E(X | Y′) − E(X | Y)| ) ≤ √( 2 ln 2 · I(X; Y | Y′) ),

relating the conditional expectation to the conditional mutual information. (Note: the correction factor ln 2 inside the radical arises because we are measuring the conditional mutual information in bits rather than nats.)
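A minimal worked instance (my own construction): let Y be uniform on {−1, +1}, X = Y, and Y′ a constant coarsening of Y (so H(Y′|Y) = 0). Then E(X|Y) = X and E(X|Y′) = E(X) = 0, so the left-hand side is E|X| = 1, while I(X; Y | Y′) = H(X) = 1 bit:

```python
import math

# Left-hand side: E|E(X|Y') - E(X|Y)| = E|0 - X| = E|X| = 1 for X uniform on {-1, +1}.
lhs = 1.0

# Right-hand side: I(X; Y | Y') = H(X) = 1 bit; ln 2 converts bits to nats.
i_bits = 1.0
rhs = math.sqrt(2 * math.log(2) * i_bits)

assert lhs <= rhs   # Tao's inequality holds, with some slack (~1.177 vs 1)
print(lhs, rhs)
```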
Entropy power inequality
Shannon's entropy power inequality states that, if X and Y are independent real-valued random variables with densities and finite differential entropies, then

e^(2h(X+Y)) ≥ e^(2h(X)) + e^(2h(Y)),

where h denotes differential entropy measured in nats.
Wikimedia Foundation. 2010.