Inequalities in information theory

Inequalities are very important in the study of information theory. There are a number of different contexts in which these inequalities appear.

Shannon-type inequalities
Consider a finite collection of finitely (or at most countably) supported random variables on the same probability space. For a collection of $n$ random variables, there are $2^n - 1$ such non-empty subsets for which entropies can be defined. For example, when $n = 2$ we may consider the entropies $H(X_1)$, $H(X_2)$, and $H(X_1, X_2)$ and express the following inequalities (which together characterize the range of the marginal and joint entropies of two random variables):
* $H(X_1) \ge 0$
* $H(X_2) \ge 0$
* $H(X_1) \le H(X_1, X_2)$
* $H(X_2) \le H(X_1, X_2)$
* $H(X_1, X_2) \le H(X_1) + H(X_2)$

In fact, these can all be expressed as special cases of a single inequality involving the conditional mutual information, namely

$$I(A; B \mid C) \ge 0,$$

where $A$, $B$, and $C$ each denote the joint distribution of some arbitrary subset of our collection of random variables. Inequalities that can be derived from this are known as Shannon-type inequalities.

More formally (following the notation of Yeung), define $\Gamma^*_n$ to be the set of all "constructible" points in $\mathbb{R}^{2^n - 1}$, where a point is said to be constructible if and only if there is a joint, discrete distribution of $n$ random variables such that each coordinate of that point, indexed by a non-empty subset of $\{1, 2, \dots, n\}$, is equal to the joint entropy of the corresponding subset of the $n$ random variables. The closure of $\Gamma^*_n$ is denoted $\overline{\Gamma^*_n}$.

The cone in $\mathbb{R}^{2^n - 1}$ characterized by all Shannon-type inequalities among $n$ random variables is denoted $\Gamma_n$. Software has been developed to automate the task of proving such inequalities. [http://user-www.ie.cuhk.edu.hk/~ITIP/ ITIP - Information Theoretic Inequality Prover] [http://xitip.epfl.ch/ Xitip - Information Theoretic Inequalities Prover] Given an inequality, such software is able to determine whether the region defined by the given inequality contains the cone $\Gamma_n$, in which case the inequality can be verified, since $\Gamma^*_n \subseteq \Gamma_n$.
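The basic inequality, and any Shannon-type inequality derived from it, can at least be checked numerically for a concrete joint distribution (this is much weaker than what ITIP does, which establishes an inequality for all distributions, but it is a useful sanity check). Below is a minimal Python sketch; the helper `joint_entropy` and the random three-variable distribution are illustrative choices, not part of any referenced software.

```python
import numpy as np

def joint_entropy(p, axes):
    """Joint entropy, in bits, of the marginal of p over the given axes."""
    other = tuple(i for i in range(p.ndim) if i not in axes)
    marginal = p.sum(axis=other) if other else p
    q = marginal[marginal > 0]
    return float(-np.sum(q * np.log2(q)))

rng = np.random.default_rng(0)
p = rng.random((2, 3, 2))
p /= p.sum()                      # a random joint distribution of (X1, X2, X3)
H = lambda *axes: joint_entropy(p, axes)

# Basic inequality I(X1; X2 | X3) >= 0, expressed through joint entropies.
assert H(0, 2) + H(1, 2) - H(0, 1, 2) - H(2) >= -1e-12

# The five inequalities listed above for the pair (X1, X2).
assert H(0) >= 0 and H(1) >= 0
assert H(0) <= H(0, 1) + 1e-12 and H(1) <= H(0, 1) + 1e-12
assert H(0, 1) <= H(0) + H(1) + 1e-12
```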
Non-Shannon-type inequalities
Other, less trivial inequalities have been discovered among the entropies and joint entropies of four or more random variables, which cannot be derived from Shannon's basic inequalities. [Z. Zhang, R.W. Yeung, "A non-Shannon-type conditional inequality of information quantities." IEEE Transactions on Information Theory. New York: Nov 1997. Vol. 43, Iss. 6; p. 1982] [K. Makarychev et al. "A new class of non-Shannon-type inequalities for entropies." Communications in Information and Systems, Vol. 2, No. 2, pp. 147-166, December 2002 [http://www.cs.princeton.edu/~ymakaryc/papers/nonshann.pdf PDF] ] These are known as non-Shannon-type inequalities.
It was shown that

$$\Gamma^*_n \subseteq \overline{\Gamma^*_n} \subseteq \Gamma_n,$$

where the inclusions are proper for $n \ge 3$. All three of these sets are, in fact, convex cones.

Other inequalities
Gibbs' inequality
"Main article:
Gibbs' inequality "This fundamental inequality states that the
Kullback–Leibler divergence is non-negative.Kullback's inequality
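For discrete distributions Gibbs' inequality is easy to see in action; the short sketch below (the helper name and the example distributions are just for illustration) computes the divergence directly from the definition:

```python
import numpy as np

def kl_divergence(p, q):
    """D(P || Q) in bits for discrete distributions given as arrays.

    Assumes q[i] > 0 wherever p[i] > 0 (absolute continuity)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))   # strictly positive, since P != Q
print(kl_divergence(p, p))   # exactly 0, the equality case P = Q
```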
Kullback's inequality

Another inequality concerning the Kullback–Leibler divergence is known as Kullback's inequality. [Aimé Fuchs and Giorgio Letta, "L'inégalité de Kullback. Application à la théorie de l'estimation." Séminaire de probabilités (Strasbourg), vol. 4, pp. 108-131, 1970. http://www.numdam.org/item?id=SPS_1970__4__108_0] If $P$ and $Q$ are probability distributions on the real line that are equivalent (i.e. absolutely continuous with respect to one another) and whose first moments exist, then

$$D_{\mathrm{KL}}(P \parallel Q) \ge \Psi_Q^*(\mu'_1(P)),$$

where $\Psi_Q^*$ is the rate function, i.e. the convex conjugate of the cumulant-generating function, of $Q$, and $\mu'_1(P)$ is the first moment of $P$.

The Cramér–Rao bound is an immediate consequence of this inequality.
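Both sides of Kullback's inequality have closed forms when, for example, $Q$ is a standard normal distribution and $P$ is another Gaussian. The sketch below uses that case (chosen here purely for illustration, not taken from the reference) and works in nats:

```python
import numpy as np

# Illustrative check of D(P||Q) >= Psi*_Q(mu) with Q = N(0, 1) and P = N(mu, s^2).
# The cumulant-generating function of Q is Psi_Q(t) = t^2/2, so its convex
# conjugate (the rate function) is Psi*_Q(mu) = mu^2/2, and
# D(P||Q) = (s^2 + mu^2 - 1 - ln s^2)/2 in nats.

def kl_gauss_vs_std_normal(mu, s2):
    return 0.5 * (s2 + mu**2 - 1.0 - np.log(s2))

def rate_function_std_normal(mu):
    return 0.5 * mu**2

for mu, s2 in [(0.0, 2.0), (1.5, 0.5), (-0.7, 3.0)]:
    lhs = kl_gauss_vs_std_normal(mu, s2)
    rhs = rate_function_std_normal(mu)
    assert lhs >= rhs - 1e-12
    print(f"mu={mu:+.1f}, s^2={s2:.1f}:  D(P||Q)={lhs:.4f} >= Psi*_Q(mu)={rhs:.4f}")
```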
Hirschman uncertainty

In 1957, [I.I. Hirschman, "A Note on Entropy", American Journal of Mathematics, 1957] Hirschman showed that for a (reasonably well-behaved) function $f:\mathbb{R}\to\mathbb{C}$ such that $\int_{-\infty}^{\infty} |f(x)|^2\,dx = 1$ and its Fourier transform $g(y) = \int_{-\infty}^{\infty} f(x)\,e^{-2\pi i x y}\,dx$, the sum of the differential entropies of $|f|^2$ and $|g|^2$ is non-negative, i.e.

$$h(|f|^2) + h(|g|^2) = -\int_{-\infty}^{\infty} |f(x)|^2 \log |f(x)|^2 \,dx \;-\; \int_{-\infty}^{\infty} |g(y)|^2 \log |g(y)|^2 \,dy \;\ge\; 0.$$

Hirschman conjectured, and it was later proved, [W. Beckner, "Inequalities in Fourier Analysis." Annals of Mathematics, vol. 102, no. 6, pp. 159-182, 1975] that a sharper bound of $\log(e/2)$, which is attained in the case of a Gaussian distribution, could replace the right-hand side of this inequality. This is especially significant since it implies, and is stronger than, Weyl's formulation of Heisenberg's uncertainty principle.

This inequality can be generalized, and somewhat more easily proven, [S. Zozor, C. Vignat, "On classes of non-Gaussian asymptotic minimizers in entropic uncertainty principles." [http://arxiv.org/abs/math/0605510v1 arXiv:math/0605510v1]] via the concept of Rényi entropy. [Iwo Bialynicki-Birula, "Formulation of the uncertainty relations in terms of the Renyi entropies." [http://arxiv.org/abs/quant-ph/0608116v2 arXiv:quant-ph/0608116v2]] Namely, if $|f|^2$ and $|g|^2$ are probability densities and the orders $\alpha$ and $\beta$ satisfy $1/\alpha + 1/\beta = 2$ with $\tfrac{1}{2} < \alpha < 1$ (so that $\beta > 1$), it can be shown that

$$H_\alpha(|f|^2) + H_\beta(|g|^2) \;\ge\; \frac{1}{2}\left(\frac{\log\alpha}{\alpha-1} + \frac{\log\beta}{\beta-1}\right) - \log 2,$$

from which the lower bound $\log(e/2)$ for the sum of the Shannon (differential) entropies can be obtained by taking the limit as $\alpha, \beta \to 1$.
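The bound $\log(e/2)$ is attained by Gaussians, which makes the extremal case easy to check numerically. The sketch below (the width parameter $a$ and the grid settings are arbitrary choices) evaluates the two differential entropies in nats for a Gaussian $f$ and its Fourier transform under the $e^{-2\pi i x y}$ convention used above:

```python
import numpy as np

# Extremal case of the Hirschman-Beckner bound h(|f|^2) + h(|g|^2) >= log(e/2).
# Taking f(x) = (2a/pi)^(1/4) * exp(-a*x^2), |f|^2 is the N(0, 1/(4a)) density,
# and with g(y) = integral of f(x)*exp(-2*pi*i*x*y) dx, |g|^2 is the
# N(0, a/(4*pi^2)) density; their differential entropies should sum to log(e/2).

def gaussian_pdf(t, s2):
    return np.exp(-t**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)

def differential_entropy(s2, n=200001, width=12.0):
    """Numerical -integral of p*log(p) for an N(0, s2) density (natural log)."""
    t = np.linspace(-width * np.sqrt(s2), width * np.sqrt(s2), n)
    p = gaussian_pdf(t, s2)
    return float(-np.sum(p * np.log(p)) * (t[1] - t[0]))

a = 3.0
total = differential_entropy(1.0 / (4 * a)) + differential_entropy(a / (4 * np.pi**2))
print(total, np.log(np.e / 2))   # both are approximately 0.3069 nats
```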
Tao's inequality

Given discrete random variables $X$, $Y$, and $Z$, such that $X$ takes values only in the interval $[-1, 1]$ and $Z$ is determined by $Y$ (so that $H(Z \mid Y) = 0$), we have [T. Tao, "Szemerédi's regularity lemma revisited", Contrib. Discrete Math., 1 (2006), 8-28. Preprint: [http://arxiv.org/abs/math/0504472v2 arXiv:math/0504472v2]] [Rudolf Ahlswede, "The final form of Tao's inequality relating conditional expectation and conditional mutual information." Advances in Mathematics of Communications, Volume 1, No. 2, 2007, pp. 239-242. [http://www.aimsciences.org/journals/pdfs.jsp?paperID=2565&mode=full PDF]]

$$\mathbb{E}\bigl(\,\bigl|\mathbb{E}(X \mid Z) - \mathbb{E}(X \mid Y)\bigr|\,\bigr) \;\le\; \sqrt{2 \ln 2 \; I(X; Y \mid Z)},$$

relating the conditional expectation to the conditional mutual information. (Note: the correction factor $\ln 2$ inside the radical arises because we are measuring the conditional mutual information in bits rather than nats.)
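For any finite joint distribution in which $Z$ is a function of $Y$, both sides can be computed directly. The sketch below (the particular distribution, the support of $X$, and the coarsening map $f$ are arbitrary illustrative choices) measures $I(X; Y \mid Z)$ in bits, matching the statement above:

```python
import numpy as np

rng = np.random.default_rng(1)

x_vals = np.array([-1.0, -0.25, 0.6, 1.0])     # X takes values in [-1, 1]
p_xy = rng.random((4, 6))
p_xy /= p_xy.sum()                             # joint distribution of (X, Y)
f = np.array([0, 0, 1, 1, 2, 2])               # Z = f(Y), a coarsening of Y

p_y = p_xy.sum(axis=0)
e_x_given_y = (x_vals @ p_xy) / p_y            # E[X | Y = y]

p_xz = np.zeros((4, 3))
for y, z in enumerate(f):                      # marginalize Y onto Z = f(Y)
    p_xz[:, z] += p_xy[:, y]
p_z = p_xz.sum(axis=0)
e_x_given_z = (x_vals @ p_xz) / p_z            # E[X | Z = z]

# Left-hand side: E|E(X|Z) - E(X|Y)|, an expectation over Y (and Z = f(Y)).
lhs = float(np.sum(p_y * np.abs(e_x_given_z[f] - e_x_given_y)))

# I(X; Y | Z) in bits via joint entropies; since Z is determined by Y,
# H(Y, Z) = H(Y) and H(X, Y, Z) = H(X, Y).
H = lambda p: float(-np.sum(p[p > 0] * np.log2(p[p > 0])))
i_xy_given_z = H(p_xz) + H(p_y) - H(p_xy) - H(p_z)

rhs = float(np.sqrt(2 * np.log(2) * i_xy_given_z))
print(lhs, "<=", rhs)
assert lhs <= rhs + 1e-12
```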
See also

* Cramér–Rao bound
* Entropy power inequality
* Fano's inequality
* Jensen's inequality
* Kraft inequality