Inequalities in information theory

Inequalities play a central role in information theory, and they arise in a number of different contexts.

**Shannon-type inequalities**

Consider a finite collection of finitely (or at most countably) supported random variables on the same probability space. For a collection of $n$ random variables, there are $2^n-1$ non-empty subsets for which entropies can be defined. For example, when $n=2$, we may consider the entropies $H(X_1)$, $H(X_2)$, and $H(X_1, X_2)$, and express the following inequalities (which together characterize the range of the marginal and joint entropies of two random variables):

* $H(X_1) \ge 0$

* $H(X_2) \ge 0$

* $H(X_1) \le H(X_1, X_2)$

* $H(X_2) \le H(X_1, X_2)$

* $H(X_1, X_2) \le H(X_1) + H(X_2)$

In fact, these can all be expressed as special cases of a single inequality involving the conditional mutual information, namely

$I(A;B|C) \ge 0,$

where $A$, $B$, and $C$ each denote the joint distribution of some arbitrary subset of our collection of random variables. Inequalities that can be derived from this one are known as **Shannon-type** inequalities. More formally (following the notation of Yeung), define $\Gamma^*_n$ to be the set of all "constructable" points in $\mathbb{R}^{2^n-1}$, where a point is said to be constructable if and only if there is a joint, discrete distribution of $n$ random variables such that each coordinate of that point, indexed by a non-empty subset of $\{1, 2, \dots, n\}$, is equal to the joint entropy of the corresponding subset of the $n$ random variables. The closure of $\Gamma^*_n$ is denoted $\overline{\Gamma^*_n}$.

The **cone** in $\mathbb{R}^{2^n-1}$ characterized by all Shannon-type inequalities among $n$ random variables is denoted $\Gamma_n$. Software has been developed to automate the task of proving such inequalities, e.g. ITIP (http://user-www.ie.cuhk.edu.hk/~ITIP/) and Xitip (http://xitip.epfl.ch/). Given an inequality, such software determines whether the region it defines contains the cone $\Gamma_n$, in which case the inequality is verified, since $\Gamma^*_n \subseteq \Gamma_n$.
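As a sanity check, the five inequalities for $n=2$ above can be verified numerically. The joint distribution below is a hypothetical example chosen only for illustration:

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a pmf given as an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A small, arbitrary joint pmf of two binary random variables X1 and X2
# (a hypothetical example, not taken from the text).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

h12 = entropy(joint.values())                                            # H(X1, X2)
h1 = entropy(sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1))
h2 = entropy(sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1))

# The five basic inequalities characterizing the n = 2 entropy region:
assert h1 >= 0 and h2 >= 0
assert h1 <= h12 and h2 <= h12
assert h12 <= h1 + h2   # subadditivity, equivalent to I(X1; X2) >= 0
```

Any joint pmf substituted for `joint` must satisfy all five assertions, which is exactly the statement that its entropy vector lies in $\Gamma_2$.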

**Non-Shannon-type inequalities**

Other, less trivial inequalities have been discovered among the entropies and joint entropies of four or more random variables, which cannot be derived from Shannon's basic inequalities [Z. Zhang and R.W. Yeung, "A non-Shannon-type conditional inequality of information quantities," IEEE Transactions on Information Theory, vol. 43, no. 6, p. 1982, Nov. 1997] [K. Makarychev et al., "A new class of non-Shannon-type inequalities for entropies," Communications in Information and Systems, vol. 2, no. 2, pp. 147–166, Dec. 2002, http://www.cs.princeton.edu/~ymakaryc/papers/nonshann.pdf]. These are known as **non-Shannon-type** inequalities. It was shown that

$\Gamma^*_n \subset \overline{\Gamma^*_n} \subset \Gamma_n,$

where the inclusions are proper for $n \ge 4$. All three of these sets are, in fact, convex cones.

**Other inequalities**

**Gibbs' inequality**

*Main article: Gibbs' inequality*

This fundamental inequality states that the Kullback–Leibler divergence is non-negative.

**Kullback's inequality**

Another inequality concerning the Kullback–Leibler divergence is known as **Kullback's inequality** [Aimé Fuchs and Giorgio Letta, "L'inégalité de Kullback. Application à la théorie de l'estimation," Séminaire de probabilités (Strasbourg), vol. 4, pp. 108–131, 1970, http://www.numdam.org/item?id=SPS_1970__4__108_0]. If $P$ and $Q$ are **equivalent** (i.e. absolutely continuous with respect to one another) probability distributions on the same measurable space, and their first moments exist, then

$D_{KL}(P\|Q) \ge \Psi_Q^*(\mu'_1(P)),$

where $\Psi_Q^*$ is the rate function, i.e. the convex conjugate of the cumulant-generating function, of $Q$, and $\mu'_1(P)$ is the first moment of $P$.
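A minimal numerical sketch of Kullback's inequality, using a hypothetical pair of distributions on $\{0,1,2\}$ and a brute-force grid search to approximate the convex conjugate:

```python
import math

def kl(p, q):
    """Kullback–Leibler divergence D(P||Q) in nats, for pmfs on a common support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def rate_function(q, support, mu):
    """Psi_Q^*(mu) = sup_t [ mu*t - Psi_Q(t) ], the convex conjugate of the
    cumulant-generating function of Q, approximated by a grid search over t."""
    def psi(t):  # cumulant-generating function of Q
        return math.log(sum(qi * math.exp(t * x) for qi, x in zip(q, support)))
    return max(mu * t - psi(t) for t in (i / 1000.0 for i in range(-5000, 5001)))

support = [0, 1, 2]
q = [1/3, 1/3, 1/3]   # Q: uniform on {0, 1, 2} (an arbitrary illustrative choice)
p = [0.2, 0.3, 0.5]   # P: another pmf on the same support
mu1 = sum(pi * x for pi, x in zip(p, support))  # first moment of P (here 1.3)

# Kullback's inequality: D(P||Q) >= Psi_Q^*(mu'_1(P))
assert kl(p, q) >= rate_function(q, support, mu1)
```

Since the grid maximum never exceeds the true supremum, the assertion is a conservative check; for this example the two sides are close (about 0.0690 vs. 0.0687 nats), reflecting that the bound is tight when $P$ is an exponential tilting of $Q$.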

The Cramér–Rao bound is an immediate consequence of Kullback's inequality.

**Hirschman uncertainty**

In 1957, Hirschman showed [I.I. Hirschman, "A Note on Entropy," American Journal of Mathematics, 1957] that for a (reasonably well-behaved) function $f:\mathbb{R} \to \mathbb{C}$ such that $\int_{-\infty}^\infty |f(x)|^2\,dx = 1$, and its Fourier transform $g(y)=\int_{-\infty}^\infty f(x)\, e^{-2 \pi i x y}\,dx$, the sum of the differential entropies of $|f|^2$ and $|g|^2$ is non-negative, i.e.

$-\int_{-\infty}^\infty |f(x)|^2 \log |f(x)|^2 \,dx \;-\; \int_{-\infty}^\infty |g(y)|^2 \log |g(y)|^2 \,dy \;\ge\; 0.$

Hirschman conjectured, and it was later proved [W. Beckner, "Inequalities in Fourier Analysis," Annals of Mathematics, vol. 102, pp. 159–182, 1975], that a sharper bound of $1-\log 2$, which is attained in the case of a Gaussian distribution, could replace the right-hand side of this inequality. This is especially significant since it implies, and is stronger than, Weyl's formulation of Heisenberg's uncertainty principle.
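As a numerical illustration of the sharp bound (assuming the $e^{-2\pi i x y}$ Fourier convention above): for a Gaussian $f$, $|f|^2$ is an $N(0,\sigma^2)$ density and $|g|^2$ is $N(0,s^2)$ with $s = 1/(4\pi\sigma)$, so the entropy sum (in nats) attains exactly $1-\log 2$. A sketch:

```python
import math

def diff_entropy_gaussian(sigma, n=200001):
    """Differential entropy (nats) of an N(0, sigma^2) density, computed by
    numerical integration of -rho*log(rho) over [-12*sigma, 12*sigma]."""
    w = 12 * sigma
    dx = 2 * w / (n - 1)
    norm = math.sqrt(2 * math.pi) * sigma
    h = 0.0
    for i in range(n):
        x = -w + i * dx
        rho = math.exp(-x * x / (2 * sigma * sigma)) / norm
        h -= rho * math.log(rho) * dx
    return h

# With sigma free and s = 1/(4*pi*sigma), the product sigma*s = 1/(4*pi) is
# fixed, and the entropy sum log(2*pi*e*sigma*s) = log(e/2) = 1 - log 2.
sigma = 0.7                       # an arbitrary width for the example
s = 1 / (4 * math.pi * sigma)
total = diff_entropy_gaussian(sigma) + diff_entropy_gaussian(s)
assert abs(total - (1 - math.log(2))) < 1e-6
```

The check passes for any positive `sigma`, mirroring the scale invariance of the Hirschman bound.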

This inequality can be generalized, and somewhat more easily proven, via the concept of Rényi entropy [S. Zozor and C. Vignat, "On classes of non-Gaussian asymptotic minimizers in entropic uncertainty principles," arXiv:math/0605510v1] [Iwo Bialynicki-Birula, "Formulation of the uncertainty relations in terms of the Rényi entropies," arXiv:quant-ph/0608116v2]. Namely, if $\alpha, \beta > 0$ and $1/\alpha + 1/\beta = 2$, it can be shown that

$\frac{1}{1-\alpha} \log \left( \int_{-\infty}^\infty |f(x)|^{2\alpha} \,dx \right) + \frac{1}{1-\beta} \log \left( \int_{-\infty}^\infty |g(y)|^{2\beta} \,dy \right) \ge \frac{1}{2} \left( \frac{\log\alpha}{\alpha-1} + \frac{\log\beta}{\beta-1} \right) - \log 2,$

from which the lower bound for the sum of the Shannon entropies is obtained by taking the limit as $\alpha, \beta \to 1$.

**Tao's inequality**

Given discrete random variables $X$, $Y$, and $Y'$, such that $X$ takes values only in the interval $[-1,1]$ and $Y'$ is determined by $Y$ (so that $H(Y'|Y)=0$), we have

[T. Tao, "Szemerédi's regularity lemma revisited," Contrib. Discrete Math., vol. 1, pp. 8–28, 2006, arXiv:math/0504472v2] [Rudolf Ahlswede, "The final form of Tao's inequality relating conditional expectation and conditional mutual information," Advances in Mathematics of Communications, vol. 1, no. 2, pp. 239–242, 2007]:

$\mathbb{E}\Big( \big| \mathbb{E}(X|Y') - \mathbb{E}(X|Y) \big| \Big) \le \sqrt{2 \log 2 \; I(X;Y|Y')},$

relating the conditional expectation to the conditional mutual information. (Note: the correction factor $\log 2$ inside the radical arises because we are measuring the conditional mutual information in bits rather than nats.)
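Tao's inequality can be checked numerically on a small example; the joint distribution below and the choice $Y' = Y \bmod 2$ are hypothetical, chosen only so that $Y'$ is a function of $Y$:

```python
import math

# Hypothetical joint pmf of (X, Y): X in {-1, +1}, Y in {0, 1, 2, 3}.
joint = {(-1, 0): 0.10, (1, 0): 0.15,
         (-1, 1): 0.05, (1, 1): 0.20,
         (-1, 2): 0.20, (1, 2): 0.05,
         (-1, 3): 0.15, (1, 3): 0.10}

def yprime(y):
    return y % 2          # Y' = Y mod 2, so H(Y'|Y) = 0 as required

def cond_exp_x(pred):
    """E(X | the event {Y : pred(Y)})."""
    num = sum(x * p for (x, y), p in joint.items() if pred(y))
    den = sum(p for (x, y), p in joint.items() if pred(y))
    return num / den

# Left-hand side: E | E(X|Y') - E(X|Y) |
lhs = sum(
    sum(p for (x, y), p in joint.items() if y == yv)
    * abs(cond_exp_x(lambda y: yprime(y) == yprime(yv))
          - cond_exp_x(lambda y: y == yv))
    for yv in {y for _, y in joint}
)

def cond_entropy(group):
    """H(X | group(Y)) in bits."""
    h = 0.0
    for g in {group(y) for _, y in joint}:
        pg = sum(p for (_, y), p in joint.items() if group(y) == g)
        for xv in (-1, 1):
            pxg = sum(p for (x, y), p in joint.items()
                      if x == xv and group(y) == g)
            if pxg > 0:
                h -= pxg * math.log2(pxg / pg)
    return h

# Since Y' is a function of Y, I(X;Y|Y') = H(X|Y') - H(X|Y) (in bits).
mi_bits = cond_entropy(yprime) - cond_entropy(lambda y: y)
rhs = math.sqrt(2 * math.log(2) * mi_bits)
assert lhs <= rhs
```

For this particular pmf the two sides come out close (about 0.40 vs. 0.42), illustrating that the inequality can be nearly tight even for simple distributions.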

**See also**

* Cramér–Rao bound
* Entropy power inequality
* Fano's inequality
* Jensen's inequality
* Kraft inequality
