Inequalities in information theory

Inequalities are very important in the study of information theory. There are a number of different contexts in which these inequalities appear.

Shannon-type inequalities

Consider a finite collection of finitely (or at most countably) supported random variables on the same probability space. For a collection of n random variables, there are 2^n - 1 non-empty subsets for which joint entropies can be defined. For example, when n = 2, we may consider the entropies H(X_1), H(X_2), and H(X_1, X_2), and express the following inequalities (which together characterize the range of the marginal and joint entropies of two random variables):
* H(X_1) \ge 0
* H(X_2) \ge 0
* H(X_1) \le H(X_1, X_2)
* H(X_2) \le H(X_1, X_2)
* H(X_1, X_2) \le H(X_1) + H(X_2)

In fact, these can all be expressed as special cases of a single inequality involving the conditional mutual information, namely

:I(A;B|C) \ge 0,

where A, B, and C each denote the joint distribution of some arbitrary subset of our collection of random variables. Inequalities that can be derived from this basic inequality are known as Shannon-type inequalities.

More formally (following the notation of Yeung), define \Gamma^*_n to be the set of all constructible points in \mathbb{R}^{2^n-1}, where a point is said to be constructible if and only if there is a joint, discrete distribution of n random variables such that each coordinate of that point, indexed by a non-empty subset of \{1,2,\dots,n\}, is equal to the joint entropy of the corresponding subset of the n random variables. The closure of \Gamma^*_n is denoted \overline{\Gamma^*_n}.
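
As a quick numerical illustration of the n = 2 case listed above, the following sketch (assuming Python with NumPy; the joint pmf is made up purely for illustration) computes H(X_1), H(X_2), and H(X_1, X_2) and checks the five basic inequalities:

    import numpy as np

    def joint_entropy(p):
        """Shannon entropy (in bits) of a pmf given as an array of probabilities."""
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))

    # A made-up joint pmf of (X_1, X_2): rows index X_1, columns index X_2.
    p12 = np.array([[0.30, 0.10],
                    [0.15, 0.45]])

    h1 = joint_entropy(p12.sum(axis=1))    # H(X_1), from the marginal of X_1
    h2 = joint_entropy(p12.sum(axis=0))    # H(X_2), from the marginal of X_2
    h12 = joint_entropy(p12.flatten())     # H(X_1, X_2)

    print(h1 >= 0 and h2 >= 0)             # non-negativity
    print(h1 <= h12 and h2 <= h12)         # monotonicity: H(X_i) <= H(X_1, X_2)
    print(h12 <= h1 + h2)                  # subadditivity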

The cone in \mathbb{R}^{2^n-1} characterized by all Shannon-type inequalities among n random variables is denoted \Gamma_n. Software has been developed to automate the task of proving such inequalities. [ [http://user-www.ie.cuhk.edu.hk/~ITIP/ ITIP - Information Theoretic Inequality Prover] ] [ [http://xitip.epfl.ch/ Xitip - Information Theoretic Inequalities Prover] ] Given a candidate linear inequality, such software determines whether the half-space it defines contains the cone \Gamma_n; if it does, the inequality is verified, since \Gamma^*_n \subseteq \Gamma_n.
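
The core of such a prover can be sketched as a linear program (a minimal illustration only, assuming Python with SciPy and restricting to n = 2): a linear inequality b \cdot h \ge 0 is Shannon-type exactly when the minimum of b \cdot h over the cone \Gamma_2, described by the elemental inequalities H(X_1|X_2) \ge 0, H(X_2|X_1) \ge 0 and I(X_1;X_2) \ge 0, is zero rather than unbounded below.

    import numpy as np
    from scipy.optimize import linprog

    # Coordinates of a point h are (H(X_1), H(X_2), H(X_1,X_2)).
    # Elemental Shannon inequalities for n = 2, written as G @ h >= 0:
    #   H(X_1|X_2) = H(X_1,X_2) - H(X_2) >= 0
    #   H(X_2|X_1) = H(X_1,X_2) - H(X_1) >= 0
    #   I(X_1;X_2) = H(X_1) + H(X_2) - H(X_1,X_2) >= 0
    G = np.array([[0.0, -1.0, 1.0],
                  [-1.0, 0.0, 1.0],
                  [1.0, 1.0, -1.0]])

    def is_shannon_type(b):
        """True if b . h >= 0 for every h in the cone {h : G @ h >= 0}."""
        # Minimize b . h subject to G @ h >= 0; the cone contains the origin,
        # so the minimum is 0 if the inequality holds on the whole cone, and
        # the LP is unbounded below otherwise.
        res = linprog(c=b, A_ub=-G, b_ub=np.zeros(3), bounds=[(None, None)] * 3)
        return res.status == 0 and abs(res.fun) < 1e-9

    print(is_shannon_type([1.0, 1.0, -1.0]))   # I(X_1;X_2) >= 0        -> True
    print(is_shannon_type([0.0, -1.0, 1.0]))   # H(X_1|X_2) >= 0        -> True
    print(is_shannon_type([1.0, 0.0, -1.0]))   # H(X_1) >= H(X_1,X_2)?  -> False

Provers such as ITIP apply the same linear-programming idea with the full set of elemental inequalities for general n.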

Non-Shannon-type inequalities

Other, less trivial inequalities have been discovered among the entropies and joint entropies of four or more random variables, which cannot be derived from Shannon's basic inequalities. [Z. Zhang, R.W. Yeung, "A non-Shannon-type conditional inequality of information quantities." IEEE Transactions on Information Theory. New York: Nov 1997. Vol. 43, Iss. 6; p. 1982] [K. Makarychev et al. "A new class of non-Shannon-type inequalities for entropies." Communications in Information and Systems, Vol. 2, No. 2, pp. 147-166, December 2002 [http://www.cs.princeton.edu/~ymakaryc/papers/nonshann.pdf PDF] ] These are known as non-Shannon-type inequalities.

It was shown that

:\Gamma^*_n \subset \overline{\Gamma^*_n} \subset \Gamma_n,

where the inclusions are proper for n \ge 4. All three of these sets are, in fact, convex cones.

Other inequalities

Gibbs' inequality

"Main article: Gibbs' inequality"

This fundamental inequality states that the Kullback–Leibler divergence is non-negative.
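
A direct numerical check (a small sketch, assuming Python with NumPy and two made-up discrete distributions on a common support):

    import numpy as np

    def kl_divergence(p, q):
        """Kullback-Leibler divergence D(P||Q) in nats for discrete pmfs p, q.
        Assumes q > 0 wherever p > 0, as Gibbs' inequality requires."""
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

    p = np.array([0.5, 0.3, 0.2])
    q = np.array([0.2, 0.5, 0.3])

    print(kl_divergence(p, q) >= 0)   # True: D(P||Q) is non-negative
    print(kl_divergence(p, p))        # 0.0:  equality holds iff P = Q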

Kullback's inequality

Another inequality concerning the Kullback–Leibler divergence is known as Kullback's inequality. [Aimé Fuchs and Giorgio Letta, "L'inégalité de Kullback. Application à la théorie de l'estimation." Séminaire de probabilités (Strasbourg), vol. 4, pp. 108-131, 1970. http://www.numdam.org/item?id=SPS_1970__4__108_0] If P and Q are equivalent (i.e. absolutely continuous with respect to one another) probability distributions on the real line whose first moments exist, then

:D_{KL}(P\|Q) \ge \Psi_Q^*(\mu'_1(P)),

where \Psi_Q^* is the rate function, i.e. the convex conjugate of the cumulant-generating function, of Q, and \mu'_1(P) is the first moment of P.

The Cramér–Rao bound is an immediate consequence of this inequality.
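
For a concrete check of Kullback's inequality (a sketch, assuming Python with SciPy; the three-point distribution P and the uniform Q are made up for illustration), the right-hand side \Psi_Q^*(\mu'_1(P)) can be evaluated by numerically maximizing t\mu - \Psi_Q(t):

    import numpy as np
    from scipy.optimize import minimize_scalar

    support = np.array([0.0, 1.0, 2.0])
    p = np.array([0.2, 0.5, 0.3])            # a made-up distribution P
    q = np.array([1/3, 1/3, 1/3])            # Q uniform on the same support

    def kl(p, q):
        """D_KL(P||Q) in nats for discrete pmfs on a common support."""
        return float(np.sum(p * np.log(p / q)))

    def cgf_q(t):
        """Cumulant-generating function of Q: Psi_Q(t) = log E_Q[exp(t X)]."""
        return np.log(np.sum(q * np.exp(t * support)))

    def rate_function_q(mu):
        """Convex conjugate Psi_Q^*(mu) = sup_t (t * mu - Psi_Q(t))."""
        res = minimize_scalar(lambda t: -(t * mu - cgf_q(t)))
        return float(-res.fun)

    mu1_p = float(np.sum(p * support))       # first moment of P

    print(kl(p, q))                          # D_KL(P||Q)        (the larger value)
    print(rate_function_q(mu1_p))            # Psi_Q^*(mu'_1(P)) (the smaller value)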

Hirschman uncertainty

In 1957, [I. I. Hirschman, "A Note on Entropy", American Journal of Mathematics, 1957] Hirschman showed that for a (reasonably well-behaved) function f:\mathbb{R}\rightarrow\mathbb{C} such that \int_{-\infty}^\infty |f(x)|^2\,dx = 1, and its Fourier transform g(y)=\int_{-\infty}^\infty f(x) e^{-2 \pi i x y}\,dx, the sum of the differential entropies of |f|^2 and |g|^2 is non-negative, i.e.

:-\int_{-\infty}^\infty |f(x)|^2 \log |f(x)|^2 \,dx - \int_{-\infty}^\infty |g(y)|^2 \log |g(y)|^2 \,dy \ge 0.

Hirschman conjectured, and it was later proved, [W. Beckner, "Inequalities in Fourier Analysis." Annals of Mathematics, vol. 102, no. 6, pp. 159-182, 1975] that a sharper bound of 1-\log 2, which is attained in the case of a Gaussian distribution, could replace the right-hand side of this inequality. This is especially significant since it implies, and is stronger than, Weyl's formulation of Heisenberg's uncertainty principle.
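
The Gaussian case can be checked numerically (a sketch, assuming Python with NumPy; it relies on the closed-form Fourier pair for a Gaussian under the e^{-2\pi i x y} convention above, for which |f|^2 and |g|^2 are normal densities with variances \sigma^2 and 1/(16\pi^2\sigma^2) respectively):

    import numpy as np

    def differential_entropy(pdf, grid):
        """Differential entropy (in nats) of a density sampled on a uniform grid."""
        dx = grid[1] - grid[0]
        vals = pdf(grid)
        vals = vals[vals > 0]
        return float(-np.sum(vals * np.log(vals)) * dx)

    def normal_pdf(var):
        return lambda x: np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

    sigma2 = 0.7                               # arbitrary variance for |f|^2
    tau2 = 1.0 / (16 * np.pi**2 * sigma2)      # matching variance of |g|^2

    x = np.linspace(-20, 20, 400001)
    total = (differential_entropy(normal_pdf(sigma2), x)
             + differential_entropy(normal_pdf(tau2), x))

    print(total)                # ~ 0.3069: the sharper bound 1 - log 2 is attained
    print(1 - np.log(2))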

This inequality can be generalized, and somewhat more easily proven, [S. Zozor, C. Vignat, "On classes of non-Gaussian asymptotic minimizers in entropic uncertainty principles." [http://arxiv.org/abs/math/0605510v1 arXiv:math/0605510v1] ] via the concept of Rényi entropy. [Iwo Bialynicki-Birula, "Formulation of the uncertainty relations in terms of the Renyi entropies." [http://arxiv.org/abs/quant-ph/0608116v2 arXiv:quant-ph/0608116v2] ] Namely, if \alpha,\beta > 0 and 1/\alpha + 1/\beta = 2, it can be shown that

:\frac{1}{1-\alpha} \log \left( \int_{-\infty}^\infty |f(x)|^{2\alpha} \,dx \right) + \frac{1}{1-\beta} \log \left( \int_{-\infty}^\infty |g(y)|^{2\beta} \,dy \right) \ge \frac{1}{2} \left(\frac{\log\alpha}{\alpha-1}+\frac{\log\beta}{\beta-1} \right) - \log 2,

from which the lower bound for the sum of the Shannon differential entropies can be obtained by taking the limit as \alpha, \beta \rightarrow 1.
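
Continuing the Gaussian example (again only a sketch, assuming Python with NumPy, with \alpha = 2 and hence \beta = 2/3, and the same closed-form transform pair as above), both sides of the Rényi-entropy inequality can be compared by direct numerical integration:

    import numpy as np

    def normal_pdf(var):
        return lambda x: np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

    def renyi_term(pdf, order, grid):
        """(1/(1 - order)) * log( integral of pdf**order ), via a Riemann sum."""
        dx = grid[1] - grid[0]
        return float(np.log(np.sum(pdf(grid)**order) * dx) / (1 - order))

    alpha, beta = 2.0, 2.0 / 3.0              # conjugate pair: 1/alpha + 1/beta = 2
    sigma2 = 0.7                              # |f|^2 is a N(0, sigma2) density
    tau2 = 1.0 / (16 * np.pi**2 * sigma2)     # |g|^2 is then a N(0, tau2) density

    x = np.linspace(-20, 20, 400001)
    lhs = (renyi_term(normal_pdf(sigma2), alpha, x)
           + renyi_term(normal_pdf(tau2), beta, x))
    rhs = 0.5 * (np.log(alpha) / (alpha - 1) + np.log(beta) / (beta - 1)) - np.log(2)

    print(lhs, rhs)     # equal up to discretization error: the Gaussian is extremal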

Tao's inequality

Given discrete random variables X, Y, and Y', such that X takes values only in the interval [-1,1] and Y' is determined by Y (so that H(Y'|Y)=0), we have [T. Tao, "Szemeredi's regularity lemma revisited", Contrib. Discrete Math., 1 (2006), 8–28. Preprint: [http://arxiv.org/abs/math/0504472v2 arXiv:math/0504472v2] ] [Rudolf Ahlswede, "The final form of Tao's inequality relating conditional expectation and conditional mutual information." Advances in Mathematics of Communications, Volume 1, No. 2, 2007, pp. 239–242 [http://www.aimsciences.org/journals/pdfs.jsp?paperID=2565&mode=full PDF] ]

:\mathbb{E}\bigg( \big| \mathbb{E}(X|Y') - \mathbb{E}(X|Y) \big| \bigg) \le \sqrt{ 2 \log 2 \, I(X;Y|Y') },

relating the conditional expectation to the conditional mutual information. (Note: the correction factor \log 2 inside the radical arises because we are measuring the conditional mutual information in bits rather than nats.)
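
A small numerical check (a sketch, assuming Python with NumPy; the joint pmf of (X, Y) is made up, and Y' is obtained by merging values of Y, so Y' is a function of Y and I(X;Y|Y') = H(X|Y') - H(X|Y)):

    import numpy as np

    # A made-up joint pmf of (X, Y); rows index the values of X, columns those of Y.
    x_vals = np.array([-1.0, 0.0, 1.0])        # X takes values in [-1, 1]
    p_xy = np.array([[0.05, 0.10, 0.02, 0.08],
                     [0.10, 0.05, 0.10, 0.05],
                     [0.05, 0.15, 0.08, 0.17]])

    # Y' is a deterministic function of Y: merge columns {0,1} and {2,3}.
    groups = [[0, 1], [2, 3]]

    def entropy_bits(p):
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))

    p_y = p_xy.sum(axis=0)
    cond_mean_y = (x_vals[:, None] * p_xy).sum(axis=0) / p_y      # E(X | Y = y)
    h_x_given_y = sum(p_y[j] * entropy_bits(p_xy[:, j] / p_y[j])
                      for j in range(len(p_y)))                   # H(X|Y) in bits

    lhs = 0.0
    h_x_given_yp = 0.0
    for cols in groups:
        p_v = p_y[cols].sum()                                     # P(Y' = v)
        p_x_given_v = p_xy[:, cols].sum(axis=1) / p_v             # P(X | Y' = v)
        cond_mean_v = float((x_vals * p_x_given_v).sum())         # E(X | Y' = v)
        h_x_given_yp += p_v * entropy_bits(p_x_given_v)
        lhs += sum(p_y[j] * abs(cond_mean_v - cond_mean_y[j]) for j in cols)

    cmi_bits = h_x_given_yp - h_x_given_y                         # I(X;Y|Y') in bits
    print(lhs, np.sqrt(2 * np.log(2) * cmi_bits))                 # lhs <= rhs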

See also

*Cramér–Rao bound
*Entropy power inequality
*Fano's inequality
*Jensen's inequality
*Kraft inequality

References

