Fano's inequality

In information theory, Fano's inequality (also known as the Fano converse and the Fano lemma) relates the average information lost in a noisy channel to the probability of the categorization error. It was derived by Robert Fano in the early 1950s while teaching a Ph.D. seminar in information theory at MIT, and later recorded in his 1961 textbook.

It is used to find a lower bound on the error probability of any decoder, as well as lower bounds for minimax risks in density estimation.

Fano's inequality

Let the random variables X and Y represent input and output messages (out of r+1 possible messages) with joint probability P(x, y). Fano's inequality is

H(X \mid Y) \leq H(e) + P(e)\log(r),

where

H(X \mid Y) = -\sum_{i,j} P(x_i, y_j)\log P(x_i \mid y_j)

is the conditional entropy,

P(e) = \sup_i \sum_{j \neq i} P(y_j \mid x_i)

is the probability of the communication error, and

H(e) = -P(e)\log P(e) - (1 - P(e))\log(1 - P(e))

is the corresponding binary entropy.
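As a sanity check, the following minimal numerical sketch (not part of the original article) evaluates both sides of the inequality for a hypothetical symmetric channel with r+1 = 4 equiprobable messages and a crossover probability chosen arbitrarily; in this symmetric case the bound in fact holds with equality.

```python
# Minimal sketch: check H(X|Y) <= H(e) + P(e) log r on an assumed
# symmetric channel with r + 1 = 4 messages and uniform input.
import numpy as np

r_plus_1 = 4                      # number of possible messages (assumed)
p_correct = 0.9                   # P(y = x), hypothetical channel parameter
p_wrong = (1 - p_correct) / (r_plus_1 - 1)

# Joint distribution P(x, y) with uniform input P(x) = 1/(r+1).
P_xy = np.full((r_plus_1, r_plus_1), p_wrong / r_plus_1)
np.fill_diagonal(P_xy, p_correct / r_plus_1)

# Conditional entropy H(X|Y) = -sum_{i,j} P(x_i, y_j) log P(x_i | y_j).
P_y = P_xy.sum(axis=0)
P_x_given_y = P_xy / P_y
H_X_given_Y = -np.sum(P_xy * np.log(P_x_given_y))

# Error probability P(e) = sup_i sum_{j != i} P(y_j | x_i).
P_y_given_x = P_xy / P_xy.sum(axis=1, keepdims=True)
P_e = np.max(1 - np.diag(P_y_given_x))

# Binary entropy H(e) = -P(e) log P(e) - (1 - P(e)) log(1 - P(e)).
H_e = -P_e * np.log(P_e) - (1 - P_e) * np.log(1 - P_e)

bound = H_e + P_e * np.log(r_plus_1 - 1)
print(f"H(X|Y) = {H_X_given_Y:.4f} <= {bound:.4f} = H(e) + P(e) log r")
assert H_X_given_Y <= bound + 1e-12
```

For this symmetric channel the two sides agree to floating-point precision, reflecting that Fano's inequality is tight when errors are spread uniformly over the r incorrect messages.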

Alternative formulation

Let "X" be a random variable with density equal to one of r+1 possible densities f_1,ldots,f_{r+1}. Furthermore, the Kullback-Leibler divergence between any pair of densities can not be too large,: D_{KL}(f_i|f_j)leq eta for all i ot = j.

Let psi(X)in{1,ldots, r+1} be an estimate of the index. Then

:sup_i P_i(psi(X) ot = i) geq 1-frac{eta+log 2}{log r}where P_i is the probability induced by f_i
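To illustrate how the bound is used, here is a hedged sketch (the Gaussian family and spacing are assumptions for illustration, not from the article) that evaluates the right-hand side for r+1 unit-variance Gaussian densities f_i = N(\mu_i, 1), for which KL(N(a,1) \| N(b,1)) = (a-b)^2/2.

```python
# Illustration: evaluate 1 - (beta + log 2) / log r for a hypothetical
# family of r + 1 unit-variance Gaussians f_i = N(mu_i, 1).
import numpy as np

r_plus_1 = 9
mus = 0.1 * np.arange(r_plus_1)          # assumed, closely spaced means

# beta is the largest pairwise KL divergence within the family,
# KL(N(a,1) || N(b,1)) = (a - b)^2 / 2.
beta = max((a - b) ** 2 / 2 for a in mus for b in mus)

r = r_plus_1 - 1
lower_bound = 1 - (beta + np.log(2)) / np.log(r)
print(f"beta = {beta:.3f}, worst-case error probability >= {lower_bound:.3f}")
```

With closely spaced means the densities are hard to distinguish and the bound is informative (positive); spreading the means increases beta and eventually makes the bound vacuous.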

Generalization

The following generalization is due to Ibragimov and Khasminskii (1979) and to Assouad and Birge (1983).

Let F be a class of densities containing a subclass of r+1 densities f_\theta such that for any \theta \neq \theta',

\|f_{\theta} - f_{\theta'}\|_{L_1} \geq \alpha,

D_{KL}(f_\theta \| f_{\theta'}) \leq \beta.

Then, in the worst case, the expected error of estimation is bounded from below:

\sup_{f \in \mathbf{F}} E\|f_n - f\|_{L_1} \geq \frac{\alpha}{2}\left(1 - \frac{n\beta + \log 2}{\log r}\right),

where f_n is any density estimator based on a sample of size n.
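A short sketch (with hypothetical values of \alpha, \beta and r, chosen only for illustration) shows how the lower bound degrades as the sample size n grows, which is why the subfamily is typically chosen with r and the separation depending on n.

```python
# Sketch: behaviour of (alpha/2) * (1 - (n*beta + log 2) / log r)
# as n grows, for assumed subfamily parameters alpha, beta, r.
import numpy as np

alpha, beta, r = 0.5, 0.01, 1024      # hypothetical parameters

for n in (10, 50, 100, 500):
    bound = (alpha / 2) * (1 - (n * beta + np.log(2)) / np.log(r))
    print(f"n = {n:4d}: E||f_n - f||_L1 >= {bound:.4f}")
# The bound shrinks linearly in n * beta; a nontrivial minimax rate is
# obtained by letting the subfamily (r, alpha, beta) vary with n.
```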

References

* P. Assouad, "Deux remarques sur l'estimation," Comptes Rendus de l'Académie des Sciences de Paris, Vol. 296, pp. 1021–1024, 1983.

* L. Birge, "Estimating a density under order restrictions: nonasymptotic minimax risk," Technical report, UER de Sciences Economiques, Universite Paris X, Nanterre, France, 1983.

* L. Devroye, "A Course in Density Estimation". Progress in probability and statistics, Vol 14. Boston, Birkhauser, 1987. ISBN 0-8176-3365-0, ISBN 3-7643-3365-0.

* R. Fano, "Transmission of information; a statistical theory of communications." Cambridge, Mass., M.I.T. Press, 1961. ISBN 0-262-06001-9

* R. Fano, " [http://www.scholarpedia.org/article/Fano_inequality Fano inequality] " Scholarpedia, 2008.

* I. A. Ibragimov, R. Z. Has′minskii, "Statistical Estimation: Asymptotic Theory". Applications of Mathematics, Vol. 16, Springer-Verlag, New York, 1981. ISBN 0-387-90523-5.

