Jensen's inequality

In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proved by Jensen in 1906 (Jensen, J., "Sur les fonctions convexes et les inégalités entre les valeurs moyennes"). Given its generality, the inequality appears in many forms depending on the context, some of which are presented below. In its simplest form the inequality states: "the convex transformation of a mean is less than or equal to the mean applied after convex transformation."

The finite form of the inequality was the logo of the Institute for Mathematical Sciences at the University of Copenhagen until 2006.

Statements

The classical form of Jensen's inequality involves several numbers and weights. The inequality can be stated quite generally using measure theory, or, equivalently, in probabilistic notation. In the probabilistic setting the inequality can be further generalized to its full strength.

Finite form

For a real convex function φ, numbers $x_i$ in its domain, and positive weights $a_i$, Jensen's inequality can be stated as:

\[ \varphi\!\left(\frac{\sum a_i x_i}{\sum a_i}\right) \le \frac{\sum a_i\,\varphi(x_i)}{\sum a_i}; \]

and the inequality is clearly reversed if φ is concave.

As a particular case, if the weights $a_i$ are all equal to unity, then

\[ \varphi\!\left(\frac{\sum x_i}{n}\right) \le \frac{\sum \varphi(x_i)}{n}. \]
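
As a quick numerical sanity check of this finite form, the short Python sketch below (an illustrative addition, not part of the original statement; the convex function φ(x) = x² and the particular weights are arbitrary choices) evaluates both sides of the weighted inequality:

```python
# Sanity check of the finite form of Jensen's inequality for the
# convex function phi(x) = x**2 with arbitrary positive weights.

def phi(x):
    return x ** 2

x = [1.0, 2.5, 4.0, 7.0]    # points in the domain of phi
a = [0.5, 1.0, 2.0, 0.25]   # arbitrary positive weights

total = sum(a)
lhs = phi(sum(ai * xi for ai, xi in zip(a, x)) / total)
rhs = sum(ai * phi(xi) for ai, xi in zip(a, x)) / total

print(lhs, "<=", rhs)   # approx. 11.56 <= 13.6; holds for any convex phi
```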

For instance, the function log("x") is concave (note that the two-point case of Jensen's inequality is precisely the definition of convexity, so the inequality can be used to characterize convexity or concavity), so substituting φ("x") = −log("x") in the previous formula establishes the (logarithm of the) familiar arithmetic mean-geometric mean inequality:

\[ \frac{x_1 + x_2 + \cdots + x_n}{n} \ge \sqrt[n]{x_1 x_2 \cdots x_n}. \]
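
The same substitution can be verified numerically; the sketch below (an illustrative addition, with an arbitrary list of positive numbers) compares the arithmetic and geometric means directly:

```python
import math

# Arithmetic mean vs. geometric mean: the special case of Jensen's
# inequality obtained with phi(x) = -log(x).
xs = [0.5, 2.0, 3.0, 8.0]
n = len(xs)

arithmetic = sum(xs) / n
geometric = math.exp(sum(math.log(x) for x in xs) / n)

print(arithmetic, ">=", geometric)   # approx. 3.375 >= approx. 2.21
```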

The variable "x" may, if required, be a function of another variable (or set of variables) "t", so that "x""i" = "g"("t""i"). All of this carries over directly to the general continuous case: the weights $a_i$ are replaced by a non-negative integrable function "f"("x"), such as a probability density, and the summations are replaced by integrals.

Measure-theoretic and probabilistic form

Let (Ω,A,μ) be a measure space, such that μ(Ω) = 1. If "g" is a real-valued function that is μ-integrable, and if φ is a measurable convex function on the real axis, then:

\[ \varphi\!\left(\int_{\Omega} g\, d\mu\right) \le \int_\Omega \varphi \circ g\, d\mu. \]

The same result can be equivalently stated in a probability theory setting, by a simple change of notation. Let $(\Omega, \mathfrak{F}, \mathbb{P})$ be a probability space, "X" an integrable real-valued random variable and φ a measurable convex function. Then:

\[ \varphi\left(\mathbb{E}[X]\right) \le \mathbb{E}[\varphi(X)]. \]

In this probability setting, the measure μ plays the role of a probability $\mathbb{P}$, the integral with respect to μ becomes an expected value $\mathbb{E}$, and the function "g" becomes a random variable "X".
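
Both sides of the probabilistic form can be estimated by simulation. The following sketch is an illustrative addition; the exponential distribution and the convex function φ(x) = x² are arbitrary choices made only for the example:

```python
import random

# Monte Carlo illustration of phi(E[X]) <= E[phi(X)]
# for X ~ Exp(1) and phi(x) = x**2.
random.seed(0)

def phi(x):
    return x ** 2

samples = [random.expovariate(1.0) for _ in range(100_000)]

mean_x = sum(samples) / len(samples)                      # estimates E[X] = 1
mean_phi = sum(phi(s) for s in samples) / len(samples)    # estimates E[X**2] = 2

print(phi(mean_x), "<=", mean_phi)
```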

General inequality in a probabilistic setting

More generally, let "T" be a real topological vector space, and "X" a "T"-valued integrable random variable. In this general setting, "integrable" means that for any element "z" in the dual space of "T", $\mathbb{E}|\langle z, X\rangle| < \infty$, and that there exists an element $\mathbb{E}[X]$ in "T" such that $\langle z, \mathbb{E}[X]\rangle = \mathbb{E}[\langle z, X\rangle]$. Then, for any measurable convex function φ and any sub-σ-algebra $\mathfrak{G}$ of $\mathfrak{F}$:

\[ \varphi\left(\mathbb{E}[X\mid\mathfrak{G}]\right) \le \mathbb{E}[\varphi(X)\mid\mathfrak{G}]. \]

Here $\mathbb{E}[\,\cdot\mid\mathfrak{G}]$ stands for the conditional expectation with respect to the σ-algebra $\mathfrak{G}$. This general statement reduces to the previous ones when the topological vector space "T" is the real axis and $\mathfrak{G}$ is the trivial σ-algebra $\{\varnothing, \Omega\}$.
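
For intuition, the conditional statement can be illustrated on a finite probability space, where conditioning on a sub-σ-algebra amounts to averaging within the blocks of a partition. The sketch below is an illustrative construction (the sample space, the partition and the choice φ(x) = x² are all hypothetical) that checks $\varphi(\mathbb{E}[X\mid\mathfrak{G}]) \le \mathbb{E}[\varphi(X)\mid\mathfrak{G}]$ block by block:

```python
# Conditional Jensen's inequality on a finite, uniform sample space.
# The sub-sigma-algebra G is generated by the partition {0,1,2}, {3,4,5}.

def phi(x):
    return x ** 2

X = {0: 1.0, 1: 4.0, 2: 2.0, 3: -1.0, 4: 0.5, 5: 3.0}   # a random variable
partition = [{0, 1, 2}, {3, 4, 5}]                        # blocks generating G

for block in partition:
    cond_mean = sum(X[w] for w in block) / len(block)           # E[X | G] on the block
    cond_mean_phi = sum(phi(X[w]) for w in block) / len(block)  # E[phi(X) | G] on the block
    print(phi(cond_mean), "<=", cond_mean_phi)
```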

Proofs

A proof of Jensen's inequality can be given in several ways, and three proofs corresponding to the different statements above will be offered. Before embarking on these derivations, however, it is worth analyzing an intuitive graphical argument for the probabilistic case where "X" is a real number. Assuming a hypothetical distribution of "X" values, one can immediately identify the position of $\mathbb{E}[X]$ and its image $\varphi(\mathbb{E}[X])$ in the graph. Noticing that for a convex mapping $Y = \varphi(X)$ the corresponding distribution of "Y" values is increasingly "stretched out" for increasing values of "X", it is easy to see that the distribution of "Y" is broader than that of "X" on the interval "X" > "X"0 and narrower on "X" < "X"0 for any "X"0; in particular, this is also true for $X_0 = \mathbb{E}[X]$. Consequently, in this picture the expectation of "Y" is always shifted upwards with respect to the position of $\varphi(\mathbb{E}[X])$, and this "proves" the inequality, i.e.

\[ \mathbb{E}[Y(X)] \ge Y(\mathbb{E}[X]), \]

the equality taking place when φ("X") is not strictly convex, e.g. when it is a straight line.

The proofs below formalize this intuitive notion.

Proof 1 (finite form)

If "λ"1 and "λ"2 are two arbitrary positive real numbers such that "λ"1 + "λ"2 = 1 then convexity of scriptstylevarphi implies

\[ \varphi(\lambda_1 x_1 + \lambda_2 x_2) \le \lambda_1\,\varphi(x_1) + \lambda_2\,\varphi(x_2) \quad \text{for any } x_1, x_2. \]

This can be easily generalized: if "λ"1, "λ"2, ..., "λ""n" are positive real numbers such that "λ"1 + ... + "λ""n" = 1, then

\[ \varphi(\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_n x_n) \le \lambda_1\,\varphi(x_1) + \lambda_2\,\varphi(x_2) + \cdots + \lambda_n\,\varphi(x_n), \]

for any "x"1, ..., "x""n". This "finite form" of the Jensen's inequality can be proved by induction: by convexity hypotheses, the statement is true for "n" = 2. Suppose it is true also for some "n", one needs to prove it for "n" + 1. At least one of the "λ""i" is strictly positive, say "λ"1; therefore by convexity inequality:

\[ \varphi\!\left(\sum_{i=1}^{n+1}\lambda_i x_i\right) = \varphi\!\left(\lambda_1 x_1 + (1-\lambda_1)\sum_{i=2}^{n+1} \frac{\lambda_i}{1-\lambda_1}\, x_i\right) \le \lambda_1\,\varphi(x_1) + (1-\lambda_1)\,\varphi\!\left(\sum_{i=2}^{n+1} \frac{\lambda_i}{1-\lambda_1}\, x_i\right). \]

Since $\sum_{i=2}^{n+1} \lambda_i/(1-\lambda_1) = 1$, one can apply the induction hypothesis to the last term in the previous formula to obtain the result, namely the finite form of Jensen's inequality.
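
The key algebraic step of the induction is splitting off λ1 and renormalizing the remaining weights so that they again sum to one. The sketch below is an illustrative check of that step (the weights and the convex function φ(x) = exp(x) are arbitrary choices):

```python
import math

# Induction step of the finite-form proof: split off lambda_1, renormalize
# the remaining weights, and verify both inequalities used in the argument.
def phi(x):
    return math.exp(x)        # a convex function, chosen for illustration

lam = [0.4, 0.3, 0.2, 0.1]    # weights summing to 1
x = [0.0, 1.0, 2.0, 3.0]

inner_weights = [li / (1 - lam[0]) for li in lam[1:]]   # renormalized weights
inner_point = sum(w * xi for w, xi in zip(inner_weights, x[1:]))

lhs = phi(sum(li * xi for li, xi in zip(lam, x)))                # phi of the full mean
middle = lam[0] * phi(x[0]) + (1 - lam[0]) * phi(inner_point)    # two-point convexity bound
rhs = sum(li * phi(xi) for li, xi in zip(lam, x))                # full right-hand side

print(lhs <= middle <= rhs)   # True: convexity step, then induction hypothesis
```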

In order to obtain the general inequality from this finite form, one needs to use a density argument. The finite form can be re-written as:

\[ \varphi\!\left(\int x\, d\mu_n(x)\right) \le \int \varphi(x)\, d\mu_n(x), \]

where "μ"n" is a measure given by an arbitrary convex combination of Dirac deltas:

\[ \mu_n = \sum_{i=1}^n \lambda_i\, \delta_{x_i}. \]

Since convex functions are continuous, and since convex combinations of Dirac deltas are weakly dense in the set of probability measures (as could be easily verified), the general statement is obtained simply by a limiting procedure.

Proof 2 (measure-theoretic form)

Let "g" be a real-valued μ-integrable function on a measure space Ω, and let "φ" be a convex function on the real numbers. Define the right-handed derivative of φ at "x" as

\[ \varphi'(x) := \lim_{t \to 0^+} \frac{\varphi(x+t) - \varphi(x)}{t}. \]

Since φ is convex, the difference quotient on the right-hand side is decreasing as "t" approaches 0 from the right, and it is bounded below by any quotient of the form

\[ \frac{\varphi(x+t) - \varphi(x)}{t} \]

where "t" < 0, and therefore, the limit does always exist.

Now, let us define the following:

\[ x_0 := \int_\Omega g\, d\mu, \]

\[ a := \varphi'(x_0), \]

\[ b := \varphi(x_0) - x_0\,\varphi'(x_0). \]

Then for all "x", "ax" + "b" ≤ φ("x"). To see this, take "x" > "x"0 and define "t" = "x" − "x"0 > 0. Then,

\[ \varphi'(x_0) \le \frac{\varphi(x_0 + t) - \varphi(x_0)}{t}. \]

Therefore,

\[ \varphi'(x_0)(x - x_0) + \varphi(x_0) \le \varphi(x) \]

as desired. The case for "x" < "x"0 is proven similarly, and clearly "ax"0 + "b" = φ("x"0).

φ("x"0) can then be rewritten as

\[ a x_0 + b = a\left(\int_\Omega g\, d\mu\right) + b. \]

Since μ(Ω) = 1, for every real number "k" we have

\[ \int_\Omega k\, d\mu = k. \]

In particular,

\[ \varphi\!\left(\int_\Omega g\, d\mu\right) = a\left(\int_\Omega g\, d\mu\right) + b = \int_\Omega (a g + b)\, d\mu \le \int_\Omega \varphi \circ g\, d\mu, \]

which is the desired inequality.
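
The essential object in this proof is the supporting line "ax" + "b" of φ at "x"0. The sketch below is an illustrative numerical rendering of the argument (the discrete measure, the values of "g" and the choice φ(x) = x² are all hypothetical):

```python
# Supporting-line argument behind the measure-theoretic proof, illustrated
# with a discrete probability measure mu and phi(x) = x**2.

def phi(x):
    return x ** 2

def phi_right_derivative(x, eps=1e-6):
    # numerical approximation of the right-handed derivative of phi at x
    return (phi(x + eps) - phi(x)) / eps

points = [0.0, 1.0, 3.0, 6.0]     # values taken by g
weights = [0.1, 0.4, 0.3, 0.2]    # mu-masses, summing to 1

x0 = sum(w * v for w, v in zip(weights, points))   # integral of g with respect to mu
a = phi_right_derivative(x0)
b = phi(x0) - x0 * a

# The line a*x + b stays below phi ...
assert all(a * v + b <= phi(v) + 1e-9 for v in points)

# ... hence phi(x0) = a*x0 + b = integral of (a*g + b) <= integral of phi(g).
lhs = phi(x0)
rhs = sum(w * phi(v) for w, v in zip(weights, points))
print(lhs, "<=", rhs)   # 6.25 <= 10.3
```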

Proof 3 (general inequality in a probabilistic setting)

Let "X" be an integrable random variable that takes values in a real topological vector space "T". Since $\varphi: T \to \mathbb{R}$ is convex, for any "x", "y" in "T", the quantity

\[ \frac{\varphi(x + \theta\, y) - \varphi(x)}{\theta}, \]

is decreasing as θ approaches 0+. In particular, the "subdifferential" of "φ" evaluated at "x" in the direction "y" is well-defined by

\[ (D\varphi)(x)\cdot y := \lim_{\theta \downarrow 0} \frac{\varphi(x+\theta\, y)-\varphi(x)}{\theta} = \inf_{\theta > 0} \frac{\varphi(x+\theta\, y)-\varphi(x)}{\theta}. \]

It is easily seen that the subdifferential is linear in "y" and, since the infimum taken in the right-hand side of the previous formula is smaller than the value of the same term for "θ" = 1, one gets

\[ \varphi(x) \le \varphi(x+y) - (D\varphi)(x)\cdot y. \]

In particular, for an arbitrary sub-σ-algebra $\mathfrak{G}$ we can evaluate the last inequality with $x = \mathbb{E}[X\mid\mathfrak{G}]$ and $y = X - \mathbb{E}[X\mid\mathfrak{G}]$ to obtain

\[ \varphi\left(\mathbb{E}[X\mid\mathfrak{G}]\right) \le \varphi(X) - (D\varphi)(\mathbb{E}[X\mid\mathfrak{G}]) \cdot \left(X - \mathbb{E}[X\mid\mathfrak{G}]\right). \]

Now, taking the conditional expectation with respect to $\mathfrak{G}$ on both sides of the previous expression gives the result, since:

\[ \mathbb{E}\!\left[\,(D\varphi)(\mathbb{E}[X\mid\mathfrak{G}])\cdot \left(X-\mathbb{E}[X\mid\mathfrak{G}]\right) \,\middle|\, \mathfrak{G}\,\right] = (D\varphi)(\mathbb{E}[X\mid\mathfrak{G}])\cdot \mathbb{E}\!\left[\, X-\mathbb{E}[X\mid\mathfrak{G}] \,\middle|\, \mathfrak{G}\,\right] = 0, \]

by the linearity of the subdifferential in the "y" variable, and well-known properties of the conditional expectation.

Applications and special cases

Form involving a probability density function

Suppose Ω is a measurable subset of the real line and "f"("x") is a non-negative function such that

\[ \int_{-\infty}^\infty f(x)\, dx = 1. \]

In probabilistic language, "f" is a probability density function.

Then Jensen's inequality becomes the following statement about convex integrals:

If "g" is any real-valued measurable function and φ is convex over the range of "g", then

\[ \varphi\!\left(\int_{-\infty}^\infty g(x) f(x)\, dx\right) \le \int_{-\infty}^\infty \varphi(g(x))\, f(x)\, dx. \]

If "g"("x") = "x", then this form of the inequality reduces to a commonly used special case:

\[ \varphi\!\left(\int_{-\infty}^\infty x\, f(x)\, dx\right) \le \int_{-\infty}^\infty \varphi(x)\, f(x)\, dx. \]
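
This special case can be checked by direct numerical integration. The sketch below is an illustrative addition; it assumes the standard exponential density f(x) = e^(−x) on [0, ∞) and the convex function φ(x) = x², and truncates the integrals at x = 50:

```python
import math

# Numerical check of the density form of Jensen's inequality
# for f(x) = exp(-x) on [0, inf) and phi(x) = x**2.

def f(x):
    return math.exp(-x)

def phi(x):
    return x ** 2

dx = 1e-3
grid = [i * dx for i in range(int(50 / dx))]   # truncate the integral at x = 50

mean = sum(x * f(x) * dx for x in grid)            # integral of x f(x) dx, approx. 1
mean_phi = sum(phi(x) * f(x) * dx for x in grid)   # integral of phi(x) f(x) dx, approx. 2

print(phi(mean), "<=", mean_phi)
```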

Alternative finite form

If Ω is a finite set $\{x_1, x_2, \ldots, x_n\}$ and μ is a probability measure on Ω that assigns weight $\lambda_i$ to each $x_i$, then the general form reduces to a statement about sums:

\[ \varphi\!\left(\sum_{i=1}^{n} g(x_i)\lambda_i\right) \le \sum_{i=1}^{n} \varphi(g(x_i))\lambda_i, \]

provided that $\lambda_1 + \lambda_2 + \cdots + \lambda_n = 1$ and $\lambda_i \ge 0$.

There is also an infinite discrete form.

Statistical physics

Jensen's inequality is of particular importance in statistical physics when the convex function is an exponential, giving:

\[ e^{\langle X \rangle} \le \left\langle e^X \right\rangle, \]

where the angle brackets denote expected values with respect to some probability distribution of the random variable "X".

The proof in this case is very simple (cf. Chandler, Sec. 5.5). The desired inequality follows directly, by writing

\[ \left\langle e^X \right\rangle = e^{\langle X \rangle} \left\langle e^{X - \langle X \rangle} \right\rangle \]

and then applying the inequality $e^X \ge 1 + X$

to the final exponential.
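
A quick numerical check of this exponential form (an illustrative addition; the two-point distribution is an arbitrary choice) is:

```python
import math

# Check of exp(<X>) <= <exp(X)> for a simple two-point distribution.
values = [-1.0, 2.0]
probs = [0.7, 0.3]

mean_x = sum(p * x for p, x in zip(probs, values))
mean_exp = sum(p * math.exp(x) for p, x in zip(probs, values))

print(math.exp(mean_x), "<=", mean_exp)   # approx. 0.90 <= approx. 2.47
```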

Information theory

If "p"("x") is the true probability distribution for "x", and "q"("x") is another distribution, then applying Jensen's inequality for the random variable "Y"("x") = "q"("x")/"p"("x") and the function φ("y") = −log("y") gives

\[ \mathbb{E}[\varphi(Y)] \ge \varphi(\mathbb{E}[Y]) \]

\[ \Rightarrow \int p(x) \log \frac{p(x)}{q(x)}\, dx \ge -\log \int p(x) \frac{q(x)}{p(x)}\, dx \]

\[ \Rightarrow \int p(x) \log \frac{p(x)}{q(x)}\, dx \ge 0 \]

\[ \Rightarrow -\int p(x) \log q(x)\, dx \ge -\int p(x) \log p(x)\, dx, \]

a result called Gibbs' inequality.

It shows that the average message length is minimised when codes are assigned on the basis of the true probabilities "p" rather than any other distribution "q". The non-negative quantity is called the Kullback-Leibler divergence of "q" from "p".
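
The discrete analogue is easy to verify numerically. The sketch below is an illustrative addition (the two distributions are arbitrary); it computes the Kullback-Leibler divergence and the equivalent cross-entropy bound:

```python
import math

# Gibbs' inequality for discrete distributions:
# KL(p || q) >= 0, equivalently cross-entropy H(p, q) >= entropy H(p).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
cross_entropy = -sum(pi * math.log(qi) for pi, qi in zip(p, q))
entropy = -sum(pi * math.log(pi) for pi in p)

print(kl >= 0, cross_entropy >= entropy)   # True True
```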

Rao-Blackwell theorem

If "L" is a convex function, then from Jensen's inequality we get

\[ L(\mathbb{E}[\delta(X)]) \le \mathbb{E}[L(\delta(X))] \quad \Rightarrow \quad \mathbb{E}[L(\mathbb{E}[\delta(X)])] \le \mathbb{E}[L(\delta(X))]. \]

So if δ("X") is some estimator of an unobserved parameter θ given a vector of observables "X"; and if "T"("X") is a sufficient statistic for θ; then an improved estimator, in the sense of having a smaller expected loss "L", can be obtained by calculating

\[ \delta_{1}(X) = \mathbb{E}_{\theta}\!\left[\delta(X') \mid T(X') = T(X)\right], \]

the expected value of δ with respect to θ, taken over all possible vectors of observations "X" compatible with the same value of "T"("X") as that observed.

This result is known as the Rao-Blackwell theorem.
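
As an illustration of the resulting variance reduction (a hypothetical simulation, not part of the original article), consider estimating the success probability of a Bernoulli sample: the crude estimator $\delta(X) = X_1$ is improved by conditioning on the sufficient statistic $T(X) = \sum_i X_i$, which yields the sample mean:

```python
import random

# Rao-Blackwellization for Bernoulli(p): the crude estimator delta(X) = X_1
# is replaced by E[X_1 | sum(X)], i.e. the sample mean, which is still
# unbiased but has much smaller variance.
random.seed(0)
p, n, trials = 0.3, 10, 20_000

crude, improved = [], []
for _ in range(trials):
    sample = [1 if random.random() < p else 0 for _ in range(n)]
    crude.append(sample[0])            # delta(X) = X_1
    improved.append(sum(sample) / n)   # E[delta(X) | T(X)] = sample mean

def var(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

print(var(crude), ">=", var(improved))   # approx. 0.21 >= approx. 0.021
```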

See also

* Law of averages

References

* Walter Rudin, "Real and Complex Analysis", McGraw-Hill, 1987. ISBN 0-07-054234-1.
* David Chandler, "Introduction to Modern Statistical Mechanics", Oxford University Press, 1987. ISBN 0-19-504277-8.
* Jensen, Johan Ludwig William Valdemar (1906). "Sur les fonctions convexes et les inégalités entre les valeurs moyennes". Acta Mathematica 30: 175–193. doi:10.1007/BF02418571. http://www.springerlink.com/content/r55q1411g840j446/

External links

* Jensen's inequality served as the logo for the Mathematics department of Copenhagen University (http://www.math.ku.dk/ma/en/).
