Chernoff bound

In probability theory, the Chernoff bound, named after Herman Chernoff, gives exponentially decreasing bounds on tail distributions of sums of independent random variables. It is sharper than tail bounds based on the first or second moment, such as Markov's inequality or Chebyshev's inequality, which only yield power-law bounds on tail decay.

It is related to the (historically earlier) Bernstein inequalities, and to Hoeffding's inequality.

Let X₁, ..., X_n be independent Bernoulli random variables, each taking the value 1 with probability p > 1/2. Then the probability that more than n/2 of the events $\{X_k = 1\}$ occur simultaneously has an exact value P, where

$P=\sum\limits_{i = \lfloor \frac{n}{2} \rfloor + 1}^n \binom{n}{i}p^i (1 - p)^{n - i} .$

The Chernoff bound shows that P has the following lower bound:

$P \ge 1 - \mathrm{e}^{- 2n \left( {p - \frac{1}{2}} \right)^2} .$
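The exact sum and the lower bound can be compared numerically. The following is a minimal sketch using only the Python standard library; the function names and the sample values of n and p are illustrative choices, not part of the original statement.

```python
import math

def exact_majority_probability(n: int, p: float) -> float:
    """Exact P: probability that more than n/2 of n Bernoulli(p) trials equal 1."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(n // 2 + 1, n + 1)
    )

def majority_lower_bound(n: int, p: float) -> float:
    """Lower bound 1 - exp(-2 n (p - 1/2)^2) stated in the text."""
    return 1.0 - math.exp(-2.0 * n * (p - 0.5) ** 2)

if __name__ == "__main__":
    p = 0.6  # illustrative bias, assumed for this example only
    for n in (10, 50, 150):
        print(n, exact_majority_probability(n, p), majority_lower_bound(n, p))
```

For every n the first printed value should be at least as large as the second; the gap shows how much the bound gives away in exchange for its simple closed form.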

This result admits various generalisations as outlined below. One can encounter many flavours of Chernoff bounds: the original additive form (which gives a bound on the absolute error) or the more practical multiplicative form (which bounds the error relative to the mean).

1 A motivating example
2 The first step in the proof of Chernoff bounds
3 Precise statements and proofs
4 Applications of Chernoff bound
5 Matrix Chernoff bound
6 See also
7 References

A motivating example

The simplest case of Chernoff bounds is used to bound the success probability of majority agreement for n independent, equally likely events.

A simple motivating example is to consider a biased coin. One side (say, Heads) is more likely to come up than the other, but you don't know which and would like to find out. The obvious solution is to flip it many times and then choose the side that comes up the most. But how many times do you have to flip it to be confident that you've chosen correctly?

In our example, let $X_i$ denote the event that the ith coin flip comes up Heads; suppose that we want to ensure we choose the wrong side with at most a small probability ε. Then, requiring $e^{-2n(p - 1/2)^2} \le \varepsilon$ in the lower bound on P above and rearranging, we must have:

$n \geq \frac{1}{(p -1/2)^2} \ln \frac{1}{\sqrt{\varepsilon}}.$

If the coin is noticeably biased, say coming up on one side 60% of the time (p = 0.6), then we can guess that side with 95% accuracy ( $\varepsilon = 0.05$ ) after 150 flips $(n = 150)$ . If it is 90% biased, then a mere 10 flips suffice. If the coin is only biased a tiny amount, like most real coins are, the number of necessary flips becomes much larger.
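The flip counts quoted above follow directly from the inequality for n. A short sketch, assuming a hypothetical helper named `flips_needed` (the name and the input checks are choices made for this example):

```python
import math

def flips_needed(p: float, eps: float) -> int:
    """Smallest integer n with n >= ln(1/sqrt(eps)) / (p - 1/2)^2."""
    if not (0.5 < p <= 1.0):
        raise ValueError("p must lie in (1/2, 1]")
    if not (0.0 < eps < 1.0):
        raise ValueError("eps must lie in (0, 1)")
    return math.ceil(math.log(1.0 / math.sqrt(eps)) / (p - 0.5) ** 2)

if __name__ == "__main__":
    print(flips_needed(0.6, 0.05))  # the 60% coin from the text
    print(flips_needed(0.9, 0.05))  # the 90% coin from the text
    print(flips_needed(0.51, 0.05)) # a barely biased coin needs far more flips
```

The first two calls reproduce the 150 and 10 flips mentioned in the text; the third illustrates how quickly n grows as p approaches 1/2.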

More practically, the Chernoff bound is used in randomized algorithms (or in computational devices such as quantum computers) to determine a bound on the number of runs necessary to determine a value by majority agreement, up to a specified probability. For example, suppose an algorithm (or machine) A computes the correct value of a function f with probability p > 1/2. If we choose n satisfying the inequality above, the probability that a majority exists and is equal to the correct value is at least 1 − ε, which for small enough ε is quite reliable. If p is a constant, ε diminishes exponentially with growing n, which is what makes algorithms in the complexity class BPP efficient.

Notice that if p is very close to 1/2, the necessary n can become very large. For example, if p = 1/2 + 1/2^m, as it might be in some PP algorithms, the result is that n is bounded below by an exponential function in m:

$n \geq 2^{2m} \ln \frac{1}{\sqrt{\varepsilon}}.$

The first step in the proof of Chernoff bounds

The Chernoff bound for a random variable X, which is the sum of n independent random variables $X_1, X_2, \ldots, X_n$ , is obtained by applying Markov's inequality to $e^{tX}$ for some well-chosen value of t. This method was first applied by Sergei Bernstein to prove the related Bernstein inequalities.

From Markov's inequality and using independence we can derive the following useful inequality:

For any t > 0,

$\Pr\left[X \ge a\right] = \Pr\left[e^{tX} \ge e^{ta}\right] \le \frac{ \mathbf{E} \left[e^{tX} \right]}{e^{ta}} = {\prod_i E[e^{tX_i}]\over e^{ta}}.$

In particular optimizing over t and using independence we obtain,

$\Pr\left[X \ge a\right] \leq \min_{t>0} \frac{\prod_i \mathbf{E}\left[e^{tX_i}\right]}{e^{ta}}. \qquad (+)$

Similarly,

$\Pr\left[X \le a\right] = \Pr\left[e^{-tX} \ge e^{-ta}\right]$

and so,

$\Pr\left[X \le a\right] \leq \min_{t>0} e^{ta} \prod_i \mathbf{E}\left[e^{-tX_i}\right].$
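The upper-tail recipe (bound Pr[X ≥ a] by $e^{-ta}\prod_i \mathbf{E}[e^{tX_i}]$ and minimise over t > 0) can be carried out numerically when no closed form is at hand. The sketch below assumes the $X_i$ are independent Bernoulli variables with success probabilities `ps`, and replaces the exact minimisation by a search over a fixed grid of t values. Both are simplifications made for this example. Since every t > 0 gives a valid bound, a coarse grid can only make the result weaker, never invalid.

```python
import math

def chernoff_upper_tail(ps, a, t_grid=None):
    """Grid-search version of min_{t>0} e^{-ta} * prod_i E[e^{t X_i}].

    ps : success probabilities of independent Bernoulli variables X_i
    a  : threshold in Pr[sum X_i >= a]
    """
    if t_grid is None:
        # Hypothetical default grid: t = 0.01, 0.02, ..., 10.0
        t_grid = [k / 100.0 for k in range(1, 1001)]
    best = 1.0  # a probability never exceeds 1
    for t in t_grid:
        # log of prod_i E[e^{t X_i}] with E[e^{t X_i}] = 1 - p_i + p_i e^t
        log_mgf = sum(math.log(1.0 - p + p * math.exp(t)) for p in ps)
        best = min(best, math.exp(log_mgf - t * a))
    return best

if __name__ == "__main__":
    # 20 trials with success probability 0.3; bound Pr[X >= 10]
    print(chernoff_upper_tail([0.3] * 20, 10))
```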

Precise statements and proofs

Theorem for additive form (absolute error)

The following theorem is due to Wassily Hoeffding and hence is called the Chernoff-Hoeffding theorem.

Assume the random variables $X_1, X_2, \ldots, X_m$ are i.i.d. with $X_i \in \{0,1\}$ , and let $p = E \left [X_i \right ]$ and $\varepsilon > 0$ . Then

$\begin{align} &\Pr\left[ \frac 1 m \sum X_i \geq p + \varepsilon \right] \\ &\qquad\leq \left ( {\left (\frac{p}{p + \varepsilon}\right )}^{p+\varepsilon} {\left (\frac{1 - p}{1 -p - \varepsilon}\right )}^{1 - p- \varepsilon}\right ) ^m = e^{ - D(p+\varepsilon\|p) m} \end{align}$

and

$\begin{align} &\Pr\left[ \frac 1 m \sum X_i \leq p - \varepsilon \right] \\ &\qquad\leq \left ( {\left (\frac{p}{p - \varepsilon}\right )}^{p-\varepsilon} {\left (\frac{1 - p}{1 -p + \varepsilon}\right )}^{1 - p+ \varepsilon}\right ) ^m = e^{ - D(p-\varepsilon\|p) m}, \end{align}$

where

$D(x||y) = x \log \frac{x}{y} + (1-x) \log \frac{1-x}{1-y}$

is the Kullback-Leibler divergence between Bernoulli distributed random variables with parameters $x$ and $y$ respectively. If $p\geq 1/2$ , then $\Pr\left( X>

Proof

The proof starts from the general inequality (+) above. Let $q = p + \varepsilon$ . Taking $a = mq$ in (+), we obtain:

$\Pr\left[ \frac{1}{m} \sum X_i \ge q\right] \le \inf_{t>0} \left[\frac{ E\left[e^{tX_i} \right] }{e^{tq}}\right]^m .$

Now, knowing that $\Pr[X_i = 1] = p$ and $\Pr[X_i = 0] = 1 - p$ , we have

$\left[\frac{ E\left[e^{tX_i} \right] }{e^{tq}}\right]^m = \left[\frac{p e^t + (1-p)}{e^{tq} }\right]^m = [pe^{(1-q)t} + (1-p)e^{-qt}]^m.$

Therefore we can easily compute the infimum, using calculus and some logarithms. Thus,

$\begin{align} &\frac{d}{dt} \log(pe^{(1-q)t} + (1-p)e^{-qt}) \\ &\qquad= \frac{1}{pe^{(1-q)t} + (1-p)e^{-qt}} ((1-q)pe^{(1-q)t}-q(1-p)e^{-qt}) \\ &\qquad = -q + \frac{pe^{(1-q)t}}{pe^{(1-q)t}+(1-p)e^{-qt}} \end{align}$

Setting the last expression to zero and solving, we have

$\begin{align} q & = \frac{pe^{(1-q)t}}{pe^{(1-q)t}+(1-p)e^{-qt}} = \frac{pe^{(1-q)t}}{e^{-qt}(pe^{t}+(1-p))} \\ pe^{(1-q)t} & = pe^{-qt}e^t = qe^{-qt}(pe^{t}+1-p) \\ \frac{p}{q}e^t & = pe^t + 1-p \end{align}$

so that $e^t = (1-p)\left(\frac{p}{q}-p\right)^{-1}$ .

Thus, $t = \log\left(\frac{(1-p)q}{(1-q)p}\right)$ .

As $q = p + \varepsilon > p$ , we see that $t > 0$ , so the constraint $t > 0$ is satisfied. Having solved for $t$ , we can plug back into the equations above to find that

$\begin{align} &\log(pe^{(1-q)t} + (1-p)e^{-qt}) = \log[e^{-qt}(1-p+pe^t)] \\ &\qquad = \log\left[e^{-q \log\left(\frac{(1-p)q}{(1-q)p}\right)}\right] + \log\left[1-p+pe^{\log\left(\frac{1-p}{1-q}\right)}e^{\log\frac{q}{p}}\right] \\ &\qquad = -q\log\frac{1-p}{1-q} -q \log\frac{q}{p} + \log\left[1-p+ p\left(\frac{1-p}{1-q}\right)\frac{q}{p}\right] \\ &\qquad = -q\log\frac{1-p}{1-q} -q \log\frac{q}{p} + \log\left[\frac{(1-p)(1-q)}{1-q}+\frac{(1-p)q}{1-q}\right] \\ &\qquad = -q\log\frac{q}{p} + (1-q)\log\frac{1-p}{1-q} = -D(q \| p). \end{align}$

We now have our desired result, that

$\Pr\left[\frac{1}{m}\sum X_i \ge p + \varepsilon\right] \le e^{-D(p+\varepsilon\|p) m}.$

To complete the proof for the symmetric case, we simply define the random variable $Y_i = 1 - X_i$ , apply the same proof, and plug it into our bound.
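The key step of the proof, that the minimising t turns $\log(pe^{(1-q)t} + (1-p)e^{-qt})$ into $-D(q\|p)$, can be sanity-checked numerically. The following sketch uses hypothetical helper names and illustrative values of p, ε and m; it assumes 0 < p < q < 1 so that all logarithms are defined.

```python
import math

def kl_bernoulli(x: float, y: float) -> float:
    """D(x || y) = x log(x/y) + (1-x) log((1-x)/(1-y)), for 0 < x, y < 1."""
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))

def log_objective(t: float, p: float, q: float) -> float:
    """log(p e^{(1-q)t} + (1-p) e^{-qt}), the quantity minimised in the proof."""
    return math.log(p * math.exp((1 - q) * t) + (1 - p) * math.exp(-q * t))

def optimal_t(p: float, q: float) -> float:
    """Stationary point t = log((1-p) q / ((1-q) p)) derived in the proof."""
    return math.log((1 - p) * q / ((1 - q) * p))

if __name__ == "__main__":
    p, eps, m = 0.3, 0.2, 50  # illustrative values only
    q = p + eps
    t_star = optimal_t(p, q)
    # The two numbers printed on this line should agree up to rounding.
    print(log_objective(t_star, p, q), -kl_bernoulli(q, p))
    # Resulting bound on Pr[(1/m) sum X_i >= p + eps]
    print(math.exp(-m * kl_bernoulli(q, p)))
```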

Simpler bounds

A simpler bound follows by relaxing the theorem using $D( p + x \| p) \geq 2 x^2$ , which follows from the convexity of $D(p+x\| p)$ and the fact that $\frac{d^2}{dx^2} D(p+x\|p) = \frac{1}{(p+x)(1-p-x)}$ . This results in a special case of Hoeffding's inequality. Sometimes, the bound $D( (1+x) p \| p) \geq x^2 p/4$ for $-1/2 \leq x \leq 1/2$ , which is stronger for $p < 1 / 8$ , is also used.
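Both relaxations can be spot-checked on a grid. This is a numerical illustration rather than a proof; the helper names and grid resolution are assumptions made for the example.

```python
import math

def kl_bernoulli(x: float, y: float) -> float:
    """D(x || y) for Bernoulli parameters, with 0 <= x <= 1 and 0 < y < 1."""
    total = 0.0
    if x > 0.0:
        total += x * math.log(x / y)
    if x < 1.0:
        total += (1.0 - x) * math.log((1.0 - x) / (1.0 - y))
    return total

def min_gap_additive(p: float, steps: int = 200) -> float:
    """Smallest D(p+x || p) - 2 x^2 over a grid of x with 0 < p + x < 1."""
    gaps = []
    for k in range(1, steps):
        q = k / steps  # q = p + x
        x = q - p
        gaps.append(kl_bernoulli(q, p) - 2.0 * x * x)
    return min(gaps)

def min_gap_relative(p: float, steps: int = 200) -> float:
    """Smallest D((1+x)p || p) - x^2 p / 4 over a grid of x in [-1/2, 1/2]."""
    gaps = []
    for k in range(steps + 1):
        x = -0.5 + k / steps
        q = (1.0 + x) * p
        if 0.0 < q < 1.0:
            gaps.append(kl_bernoulli(q, p) - x * x * p / 4.0)
    return min(gaps)

if __name__ == "__main__":
    for p in (0.05, 0.3, 0.5):
        print(p, min_gap_additive(p), min_gap_relative(p))
```

Non-negative gaps (up to rounding) are what the two inequalities predict.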

Theorem for multiplicative form of Chernoff bound (relative error)

Let $X_1, X_2, \ldots, X_n$ be independent random variables taking on values 0 or 1. Further, assume that $\Pr(X_i = 1) = p_i$ . Then, if we let $X = \sum_{i=1}^n X_i$ and $\mu$ be the expectation of $X$ , for any $\delta > 0$

$\Pr \left[ X > (1+\delta)\mu \right] < \left( \frac{e^{\delta}}{(1+\delta)^{(1+\delta)}} \right)^{\mu}.$

Proof

According to (+),

$\begin{align} \Pr[X > (1+\delta)\mu] & \le \inf_{t > 0} \frac{\mathbf{E}\left[\prod_{i=1}^n\exp(tX_i)\right]}{\exp(t(1+\delta)\mu)} \\ & = \inf_{t > 0} \frac{\prod_{i=1}^n\mathbf{E}[\exp(tX_i)]}{\exp(t(1+\delta)\mu)} \\ & = \inf_{t > 0} \frac{\prod_{i=1}^n\left[p_i\exp(t) + (1-p_i)\right]}{\exp(t(1+\delta)\mu)} \end{align}$

The third line above follows because $e^{tX_i}$ takes the value $e^t$ with probability $p_i$ and the value $1$ with probability $1 - p_i$ . This is identical to the calculation above in the proof of the Theorem for additive form (absolute error).

Rewriting $p_i e^t + (1 - p_i)$ as $p_i (e^t - 1) + 1$ and recalling that $1+x \le e^x$ (with strict inequality if $x > 0$ ), we set $x = p_i (e^t - 1)$ . The same result can be obtained by directly replacing a in the equation for the Chernoff bound with $(1 + \delta) \mu$ .[1]

Thus,

$\begin{align} &\Pr[X > (1+\delta)\mu] < \frac{\prod_{i=1}^n\exp(p_i(e^t-1))}{\exp(t(1+\delta)\mu)} \\ &\qquad = \frac{\exp\left((e^t-1)\sum_{i=1}^n p_i\right)}{\exp(t(1+\delta)\mu)} = \frac{\exp((e^t-1)\mu)}{\exp(t(1+\delta)\mu)}. \end{align}$

If we simply set $t = \log(1 + \delta)$ so that $t > 0$ for $\delta > 0$ , we can substitute and find

$\frac{\exp((e^t-1)\mu)}{\exp(t(1+\delta)\mu)} = \frac{\exp((1+\delta - 1)\mu)}{(1+\delta)^{(1+\delta)\mu}} = \left[\frac{\exp(\delta)}{(1+\delta)^{(1+\delta)}}\right]^\mu$

This proves the result desired. A similar proof strategy can be used to show that

$\Pr[X < (1 - \delta)\mu] < \exp(-\mu\delta^2/2).$
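To see how the multiplicative bounds compare with the true tails, one can evaluate both for a binomial sum, i.e. the special case where all $p_i$ equal a common p. The sketch below is an illustration under that assumption; the function names are hypothetical, and the lower-tail bound is only evaluated for 0 < δ < 1.

```python
import math

def upper_tail_bound(mu: float, delta: float) -> float:
    """(e^delta / (1+delta)^(1+delta))^mu, evaluated in log space for stability."""
    return math.exp(mu * (delta - (1.0 + delta) * math.log1p(delta)))

def lower_tail_bound(mu: float, delta: float) -> float:
    """exp(-mu * delta^2 / 2), the lower-tail bound quoted above."""
    return math.exp(-mu * delta * delta / 2.0)

def binomial_pmf(n: int, p: float, k: int) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_tails(n: int, p: float, delta: float):
    """Exact Pr[X > (1+delta) mu] and Pr[X < (1-delta) mu] for X ~ Binomial(n, p)."""
    mu = n * p
    upper = sum(binomial_pmf(n, p, k) for k in range(n + 1) if k > (1 + delta) * mu)
    lower = sum(binomial_pmf(n, p, k) for k in range(n + 1) if k < (1 - delta) * mu)
    return upper, lower

if __name__ == "__main__":
    n, p, delta = 100, 0.2, 0.5  # illustrative values only
    upper, lower = exact_tails(n, p, delta)
    print(upper, upper_tail_bound(n * p, delta))
    print(lower, lower_tail_bound(n * p, delta))
```

In each printed pair the exact tail probability should not exceed the bound next to it.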

Better Chernoff bounds for some special cases

We can obtain stronger bounds using simpler proof techniques for some special cases of symmetric random variables.

Let $X_1, X_2, \ldots, X_n$ be independent random variables,

$X = \sum_{i=1}^n X_i$ .

(a) $\Pr(X_i = 1) = \Pr(X_i = -1) = \frac{1}{2}$ .

Then,

$\Pr( X \ge a) \le e^{\frac{-a^2}{2n}}, \quad a > 0$ ,

and therefore also

$\Pr( |X| \ge a) \le 2e^{\frac{-a^2}{2n}}, \quad a > 0$ .

(b) $\Pr(X_i = 1) = \Pr(X_i = 0) = \frac{1}{2}, \mathbf{E}[X] = \mu = \frac{n}{2}$

Then,

$\Pr( X \ge \mu+a) \le e^{\frac{-2a^2}{n}}, \quad a > 0$ ,

$\Pr( X \ge (1+\delta)\mu) \le e^{-\frac{\delta^2\mu}{3}}, \quad \delta > 0$ ,

$\Pr( X \le \mu-a) \le e^{\frac{-2a^2}{n}}, \quad 0 < a < \mu$ ,

$\Pr( X \le (1-\delta)\mu) \le e^{-\frac{\delta^2\mu}{2}}, \quad 0 < \delta < 1$ .
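The upper-tail bounds in cases (a) and (b) can be checked against exact tail probabilities, since both reduce to sums of binomial coefficients. The sketch below restricts a to integer values and uses hypothetical function names; it is meant as a numerical illustration only.

```python
import math

def rademacher_upper_tail(n: int, a: int) -> float:
    """Exact Pr(X >= a) for X a sum of n independent fair +1/-1 variables.

    X = 2B - n with B ~ Binomial(n, 1/2), so X >= a iff B >= (n + a) / 2.
    """
    k_min = max(0, -(-(n + a) // 2))  # ceil((n + a) / 2) in integer arithmetic
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2**n

def fair_coin_upper_tail(n: int, a: int) -> float:
    """Exact Pr(X >= n/2 + a) for X ~ Binomial(n, 1/2)."""
    k_min = max(0, -(-(n + 2 * a) // 2))  # ceil(n/2 + a) in integer arithmetic
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2**n

if __name__ == "__main__":
    n = 100  # illustrative number of variables
    for a in (5, 10, 20):
        # case (a): exact tail next to exp(-a^2 / (2n))
        print(a, rademacher_upper_tail(n, a), math.exp(-a * a / (2 * n)))
        # case (b): exact tail next to exp(-2 a^2 / n)
        print(a, fair_coin_upper_tail(n, a), math.exp(-2 * a * a / n))
```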

Applications of Chernoff bound

Chernoff bounds have very useful applications in set balancing and packet routing in sparse networks.

The set balancing problem arises when designing statistical experiments. Typically, given the features of each participant in the experiment, we need to divide the participants into two disjoint groups such that each feature is as evenly balanced as possible between the two groups.

Chernoff bounds are also used to obtain tight bounds for permutation routing problems, which reduce network congestion while routing packets in sparse networks.

Matrix Chernoff bound

Rudolf Ahlswede and Andreas Winter introduced a Chernoff bound for matrix-valued random variables.

References

[1] Refer to the proof above.

Chernoff, H. (1952). "A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the sum of Observations". Annals of Mathematical Statistics 23 (4): 493–507. doi:10.1214/aoms/1177729330. JSTOR 2236576. MR 57518. Zbl 0048.11804.
Hoeffding, W. (1963). "Probability Inequalities for Sums of Bounded Random Variables". Journal of the American Statistical Association 58 (301): 13–30. doi:10.2307/2282952. JSTOR 2282952.
Chernoff, H. (1981). "A Note on an Inequality Involving the Normal Distribution". The Annals of Probability 9 (3): 533. doi:10.1214/aop/1176994428. JSTOR 2243541. MR 614640. Zbl 0457.60014.
Hagerup, T. (1990). "A guided tour of Chernoff bounds". Information Processing Letters 33 (6): 305. doi:10.1016/0020-0190(90)90214-I.
Ahlswede, R.; Winter, A. (2003). "Strong Converse for Identification via Quantum Channels". IEEE Transactions on Information Theory 48 (3): 569–579. arXiv:quant-ph/0012127.
Mitzenmacher, M.; Upfal, E. (2005). Probability and Computing: Randomized Algorithms and Probabilistic Analysis. ISBN 9780521835404. http://books.google.com/books?id=0bAYl6d7hvkC.
Nielsen, F. (2011). "Chernoff information of exponential families". arXiv:1102.2684 [cs.IT].

Wikimedia Foundation. 2010.