# Asymptotic equipartition property

Asymptotic equipartition property

In information theory the asymptotic equipartition property (AEP) is a general property of the output samples of a stochastic source. It is fundamental to the concept of "typical set" used in theories of compression.

Roughly speaking, the theorem states that although there are many series of results that may be produced by a random process, the one actually produced is most probably from a loosely-defined set of outcomes that all have approximately the same chance of being the one actually realized. (This is a consequence of the law of large numbers and ergodic theory.) Although there are individual outcomes which have a higher probability than any outcome in this set, the vast number of outcomes in the set almost guarantees that the outcome will come from the set.

In the field of Pseudorandom number generation, a candidate generator of undetermined quality whose output sequence lies too far outside the typical set by some statistical criteria is rejected as insufficiently random. Thus, although the typical set is loosely defined, practical notions arise concerning "sufficient" typicality.

Definition

Given a discrete-time stationary ergodic stochastic process $X$ on the probability space $\left(Omega,B,p\right)$, AEP is an assertion that

:$-frac\left\{1\right\}\left\{n\right\} log p\left(X_1^n\right) o H\left(X\right)$

where $X_1^n$ denotes the process limited to duration $\left\{1,dots,n\right\}$, and $H\left(X\right)$ or simply $H$ denotes the entropy rate of $X$, which must exist for all discrete-time stationary processes including the ergodic ones. AEP is proved for finite-valued (i.e.

AEP for discrete-time i.i.d. sources

Given $X$ is an i.i.d. source, its time series"X"1, ..., "X""n" is i.i.d. with entropy "H"("X") in the discrete-valued case and differential entropy in the continuous-valued case. The weak law of large numbers gives the AEP with convergence in probability,:$lim_\left\{n oinfty\right\}Prleft \left[left|-frac\left\{1\right\}\left\{n\right\} log p\left(X_1, X_2, ..., X_n\right) - H\left(X\right) ight|> epsilon ight\right] =0 qquad forall epsilon>0.$since the entropy is equal to the expectation of $frac\left\{1\right\}\left\{n\right\} log p\left(X_1, X_2, ..., X_n\right)$. The strong law of large number asserts the stronger almost sure convergence,:$Prleft \left[lim_\left\{n oinfty\right\} - frac\left\{1\right\}\left\{n\right\} log p\left(X_1, X_2, ..., X_n\right) = H\left(X\right) ight\right] =1$which implies the result from the weak law of large numbers.

AEP for discrete-time finite-valued stationary ergodic sources

Consider a finite-valued sample space $Omega$, i.e. Shannon-McMillan-Breiman theorem, can be shown using the sandwich proof by Algoet and Cover outlined as follows:
* Let $x$ denote some measurable set $x=X\left(A\right)$ for some $Ain B$
* Parameterize the joint probability by $n$ and x as $j\left(n,x\right):=p\left(x_0^\left\{n-1\right\}\right)$
* Parameterize the conditional probability by $i,k$ and $x$ as $c\left(i,k,x\right) := p\left(x_i|x_\left\{i-k\right\}^\left\{i-1\right\}\right)$.
* Take the limit of the conditional probability as $k oinfty$ and denote it as $c\left(i,x\right):=p\left(x_i|x_\left\{-infty\right\}^\left\{i-1\right\}\right)$
* Argue the two notions of entropy rate $lim_\left\{n oinfty\right\} E \left[-log j\left(n,X\right)\right]$ and $lim_\left\{n oinfty\right\} E \left[-log c\left(n,n,X\right)\right]$ exist and are equal for any stationary process including the stationary ergodic process $X$. Denote it as $H$.
* Argue that both $c\left(i,k,X\right):=\left\{p\left(X_i|X_\left\{i-k\right\}^\left\{i-1\right\}\right)\right\}$ and $c\left(i,X\right):=\left\{p\left(X_i|X_\left\{-infty\right\}^\left\{i-1\right\}\right)\right\}$, where $i$ is the time index, are stationary ergodic processes, whose sample means converge almost surely to some values denoted by $H^k$ and $H^\left\{infty\right\}$ respectively.
* Define the $k$-th order Markov approximation to the probability $a\left(n,k,x\right)$ as:$a\left(n,k,x\right):=p\left(X_0^\left\{k-1\right\}\right)prod_\left\{i=k\right\}^\left\{n-1\right\}p\left(X_i|X_\left\{i-k\right\}^\left\{i-1\right\}\right)=j\left(k,x\right)prod_\left\{i=k\right\}^\left\{n-1\right\} c\left(i,k,x\right)$
* Argue that $a\left(n,k,X\left(Omega\right)\right)$ is finite from the finite-value assumption.
* Express $-frac1nlog a\left(n,k,X\right)$ in terms of the sample mean of $c\left(i,k,X\right)$ and show that it converges almost surely to $H^k$
* Define $a\left(n,x\right):=p\left(x_0^\left\{n-1\right\}|x_\left\{-infty\right\}^\left\{-1\right\}\right)$, which is a probability measure.
* Express $-frac1nlog a\left(n,X\right)$ in terms of the sample mean of $c\left(i,X\right)$ and show that it converges almost surely to $H^\left\{infty\right\}$
* Argue that $H^ksearrow H$ as $k oinfty$ using the stationarity of the process.
* Argue that $H=H^\left\{infty\right\}$ using the Lévy's martingale convergence theorem and the finite-value assumption.
* Show that $Eleft \left[frac\left\{a\left(n,k,X\right)\right\}\left\{j\left(n,X\right)\right\} ight\right] =a\left(n, k,X\left(Omega\right)\right)$ which is finite as argued before.
* Show that $Eleft \left[frac\left\{j\left(n,X\right)\right\}\left\{a\left(n,X\right)\right\} ight\right] =1$ by conditioning on the infinite past $X_\left\{-infty\right\}^\left\{-1\right\}$ and iterating the expectation.
* Show that $\left(forall alphainmathbb\left\{R\right\}\right)left\left(Prleft \left[frac\left\{a\left(n,k,X\right)\right\}\left\{j\left(n,X\right)\right\}geq alpha ight\right] leq frac\left\{a\left(n, k,X\left(Omega\right)\right)\right\}\left\{alpha\right\} ight\right)$ using the Markov's inequality and the expectation derived previously.
* Similarly, show that $\left(forall alphainmathbb\left\{R\right\}\right)left\left(Prleft \left[frac\left\{j\left(n,X\right)\right\}\left\{a\left(n,X\right)\right\}geq alpha ight\right] leq frac\left\{1\right\}\left\{alpha\right\} ight\right)$, which is equivalent to $\left(forall alphainmathbb\left\{R\right\}\right)left\left(Prleft \left[frac1nlogfrac\left\{j\left(n,X\right)\right\}\left\{a\left(n,X\right)\right\}geq frac1nlogalpha ight\right] leq frac\left\{1\right\}\left\{alpha\right\} ight\right)$.
* Show that $limsup_\left\{n oinfty\right\}$ of both $frac1n log frac\left\{a\left(n,k,X\right)\right\}\left\{j\left(n,X\right)\right\}$ and $frac1n logfrac\left\{j\left(n,X\right)\right\}\left\{a\left(n,X\right)\right\}$ are non-positive almost surely by setting for any and applying the Borel-Cantelli lemma.
* Show that $liminf$ and $limsup$ of $-frac1n log j\left(n,X\right)$ are lower and upper bounded almost surely by $H^\left\{infty\right\}$ and $H^k$ respectively by breaking up the logarithms in the previous result.
* Complete the proof by pointing out that the upper and lower bounds are shown previously to approach $H$ as $k oinfty$.

AEP for non-stationary discrete-time source producing independent symbols

The assumptions of stationarity/ergodicity/idential distribution of random variables is not essential for the AEP to hold. Indeed, as is quite clear intuitively, the AEP requires only some form of the law of large numbers to hold, which is fairly general. However, the expression needs to be suitably generalized, and the conditions need to be formulated precisely.

We assume that the source is producing independent symbols, with possibly different output statistics at each instant. We assume that the statistics of the process are known completely, that is, the marginal distribution of the process seen at each time instant is known. The joint distribution is just the product of marginals. Then, under the condition (which can be relaxed) that

where,
* "Proof"

The proof follows from a simple application of Markov's inequality (applied to second moment of $log\left(p\left(X_i\right)\right)$.

It is obvious that the proof holds if any moment $E \left[|logp\left(X i\right)|^r\right]$ is uniformly bounded for $r>1$ (again by Markov's inequality applied to r"th" moment). $Box\left\{\right\}$

Even this condition is not necessary, but given a non-stationary random process, it should not be difficult to test whether the AEP holds using the above method.

Applications for AEP for non-stationary source producing independent symbols

The AEP for non-stationary discrete-time independent process leads us to (among other results) source coding theorem for non-stationary source (with independent output symbols) and channel coding theorem for non-stationary memoryless channels.

Source Coding Theorem

The source coding theorem for discrete time non-stationary independent sources can be found here: source coding theorem

Channel Coding Theorem

Channel coding theorem for discrete time non-stationary memoryless channels can be found here: noisy channel coding theorem

AEP for certain continuous-time stationary ergodic sources

Discrete-time functions can be interpolated to continuous-time functions. If such interpolation $f$ is measurable, we may define the continuous-time stationary process accordingly as $ilde\left\{X\right\}:=fcirc X$. If AEP holds for the discrete-time process, as in the i.i.d. or finite-valued stationary ergodic cases shown above, it automatically holds for the continuous-time stationary process derived from it by some measurable interpolation. i.e.$-frac\left\{1\right\}\left\{n\right\} log p\left( ilde\left\{X\right\}_0^ au\right) o H\left(X\right)$where $n$ corresponds to the degree of freedom in time $au$. $nH\left(X\right)/ au$ and $H\left(X\right)$ are the entropy per unit time and per degree of freedom respectively, defined by Shannon.

An important class of such continuous-time stationary process is the bandlimited stationary ergodic process with the sample space being a subset of the continuous $mathcal\left\{L\right\}_2$ functions. AEP holds if the process is white, in which case the time samples are i.i.d., or there exists $T>1/2W$, where $W$ is the nominal bandwidth, such that the $T$-spaced time samples take values in a finite set, in which case we have the discrete-time finite-valued stationary ergodic process.

Any time-invariant operations also preserves AEP, stationarity and ergodicity and we may easily turn a stationary process to non-stationary without losing AEP by nulling out a finite number of time samples in the process.

ee also

* Source coding theorem

References

The Classic Paper

* Claude E. Shannon. "A Mathematical Theory of Communication". Bell System Technical Journal, July/October 1948.

Other Journal Articles

* Paul H. Algoet and Thomas M. Cover. [http://yreka.stanford.edu/~cover/papers/paper084.pdf A Sandwich Proof of the Shannon-McMillan-Breiman Theorem] . The Annals of Probability, 16(2): 899-909, 1988.
* Sergio Verdu and Te Sun Han. The Role of the Asymptotic Equipartition Property in Noiseless Source Coding. IEEE Transactions on Information Theory, 43(3): 847-857, 1997.

Textbooks on Information Theory

* Thomas M. Cover, Joy A. Thomas. "Elements of Information Theory" New York: Wiley, 1991. ISBN 0-471-06259-6

Wikimedia Foundation. 2010.

### Look at other dictionaries:

• Noisy-channel coding theorem — In information theory, the noisy channel coding theorem (sometimes Shannon s theorem), establishes that for any given degree of noise contamination of a communication channel, it is possible to communicate discrete data (digital information)… …   Wikipedia

• List of mathematics articles (A) — NOTOC A A Beautiful Mind A Beautiful Mind (book) A Beautiful Mind (film) A Brief History of Time (film) A Course of Pure Mathematics A curious identity involving binomial coefficients A derivation of the discrete Fourier transform A equivalence A …   Wikipedia

• Additive white Gaussian noise — Explanation= In communications, the additive white Gaussian noise (AWGN) channel model is one in which the only impairment is the linear addition of wideband or white noise with a constant spectral density (expressed as watts per hertz of… …   Wikipedia

• List of probability topics — This is a list of probability topics, by Wikipedia page. It overlaps with the (alphabetical) list of statistical topics. There are also the list of probabilists and list of statisticians.General aspects*Probability *Randomness, Pseudorandomness,… …   Wikipedia

• AEP — is a three letter acronym that may refer to: * Adobe Engagement Platform, the unified format created by Adobe Systems which merges Macromedia Flash and Adobe Acrobat formats. * AEP Paphos FC, a Cypriot soccer club * Alpha Epsilon Pi, an… …   Wikipedia

• Law of large numbers — The law of large numbers (LLN) is a theorem in probability that describes the long term stability of the mean of a random variable. Given a random variable with a finite expected value, if its values are repeatedly sampled, as the number of these …   Wikipedia

• Typical set — In information theory, the typical set is a set of sequences whose probability is close to two raised to the negative power of the entropy of their source distribution. That this set has total probability close to one is a consequence of the… …   Wikipedia

• Shannon's source coding theorem — In information theory, Shannon s source coding theorem (or noiseless coding theorem) establishes the limits to possible data compression, and the operational meaning of the Shannon entropy.The source coding theorem shows that (in the limit, as… …   Wikipedia

• Rarität — Als selten gelten subjektiv Dinge, Ereignisse oder Stoffe, wenn ihr Anteil an einer Grundgesamtmenge weniger als ungefähr 1 Prozent beträgt. Dies entspricht einem Mengenverhältnis von 2 (dezimalen) Größenordnungen oder mehr. Üblicherweise sind… …   Deutsch Wikipedia

• Seltenheit — Als selten gelten subjektiv Dinge, Ereignisse oder Stoffe, wenn ihr Anteil an einer Grundgesamtmenge weniger als ungefähr 1 Prozent beträgt. Dies entspricht einem Mengenverhältnis von 2 (dezimalen) Größenordnungen oder mehr. Üblicherweise sind… …   Deutsch Wikipedia