Chain rule for Kolmogorov complexity

The chain rule for Kolmogorov complexity is an analogue of the chain rule for information entropy, which states:

H (X, Y) = H (X) + H (Y | X)

That is, the combined randomness of two sequences X and Y is the sum of the randomness of X plus whatever randomness is left in Y once we know X. This follows immediately from the definitions of conditional and joint entropy fact from probability theory that the joint probability is the product of the marginal and conditional probability:

$P(X,Y) = P(X) P(Y|X) \,$

The equivalent statement for Kolmogorov complexity does not hold exactly; it is only true up to a logarithmic factor:

K (X, Y) = K (X) + K (Y | X) + O (log(K (X, Y)))

(An exact version, KP(X, Y) = KP(X) + KP(Y|X*) + O(1), holds for the prefix complexity KP, where X* is a shortest program for X.)

It states that the shortest program to reproduce X and Y is using a program to reproduce X and a program to reproduce Y given X, plus at most a logarithmic factor. Using this statement one can define an analogue of mutual information for Kolmogorov complexity.

Proof

The ≤ direction is obvious: we can write a program to produce x and y by concatenating a program to produce x, a program to produce y given access to x, and (whence the log term) the length of one of the programs, so that we know where to separate the two programs for x and y|x (log(K(x, y)) upper-bounds this length).

The ≥ direction is rather more difficult. The key to the proof is the construction of the set $A = \{(u,z) : K(u,z) \le K(x,y)\}$ ; that is, the construction of the set of all pairs $(u, z)$ such that the shortest input (for a universal Turing machine) that produces $(u, z)$ (and some way to distinguish $u$ from $z$ ) is shorter than the shortest producing $(x, y)$ . We will also need $A_x = \{z: K(x,z) \le K(x,y)\}$ , the subset of $A$ where $x = u$ . Enumerating $A$ is not hard (although not fast!). In parallel, simulate all $2 K (x, y)$ programs with length $\le K(x,y)$ . Output each $u, z$ as the simulated program terminates; eventually, any particular such $u, z$ will be produced. We can restrict to A_x simply by ignoring any output where $u \ne x$ . We will assume that the inequality does not hold and derive a contradiction by finding a short description for K(x) in terms of its index in the enumeration of a subset of A.

In particular, we can describe y by giving the index in which y is output when enumerating A_x (which is less than or equal to |A_x|) along with x and the value of K(x, y). Thus,

$K(y|x) \le \log(|A_x|) + O(\log(K(x,y))) + O(1) \,$

where the O(1) term is from the overhead of the enumerator, which does not depend on x or y.

Assume by way of contradiction

$K(y|x) > K(x,y) - K(x) + O(\log(K(x,y))) \,$

for some $x, y$ .

Combining this with the above, we have

$\log(|A_x|) + O(\log(K(x,y))) + O(1) \ge K(y|x) > K(x,y) - K(x) + O(\log(K(x,y))) \,$

So for any constant c and some x, y,

$\log(|A_x|) \ge K(x,y) - K(x) + c \log(K(x,y)) \,$

Let

$e = K(x,y) - K(x) + c \log K(x,y);\quad |A_x| \ge 2^e.$

Now, we can enumerate a set of possible values for $x$ by finding all $u$ such that $| A u | > 2 e$ -- list only the $u$ such that $y$ is output after more than $2 e$ other values when enumerating $A_u = \{ z: K(u,z) \le K(x,y) \}$ . $x$ is one such $u$ . Let $U$ be the set of all such $u$ . We can enumerate $U$ by enumerating each $A u$ (in parallel, as usual) and outputting a $u$ only when $y$ would be output for $A u$ and more than $2 e$ other values for the $A u$ would be output. Note that given $K (x, y)$ , $e$ and the index of $x$ in $U$ we can enumerate $U$ to recover $x$ .

Note that $\{(u,z) : u \in U, z \in A_u\}$ is a subset of $A$ , and $A$ has at most $2 K (x, y)$ elements, since there are only that many programs of length $K (x, y)$ . So we have:

$|U| < \frac{|A|}{2^e} \le \frac{2^{K(x,y)}}{2^e}$

where the first inequality follows since each element of $U$ corresponds to an $A u$ with strictly more than $2 e$ elements.

But this leads to a contradiction:

$K(x) < \log(K(x,y)) + \log(e) + \log(|U|) \le \log(K(x,y)) + \log(e) + K(x,y) - e$

which when we substitute in $e$ gives

K (x) < log(K (x, y)) + log(K (x, y) + K (x) + log(K (x, y)) + K (x, y) - K (x, y) + K (x) - c log K (x, y)

which for large enough $c$ gives $K (x) < K (x)$ .

References

Li, Ming; Vitányi, Paul (1997). An introduction to Kolmogorov complexity and its applications. New York: Springer-Verlag. ISBN 0387948686.

Categories:

Wikimedia Foundation. 2010.

Игры ⚽ Нужно сделать НИР?

Look at other dictionaries:

Chain rule (disambiguation) — Chain rule may refer to: Chain rule in calculus: Cyclic chain rule, or triple product rule: Chain rule (probability) … Wikipedia
Kolmogorov complexity — In algorithmic information theory (a subfield of computer science), the Kolmogorov complexity of an object, such as a piece of text, is a measure of the computational resources needed to specify the object. It is named after Soviet Russian… … Wikipedia
List of mathematics articles (C) — NOTOC C C closed subgroup C minimal theory C normal subgroup C number C semiring C space C symmetry C* algebra C0 semigroup CA group Cabal (set theory) Cabibbo Kobayashi Maskawa matrix Cabinet projection Cable knot Cabri Geometry Cabtaxi number… … Wikipedia
Mutual information — Individual (H(X),H(Y)), joint (H(X,Y)), and conditional entropies for a pair of correlated subsystems X,Y with mutual information I(X; Y). In probability theory and information theory, the mutual information (sometimes known by the archaic term… … Wikipedia
List of Russian people — The Millennium of Russia monument in Veliky Novgorod, featuring the statues and reliefs of the most celebrated people in the first 1000 years of Russian history … Wikipedia
List of terms relating to algorithms and data structures — The [http://www.nist.gov/dads/ NIST Dictionary of Algorithms and Data Structures] is a reference work maintained by the U.S. National Institute of Standards and Technology. It defines a large number of terms relating to algorithms and data… … Wikipedia
mathematics — /math euh mat iks/, n. 1. (used with a sing. v.) the systematic treatment of magnitude, relationships between figures and forms, and relations between quantities expressed symbolically. 2. (used with a sing. or pl. v.) mathematical procedures,… … Universalium
Entropy (information theory) — In information theory, entropy is a measure of the uncertainty associated with a random variable. The term by itself in this context usually refers to the Shannon entropy, which quantifies, in the sense of an expected value, the information… … Wikipedia
Chaos theory — This article is about chaos theory in Mathematics. For other uses of Chaos theory, see Chaos Theory (disambiguation). For other uses of Chaos, see Chaos (disambiguation). A plot of the Lorenz attractor for values r = 28, σ = 10, b = 8/3 … Wikipedia
List of mathematics articles (L) — NOTOC L L (complexity) L BFGS L² cohomology L function L game L notation L system L theory L Analyse des Infiniment Petits pour l Intelligence des Lignes Courbes L Hôpital s rule L(R) La Géométrie Labeled graph Labelled enumeration theorem Lack… … Wikipedia

Academic Dictionaries and Encyclopedias

Chain rule for Kolmogorov complexity

Proof

References

Look at other dictionaries:

Share the article and excerpts

Academic Dictionaries and Encyclopedias

Wikipedia

Chain rule for Kolmogorov complexity

Proof

References

Look at other dictionaries:

Share the article and excerpts

Direct link