Kolmogorov complexity

In algorithmic information theory (a subfield of computer science), the Kolmogorov complexity of an object, such as a piece of text, is a measure of the computational resources needed to specify the object. It is named after Soviet Russian mathematician Andrey Kolmogorov.

Kolmogorov complexity is also known as descriptive complexity^{[Note 1]}, Kolmogorov–Chaitin complexity, algorithmic entropy, or program-size complexity.

For example, consider the following two strings of length 64, each containing only lowercase letters, digits, and spaces:

abababababababababababababababababababababababababababababababab
4c1j5b2p0cv4w1x8rx2y39umgw5q85s7uraquuxdppa0q7nieieqe9noc4cvafzf

The first string has a short English-language description, namely "ab 32 times", which consists of 11 characters. The second one has no obvious simple description (using the same character set) other than writing down the string itself, which has 64 characters.

This image illustrates part of the Mandelbrot set fractal. Simply storing the 24-bit color of each pixel in this image would require 1.62 million bits; but a small computer program can reproduce these 1.62 million bits using the definition of the Mandelbrot set. Thus, the Kolmogorov complexity of the raw file encoding this bitmap is much less than 1.62 million.

More formally, the complexity of a string is the length of the string's shortest description in some fixed universal description language. The sensitivity of complexity relative to the choice of description language is discussed below. It can be shown that the Kolmogorov complexity of any string cannot be more than a few bytes larger than the length of the string itself. Strings whose Kolmogorov complexity is small relative to the string's size are not considered to be complex. The notion of Kolmogorov complexity can be used to state and prove impossibility results akin to Gödel's incompleteness theorem and Turing's halting problem.

1 Definition
2 History and context
3 Basic results
- 3.1 Incomputability of Kolmogorov complexity
- 3.2 Chain rule for Kolmogorov complexity
4 Compression
5 Chaitin's incompleteness theorem
6 Minimum message length
7 Kolmogorov randomness
8 See also
9 Notes
10 References
11 External links

Definition

To define Kolmogorov complexity, we must first specify a description language for strings. Such a description language can be based on any programming language, such as Lisp, Pascal, or Java Virtual Machine bytecode. If P is a program which outputs a string x, then P is a description of x. The length of the description is just the length of P as a character string. In determining the length of P, the lengths of any subroutines used in P must be accounted for. The length of any integer constant n which occurs in the program P is the number of bits required to represent n, that is (roughly) log₂n.

We could alternatively choose an encoding for Turing machines, where an encoding is a function which associates to each Turing Machine M a bitstring <M>. If M is a Turing Machine which on input w outputs string x, then the concatenated string <M> w is a description of x. For theoretical analysis, this approach is more suited for constructing detailed formal proofs and is generally preferred in the research literature. The binary lambda calculus may provide the simplest definition of complexity yet. In this article we will use an informal approach.

Any string s has at least one description, namely the program

 function GenerateFixedString()
    return s

If a description of s, d(s), is of minimal length—i.e. it uses the fewest number of characters—it is called a minimal description of s. Then the length of d(s)—i.e. the number of characters in the description—is the Kolmogorov complexity of s, written K(s). Symbolically,

$K(s) = |d(s)|. \quad$

We now consider how the choice of description language affects the value of K and show that the effect of changing the description language is bounded.

Theorem. If K₁ and K₂ are the complexity functions relative to description languages L₁ and L₂, then there is a constant c (which depends only on the languages L₁ and L₂) such that

$\forall s\ |K_1(s) - K_2(s)| \leq c.$

Proof. By symmetry, it suffices to prove that there is some constant c such that for all bitstrings s,

$K_1(s) \leq K_2(s) + c.$

Now, suppose there is a program in the language L₁ which acts as an interpreter for L₂:

  function InterpretLanguage(string p)

where p is a program in L₂. The interpreter is characterized by the following property:

Running InterpretLanguage on input p returns the result of running p.

Thus if P is a program in L₂ which is a minimal description of s, then InterpretLanguage(P) returns the string s. The length of this description of s is the sum of

The length of the program InterpretLanguage, which we can take to be the constant c.
The length of P which by definition is K₂(s).

This proves the desired upper bound.

History and context

Algorithmic information theory is the area of computer science that studies Kolmogorov complexity and other complexity measures on strings (or other data structures).

The concept and theory of Kolmogorov Complexity is based on a crucial theorem first discovered by Ray Solomonoff who published it in 1960, describing it in "A Preliminary Report on a General Theory of Inductive Inference"^[1] as part of his invention of algorithmic probability. He gave a more complete description in his 1964 publications, "A Formal Theory of Inductive Inference," Part 1 and Part 2 in Information and Control.^[2]^[3]

Andrey Kolmogorov later independently published this theorem in Problems Inform. Transmission,^[4] Gregory Chaitin also presents this theorem in J. ACM; Chaitin's paper was submitted October 1966, revised in December 1968 and cites both Solomonoff's and Kolmogorov's papers.^[5]

The theorem says that among algorithms that decode strings from their descriptions (codes) there exists an optimal one. This algorithm, for all strings, allows codes as short as allowed by any other algorithm up to an additive constant that depends on the algorithms, but not on the strings themselves. Solomonoff used this algorithm, and the code lengths it allows, to define a string's `universal probability' on which inductive inference of a string's subsequent digits can be based. Kolmogorov used this theorem to define several functions of strings: complexity, randomness, and information.

When Kolmogorov became aware of Solomonoff's work, he acknowledged Solomonoff's priority^[6] For several years, Solomonoff's work was better known in the Soviet Union than in the Western World. The general consensus in the scientific community, however, was to associate this type of complexity with Kolmogorov, who was concerned with randomness of a sequence while Algorithmic Probability became associated with Solomonoff, who focused on prediction using his invention of the universal a priori probability distribution.

There are several other variants of Kolmogorov complexity or algorithmic information. The most widely used one is based on self-delimiting programs and is mainly due to Leonid Levin (1974).

An axiomatic approach to Kolmogorov complexity based on Blum axioms (Blum 1967) was introduced by Mark Burgin in the paper presented for publication by Andrey Kolmogorov (Burgin 1982).

Some consider that naming the concept "Kolmogorov complexity" is an example of the Matthew effect.^[7]

Basic results

In the following discussion let K(s) be the complexity of the string s.

It is not hard to see that the minimal description of a string cannot be too much larger than the string itself: the program GenerateFixedString above that outputs s is a fixed amount larger than s.

Theorem. There is a constant c such that

$\forall s \ K(s) \leq |s| + c. \quad$

Incomputability of Kolmogorov complexity

The first result is that there is no way to effectively compute K.

Theorem. K is not a computable function.

In other words, there is no program which takes a string s as input and produces the integer K(s) as output. We show this by contradiction by making a program that creates a string that should only be able to be created by a longer program. Suppose there is a program

  function KolmogorovComplexity(string s)

that takes as input a string s and returns K(s). Now consider the program

  function GenerateComplexString(int n)
     for i = 1 to infinity:
        for each string s of length exactly i
           if KolmogorovComplexity(s) >= n
              return s
              quit

This program calls KolmogorovComplexity as a subroutine. This program tries every string, starting with the shortest, until it finds a string with complexity at least n, then returns that string. Therefore, given any positive integer n, it produces a string with Kolmogorov complexity at least as great as n. The program itself has a fixed length U. The input to the program GenerateComplexString is an integer n; here, the size of n is measured by the number of bits required to represent n which is log₂(n). Now consider the following program:

  function GenerateParadoxicalString()
      return GenerateComplexString(n₀)

This program calls GenerateComplexString as a subroutine and also has a free parameter n₀. This program outputs a string s whose complexity is at least n₀. By an auspicious choice of the parameter n₀ we will arrive at a contradiction. To choose this value, note s is described by the program GenerateParadoxicalString whose length is at most

$U + \log_2(n_0) + C \quad$

where C is the "overhead" added by the program GenerateParadoxicalString. Since n grows faster than log₂(n), there exists a value n₀ such that

$U + \log_2(n_0) + C < n_0. \quad$

But this contradicts the definition of s having a complexity at least n₀. That is, by the definition of K(s), the string s returned by GenerateParadoxicalString is only supposed to be able to be generated by a program of length n₀ or longer, but GenerateParadoxicalString is shorter than n₀. Thus the program named "KolmogorovComplexity" cannot actually computably find the complexity of arbitrary strings.

This is proof by contradiction where the contradiction is similar to the Berry paradox: "Let n be the smallest positive integer that cannot be defined in fewer than twenty English words." It is also possible to show the uncomputability of K by reduction from the uncomputability of the halting problem H, since K and H are Turing-equivalent.[1]

In the programming languages community there is a corollary known as the full employment theorem, stating there is no perfect size-optimizing compiler.

Chain rule for Kolmogorov complexity

Main article: Chain rule for Kolmogorov complexity

The chain rule for Kolmogorov complexity states that

$K(X,Y) = K(X) + K(Y|X) + O(\log(K(X,Y))).\quad$

It states that the shortest program that reproduces X and Y is no more than a logarithmic term larger than a program to reproduce X and a program to reproduce Y given X. Using this statement one can define an analogue of mutual information for Kolmogorov complexity.

Compression

It is straightforward to compute upper bounds for $K (s)$ : simply compress the string $s$ with some method, implement the corresponding decompressor in the chosen language, concatenate the decompressor to the compressed string, and measure the resulting string's length.

A string s is compressible by a number c if it has a description whose length does not exceed $| s | - c$ . This is equivalent to saying $K(s) \le |s|-c$ . Otherwise s is incompressible by c. A string incompressible by 1 is said to be simply incompressible; by the pigeonhole principle which applies because every compressed string maps to only one uncompressed string, incompressible strings must exist, since there are $2 n$ bit strings of length n but only 2ⁿ − 1 shorter strings, that is strings of length less than n, i.e. with length 0,1,...,n-1.^[8]

For the same reason, most strings are complex in the sense that they cannot be significantly compressed: $K (s)$ is not much smaller than $| s |$ , the length of s in bits. To make this precise, fix a value of n. There are $2 n$ bitstrings of length n. The uniform probability distribution on the space of these bitstrings assigns exactly equal weight $2 - n$ to each string of length n.

Theorem. With the uniform probability distribution on the space of bitstrings of length n, the probability that a string is incompressible by c is at least $1 - 2 - c + 1 + 2 - n$ .

To prove the theorem, note that the number of descriptions of length not exceeding $n - c$ is given by the geometric series:

$1 + 2 + 2^2 + \cdots + 2^{n-c} = 2^{n-c+1}-1.\$

There remain at least

$2^n-2^{n-c+1}+1\$

many bitstrings of length n that are incompressible by c. To determine the probability divide by $2 n$ .

Chaitin's incompleteness theorem

We know that, in the set of all possible strings, most strings are complex in the sense that they cannot be described in any significantly "compressed" way. However, it turns out that the fact that a specific string is complex cannot be formally proved, if the string's complexity is above a certain threshold. The precise formalization is as follows. First fix a particular axiomatic system S for the natural numbers. The axiomatic system has to be powerful enough so that to certain assertions A about complexity of strings one can associate a formula F_A in S. This association must have the following property: if F_A is provable from the axioms of S, then the corresponding assertion A is true. This "formalization" can be achieved either by an artificial encoding such as a Gödel numbering or by a formalization which more clearly respects the intended interpretation of S.

Theorem. There exists a constant L (which only depends on the particular axiomatic system and the choice of description language) such that there does not exist a string s for which the statement

$K(s) \geq L \quad$

(as formalized in S) can be proven within the axiomatic system S.

Note that by the abundance of nearly incompressible strings, the vast majority of those statements must be true.

The proof of this result is modeled on a self-referential construction used in Berry's paradox. The proof is by contradiction. If the theorem were false, then

Assumption (X): For any integer n there exists a string s for which there is a proof in S of the formula "K(s) ≥ n" (which we assume can be formalized in S).

We can find an effective enumeration of all the formal proofs in S by some procedure

  function NthProof(int n)

which takes as input n and outputs some proof. This function enumerates all proofs. Some of these are proofs for formulas we do not care about here (examples of proofs which will be listed by the procedure NthProof are the various known proofs of the law of quadratic reciprocity, those of Fermat's little theorem or the proof of Fermat's last theorem all translated into the formal language of S). Some of these are complexity formulas of the form K(s) ≥ n where s and n are constants in the language of S. There is a program

  function NthProofProvesComplexityFormula(int n)

which determines whether the nth proof actually proves a complexity formula K(s) ≥ L. The strings s and the integer L in turn are computable by programs:

  function StringNthProof(int n)

  function ComplexityLowerBoundNthProof(int n)

Consider the following program

  function GenerateProvablyComplexString(int n)
     for i = 1 to infinity:
        if  NthProofProvesComplexityFormula(i) and ComplexityLowerBoundNthProof(i) ≥ n
           return StringNthProof(i)
           quit

Given an n, this program tries every proof until it finds a string and a proof in the formal system S of the formula K(s) ≥ L for some L ≥ n. The program terminates by our Assumption (X). Now this program has a length U. There is an integer n₀ such that U + log₂(n₀) + C < n₀, where C is the overhead cost of

   function GenerateProvablyParadoxicalString()
      return GenerateProvablyComplexString(n₀)
      quit

The program GenerateProvablyParadoxicalString outputs a string s for which there exists an L such that K(s) ≥ L can be formally proved in S with L ≥ n₀. In particular K(s) ≥ n₀ is true. However, s is also described by a program of length U + log₂(n₀) + C so its complexity is less than n₀. This contradiction proves Assumption (X) cannot hold.

Similar ideas are used to prove the properties of Chaitin's constant.

Minimum message length

The minimum message length principle of statistical and inductive inference and machine learning was developed by C.S. Wallace and D.M. Boulton in 1968. MML is Bayesian (it incorporates prior beliefs) and information-theoretic. It has the desirable properties of statistical invariance (the inference transforms with a re-parametrisation, such as from polar coordinates to Cartesian coordinates), statistical consistency (even for very hard problems, MML will converge to any underlying model) and efficiency (the MML model will converge to any true underlying model about as quickly as is possible). C.S. Wallace and D.L. Dowe (1999) showed a formal connection between MML and algorithmic information theory (or Kolmogorov complexity).

Kolmogorov randomness

Kolmogorov randomness (also called algorithmic randomness) defines a string (usually of bits) as being random if and only if it is shorter than any computer program that can produce that string. This definition of randomness is critically dependent on the definition of Kolmogorov complexity. To make this definition complete, a computer has to be specified, usually a Turing machine. According to the above definition of randomness, a random string is also an "incompressible" string, in the sense that it is impossible to give a representation of the string using a program whose length is shorter than the length of the string itself. However, according to this definition, most strings shorter than a certain length end up being (Chaitin-Kolmogorovically) random because the best one can do with very small strings is to write a program that simply prints these strings.

Notes

^ Not to be confused with descriptive complexity theory, analysis of the complexity of decision problems by their expressibility as logical formulae.

References

^ Solomonoff, Ray (February 4, 1960). "A Preliminary Report on a General Theory of Inductive Inference" (PDF). Report V-131 (Cambridge, Ma.: Zator Co.). http://world.std.com/~rjs/rayfeb60.pdf. revision, Nov., 1960.
^ Solomonoff, Ray (March 1964). "A Formal Theory of Inductive Inference Part I". Information and Control 7 (1): 1–22. doi:10.1016/S0019-9958(64)90223-2. http://world.std.com/~rjs/1964pt1.pdf.
^ Solomonoff, Ray (June 1964). "A Formal Theory of Inductive Inference Part II". Information and Control 7 (2): 224–254. doi:10.1016/S0019-9958(64)90131-7. http://world.std.com/~rjs/1964pt2.pdf.
^ Kolmogoro, A.N. (1965). "Three Approaches to the Quantitative Definition of Information". Problems Inform. Transmission 1 (1): 1–7. http://www.ece.umd.edu/~abarg/ppi/contents/1-65-abstracts.html#1-65.2.
^ Chaitin, Gregory J. (1969). "On the Simplicity and Speed of Programs for Computing Infinite Sets of Natural Numbers" (PDF). Journal of the ACM 16 (3): 407. doi:10.1145/321526.321530. http://reference.kfupm.edu.sa/content/o/n/on_the_simplicity_and_speed_of_programs__94483.pdf.
^ Kolmogorov, A. (1968). "Logical basis for information theory and probability theory". IEEE Transactions on Information Theory 14 (5): 662–664. doi:10.1109/TIT.1968.1054210.
^ Li, Ming; Paul Vitanyi (1997-02-27). An Introduction to Kolmogorov Complexity and Its Applications (2nd ed.). Springer. ISBN 0387948686.
^ As there is N_L = 2^L strings of length L, the number of strings of lengths L=0..(n−1) is N₀ + N₁ + ... + N_n−1 = 2⁰ + 2¹ + ... + 2ⁿ⁻¹, which is a finite geometric series with sum 2⁰ + 2¹ + ... + 2ⁿ⁻¹ = 2⁰ × (1 − 2ⁿ) / (1 − 2) = 2ⁿ − 1.

Blum, M. (1967). "On the size of machines". Information and Control 11 (3): 257. doi:10.1016/S0019-9958(67)90546-3.
Burgin, M. (1982), "Generalized Kolmogorov complexity and duality in theory of computations", Notices of the Russian Academy of Sciences, v.25, No. 3, pp. 19–23.
Cover, Thomas M. and Thomas, Joy A., Elements of information theory, 1st Edition. New York: Wiley-Interscience, 1991. ISBN 0-471-06259-6. 2nd Edition. New York: Wiley-Interscience, 2006. ISBN 0-471-24195-4.
Kolmogorov, Andrei N. (1963). "On Tables of Random Numbers". Sankhyā Ser. A. 25: 369–375. MR 178484.
Kolmogorov, Andrei N. (1998). "On Tables of Random Numbers". Theoretical Computer Science 207 (2): 387–395. doi:10.1016/S0304-3975(98)00075-9. MR 1643414.
Lajos, Rónyai and Gábor, Ivanyos and Réka, Szabó, Algoritmusok. TypoTeX, 1999. ISBN 963-2790-14-6
Li, Ming and Vitányi, Paul, An Introduction to Kolmogorov Complexity and Its Applications, Springer, 1997. Introduction chapter full-text.
Yu Manin, A Course in Mathematical Logic, Springer-Verlag, 1977. ISBN 9780720428445
Sipser, Michael, Introduction to the Theory of Computation, PWS Publishing Company, 1997. ISBN 0-534-95097-3.
Wallace, C. S. and Dowe, D. L., Minimum Message Length and Kolmogorov Complexity, Computer Journal, Vol. 42, No. 4, 1999).

External links

The Legacy of Andrei Nikolaevich Kolmogorov
Chaitin's online publications
Solomonoff's IDSIA page
Generalizations of algorithmic information by J. Schmidhuber
Ming Li and Paul Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications, 2nd Edition, Springer Verlag, 1997.
Tromp's lambda calculus computer model offers a concrete definition of K()
Universal AI based on Kolmogorov Complexity ISBN 3-540-22139-5 by M. Hutter: ISBN 3-540-22139-5
David Dowe's Minimum Message Length (MML) and Occam's razor pages.
P. Grunwald, M. A. Pitt and I. J. Myung (ed.), Advances in Minimum Description Length: Theory and Applications, M.I.T. Press, April 2005, ISBN 0-262-07262-9.

Data compression methods

Information theory

Entropy · Complexity · Redundancy · Lossy · Timeline of information theory

Lossless

Entropy encoding	Shannon–Fano · Shannon–Fano–Elias · Huffman · Adaptive Huffman · Arithmetic · Range · Golomb · Universal (Gamma · Exp-Golomb · Fibonacci · Levenshtein)

Dictionary	RLE · Byte pair encoding · DEFLATE · Lempel–Ziv (LZ77/78 · LZSS · LZW · LZWL · LZO · LZMA · LZX · LZRW · LZJB · LZS · LZT · ROLZ) · Statistical Lempel Ziv

Others	CTW · BWT · PPM · DMC · Delta

Audio

Theory	Companding · Convolution · Dynamic range · Latency · Sampling · Nyquist–Shannon theorem · Sound quality

Audio codec parts	LPC (LAR · LSP) · WLPC · CELP · ACELP · A-law · μ-law · ADPCM · DPCM · MDCT · Fourier transform · Psychoacoustic model

Others	Bit rate (CBR · ABR · VBR) · Speech compression · Sub-band coding

Image

Terms	Color space · Pixel · Chroma subsampling · Compression artifact · Image resolution

Methods	RLE · Fractal · Wavelet · EZW · SPIHT · LP · DCT · Chain code · KLT

Others	Test images · PSNR quality measure · Quantization

Video

Terms	Video characteristics · Frame · Frame rate · Interlace · Frame types · Video quality · Video resolution

Video codec parts	Motion compensation · DCT · Quantization

Others	Video codecs · Rate distortion theory · Bit rate (CBR · ABR · VBR)

See Compression formats for formats and Compression software implementations for codecs

Categories:

Wikimedia Foundation. 2010.

Игры ⚽ Нужна курсовая?

Look at other dictionaries:

Kolmogorov complexity — noun The complexity of an information object such as a book or an image informally defined as the length of the shortest program that produces that information object … Wiktionary
Chain rule for Kolmogorov complexity — The chain rule for Kolmogorov complexity is an analogue of the chain rule for information entropy, which states: H(X,Y) = H(X) + H(Y | X) That is, the combined randomness of two sequences X and Y is the sum of the randomness of X plus whatever… … Wikipedia
Complexity — For other uses, see Complexity (disambiguation). In general usage, complexity tends to be used to characterize something with many parts in intricate arrangement. The study of these complex linkages is the main goal of complex systems theory. In… … Wikipedia
Complexity (disambiguation) — In general usage, complexity tends to be used to characterize something with many parts in intricate arrangement. Complexity may also refer to: Complex systems Complexity theory (disambiguation) Kolmogorov complexity Los Angeles Complexity, a… … Wikipedia
Kolmogorov-Entropie — Die Kolmogorow Komplexität (nach Andrei Nikolajewitsch Kolmogorow) ist ein Maß für die Strukturiertheit einer Zeichenkette und ist durch die Länge des kürzesten Programms gegeben, das diese Zeichenkette erzeugt. Dieses kürzeste Programm gibt… … Deutsch Wikipedia
Kolmogorov-Komplexität — Die Kolmogorow Komplexität (nach Andrei Nikolajewitsch Kolmogorow) ist ein Maß für die Strukturiertheit einer Zeichenkette und ist durch die Länge des kürzesten Programms gegeben, das diese Zeichenkette erzeugt. Dieses kürzeste Programm gibt… … Deutsch Wikipedia
Kolmogorov structure function — (KSF) is used in the algorithmic theory of complexity for describing the structure of a string by use of models (programs) of increasing complexity … Wikipedia
Andrey Kolmogorov — Infobox Scientist name = Andrey Kolmogorov birth date = birth date|1903|4|25 birth place = Tambov, Imperial Russia nationality = Russian death date = death date and age|1987|10|20|1903|4|25 death place = Moscow, USSR field = Mathematician work… … Wikipedia
Specified complexity — Part of a series of articles on Intelligent design … Wikipedia
Cognitive complexity — Psychology Cognitive psychology Perception … Wikipedia

Academic Dictionaries and Encyclopedias

Kolmogorov complexity

Contents

Definition

History and context

Basic results

Incomputability of Kolmogorov complexity

Chain rule for Kolmogorov complexity

Compression

Chaitin's incompleteness theorem

Minimum message length

Kolmogorov randomness

See also

Notes

References

External links

Look at other dictionaries:

Share the article and excerpts

Academic Dictionaries and Encyclopedias

Wikipedia

Kolmogorov complexity

Contents

Definition

History and context

Basic results

Incomputability of Kolmogorov complexity

Chain rule for Kolmogorov complexity

Compression

Chaitin's incompleteness theorem

Minimum message length

Kolmogorov randomness

See also

Notes

References

External links

Look at other dictionaries:

Share the article and excerpts

Direct link