Variable-length code

In coding theory, a variable-length code is a code that maps source symbols to a variable number of bits.

Variable-length codes can allow sources to be compressed and decompressed with zero error (lossless data compression) and still be read back symbol by symbol. With the right coding strategy, an independent and identically distributed (i.i.d.) source may be compressed almost arbitrarily close to its entropy. This is in contrast to fixed-length coding methods, for which data compression is only possible for large blocks of data, and any compression beyond the logarithm of the total number of possibilities comes with a finite (though perhaps arbitrarily small) probability of failure.

Some examples of well-known variable-length coding strategies are Huffman coding, Lempel-Ziv coding and arithmetic coding.

Extension of a code

The extension of a code is the mapping of finite-length source sequences to finite-length bit strings that is obtained by concatenating, for each symbol of the source sequence, the corresponding codeword produced by the original code.
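
A minimal sketch of the extension, assuming a code is represented as a Python dict from source symbols to codeword strings of '0' and '1' (the function name `extend` is hypothetical). It uses the mapping "M2" from the "Non-singular codes" section and shows that two different source sequences can produce the same bit string under the extension.

```python
def extend(code: dict[str, str], source: str) -> str:
    """Encode a source sequence with the extension of `code`.

    `code` maps each source symbol to its codeword (a string of '0'/'1').
    The extension simply concatenates the codewords of successive symbols.
    """
    return "".join(code[symbol] for symbol in source)


# The mapping "M2" from the "Non-singular codes" section.
M2 = {"a": "0", "b": "1", "c": "00", "d": "01"}

print(extend(M2, "aa"))   # 00
print(extend(M2, "c"))    # 00  (same bit string as "aa")
print(extend(M2, "bad"))  # 1001
```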

Classes of variable-length codes

Variable-length codes can be strictly nested, in order of decreasing generality, as non-singular codes, uniquely decodable codes and instantaneous (prefix-free) codes. Instantaneous codes are always uniquely decodable, and uniquely decodable codes are always non-singular.

Non-singular codes

A code is non-singular if each source symbol is mapped to a different non-empty bit string, i.e. the mapping from source symbols to bit strings is one-to-one.
* For example, the mapping "M1" = {(a,0), (b,0), (c,1)} is singular because both "a" and "b" map to the same bit string "0"; any extension of this mapping will generate a lossy (non-lossless) coding. Such a singular code may still be useful when some loss of information is acceptable (for example, in audio or video compression, where lossy coding becomes equivalent to source quantization).
* However, the mapping "M2" = {(a,0), (b,1), (c,00), (d,01)} is non-singular, so each individual source symbol can be recovered from its codeword, which is useful for general data transmission (although this property is not always required, and, as the next section shows, the extension of "M2" is not lossless for sequences of symbols). Note that a non-singular code need not be more compact than the source; in many applications a larger code is useful, for example as a way to detect or recover from encoding or transmission errors, or in security applications to protect a source from undetectable tampering.
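
Non-singularity is just a check that the symbol-to-codeword mapping is one-to-one with non-empty codewords. A minimal sketch, assuming the same dict representation of a code (the function name `is_non_singular` is hypothetical):

```python
def is_non_singular(code: dict[str, str]) -> bool:
    """Return True if every symbol maps to a distinct, non-empty bit string."""
    codewords = list(code.values())
    # All codewords non-empty, and no two symbols share a codeword (one-to-one).
    return all(codewords) and len(set(codewords)) == len(codewords)


M1 = {"a": "0", "b": "0", "c": "1"}
M2 = {"a": "0", "b": "1", "c": "00", "d": "01"}

print(is_non_singular(M1))  # False: "a" and "b" both map to "0"
print(is_non_singular(M2))  # True
```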

Uniquely decodable codes

A code is uniquely decodable if its extension is non-singular.
* The non-singular example mapping "M2" in the previous section is not uniquely decodable because, for example, the source sequence "aa" maps to the bit string "00" under the extension, exactly like the source sequence "c". However, such a code is still useful when the set of all possible source sequences is completely known and finite, or when there are restrictions (for example, a formal syntax) that determine whether source elements of the extension are acceptable. Such restrictions permit the original message to be decoded by checking which of the possible source sequences mapped to the same bit string are valid under those restrictions.
* The mapping "M3" = {(a,0), (b,01), (c,011)} is uniquely decodable. This can be demonstrated by looking at the "follow-set" after each codeword in the map: every codeword starts with a 0 bit and contains no other 0, so a 0 bit can never extend the current codeword into a longer valid one and instead unambiguously starts a new codeword.
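
The follow-set argument above is specific to "M3". A standard general test for unique decodability, not described in this article, is the Sardinas–Patterson algorithm: it repeatedly computes the "dangling suffixes" left over when a codeword is a prefix of another string, and the code is uniquely decodable exactly when no dangling suffix is itself a codeword. The sketch below is one straightforward implementation under the same dict representation; the function names are hypothetical.

```python
def dangling_suffixes(prefixes: set[str], words: set[str]) -> set[str]:
    """Return every non-empty w such that p + w == x for some p in prefixes, x in words."""
    return {x[len(p):] for p in prefixes for x in words
            if len(x) > len(p) and x.startswith(p)}


def is_uniquely_decodable(code: dict[str, str]) -> bool:
    """Sardinas-Patterson test: True if the code's extension is non-singular."""
    codewords = set(code.values())
    if "" in codewords or len(codewords) < len(code):
        return False  # a singular code can never be uniquely decodable
    # S1: leftovers when one codeword is a proper prefix of another codeword.
    suffixes = dangling_suffixes(codewords, codewords)
    seen: set[frozenset[str]] = set()
    while suffixes:
        if suffixes & codewords:
            return False  # a dangling suffix is a codeword: two parsings exist
        if frozenset(suffixes) in seen:
            return True  # the sequence of suffix sets has started to repeat
        seen.add(frozenset(suffixes))
        # S_{i+1}: suffixes left by matching codewords against S_i, both ways.
        suffixes = (dangling_suffixes(codewords, suffixes)
                    | dangling_suffixes(suffixes, codewords))
    return True  # no dangling suffixes remain


M2 = {"a": "0", "b": "1", "c": "00", "d": "01"}
M3 = {"a": "0", "b": "01", "c": "011"}

print(is_uniquely_decodable(M2))  # False: "aa" and "c" both encode to "00"
print(is_uniquely_decodable(M3))  # True
```

For "M3", the first suffix set is {"1", "11"}, which contains no codeword, and no further suffixes can be formed, so the test agrees with the follow-set argument.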

Instantaneous codes

A code is instantaneous (also called prefix-free) if no codeword in the mapping is a prefix of the codeword of a different source symbol in the same mapping. This means that each symbol can be decoded as soon as its entire codeword has been received.
* The example mapping "M3" in the previous section is not instantaneous because, after reading the bit string "0", we do not know whether it encodes an "a" source symbol or is the prefix of the encoding of a "b" or "c" symbol.
* An example of an instantaneous variable-length code is shown below. The source is one of four symbols: a, b, c or d. The binary codeword for each symbol is given. The encoding and decoding of a portion of an example source sequence is given below:
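
The prefix condition can be checked directly, and a prefix-free code can be decoded greedily: each symbol is emitted the moment the last bit of its codeword arrives. In this sketch the codewords a → 0, b → 10, c → 110, d → 111 are a hypothetical choice whose lengths 1, 2, 3, 3 match the expected-length calculation under "Advantages", and the sequence "aabacdab" is made up for illustration; the function names are also hypothetical.

```python
def is_prefix_free(code: dict[str, str]) -> bool:
    """True if no codeword is a prefix of a different symbol's codeword."""
    words = list(code.values())
    return not any(
        i != j and words[j].startswith(words[i])
        for i in range(len(words))
        for j in range(len(words))
    )


def decode_instantaneous(code: dict[str, str], bits: str) -> str:
    """Decode a bit string produced by a prefix-free code, symbol by symbol."""
    lookup = {word: symbol for symbol, word in code.items()}
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in lookup:  # a complete codeword: emit its symbol immediately
            symbols.append(lookup[buffer])
            buffer = ""
    if buffer:
        raise ValueError(f"trailing bits do not form a codeword: {buffer!r}")
    return "".join(symbols)


CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}  # hypothetical codewords
M3 = {"a": "0", "b": "01", "c": "011"}

print(is_prefix_free(CODE))  # True
print(is_prefix_free(M3))    # False: "0" is a prefix of "01" and "011"

bits = "".join(CODE[s] for s in "aabacdab")
print(bits)                              # 00100110111010
print(decode_instantaneous(CODE, bits))  # aabacdab
```

Applied to "M3", the same greedy decoder would commit to "a" after the first 0 of "01" and then fail on the leftover 1, which is why instantaneous decoding needs a prefix-free code.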

Advantages

The advantage of a variable-length code is that unlikely source symbols can be assigned longer codewords and likely source symbols can be assigned shorter codewords, thus giving a low expected codeword length. For the above example, if the probabilities of (a, b, c, d) were $\left(\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{8}, \tfrac{1}{8}\right)$, the expected number of bits used to represent a source symbol using the code above would be

$$1 \times \tfrac{1}{2} + 2 \times \tfrac{1}{4} + 3 \times \tfrac{1}{8} + 3 \times \tfrac{1}{8} = \tfrac{7}{4}.$$

As the entropy of this source is 1.75 bits per symbol, this code compresses the source as much as possible while still allowing it to be recovered with zero error.
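
The arithmetic above can be checked numerically. This sketch takes the probabilities and the codeword lengths 1, 2, 3, 3 used in the calculation, computes the expected length exactly with `fractions.Fraction`, and computes the entropy as $H = -\sum_i p_i \log_2 p_i$ in floating point.

```python
from fractions import Fraction
from math import log2

# Probabilities of (a, b, c, d) and the codeword lengths used in the calculation.
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
lengths = [1, 2, 3, 3]

# Expected codeword length: sum of p_i * length_i, kept as an exact fraction.
expected_length = sum(p * n for p, n in zip(probs, lengths))

# Entropy in bits per symbol: -sum of p_i * log2(p_i).
entropy = -sum(float(p) * log2(float(p)) for p in probs)

print(expected_length)  # 7/4
print(entropy)          # 1.75
```

Because every probability here is a power of 1/2, each codeword length equals $-\log_2 p_i$ exactly, so the expected length meets the entropy bound with equality.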


Wikimedia Foundation. 2010.
