# Golomb coding

Golomb coding is a data compression scheme invented by Solomon W. Golomb in the 1960s. The scheme is based on entropy encoding and is optimal (in the sense of Shannon's source coding theorem) for alphabets following a geometric distribution, making it highly suitable for situations in which the occurrence of small values in the input stream is significantly more likely than large values.

The scheme is closely related to Rice coding (invented by Robert F. Rice): a Rice code is simply a Golomb code in which the tunable parameter is a power of two, although each scheme was invented independently. This makes Rice codes convenient for use on a computer, since multiplication and division by a power of 2 can be implemented using a bitshift operation, which can be performed extremely quickly. Similarly, calculating the corresponding remainder can be achieved using a simple bitmask operation, which is equally fast.
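
Since a Rice code fixes $M = 2^k$, the quotient and remainder can be computed without any division. A minimal sketch (the function names are illustrative, not taken from any particular codec):

```cpp
#include <cassert>

// For a Rice code the parameter is M = 2^k, so division by M is a right
// shift and the remainder is a bitmask of the low k bits.
int riceQuotient(int N, int k)  { return N >> k; }
int riceRemainder(int N, int k) { return N & ((1 << k) - 1); }
```

For instance, with k = 3 (M = 8), the number 42 splits into quotient 5 and remainder 2, since 42 = 5 × 8 + 2.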

Rice coding is used as the entropy encoding stage in a number of lossless image compression and audio data compression methods.

Overview

Preparation

Golomb's scheme was designed to encode sequences of non-negative numbers. However, it is easily extended to accept sequences containing negative numbers using an "overlap and interleave" scheme, in which all values are reassigned to some non-negative number in a unique and reversible way. The sequence begins: 0, -1, 1, -2, 2, -3, 3, -4, 4, ... The nth negative value (i.e. -n) is mapped to the nth odd number (2n - 1), and the mth positive value is mapped to the mth even number (2m). This may be expressed mathematically as follows: a non-negative value $x$ is mapped to $x' = 2x$ ($x \ge 0$), and a negative value $y$ is mapped to $y' = 2|y| - 1 = -2y - 1$ ($y < 0$).
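
The "overlap and interleave" mapping can be sketched as a pair of helper functions (hypothetical names, shown only to illustrate the remapping):

```cpp
#include <cassert>

// Map a signed value onto the non-negative integers:
// 0, -1, 1, -2, 2, ...  ->  0, 1, 2, 3, 4, ...
unsigned interleave(int n) {
    return n >= 0 ? 2u * (unsigned)n          // m-th positive -> 2m (even)
                  : 2u * (unsigned)(-n) - 1u; // n-th negative -> 2n-1 (odd)
}

// Inverse mapping: even codes decode to non-negative values, odd to negative.
int deinterleave(unsigned m) {
    return (m % 2 == 0) ? (int)(m / 2) : -(int)((m + 1) / 2);
}
```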

Construction of codes

Golomb coding uses a tunable parameter "M" to divide an input value into two parts: $q$, the result of a division by "M", and $r$, the remainder. The quotient is sent in unary coding, followed by the remainder in truncated binary encoding. When $M=1$ Golomb coding is equivalent to unary coding.

Golomb-Rice codes can be thought of as codes that indicate a number by the position of the "bin" ("q"), and the "offset" within the bin ("r"). The above figure shows the position "q", and offset "r" for the encoding of integer "N" using Golomb-Rice parameter "M".

Formally, the two parts are given by the following expression, where $x$ is the number being encoded: $q = \left\lfloor \frac{x-1}{M} \right\rfloor$ and $r = x - qM - 1$. The final result looks like: $(q+1)\,r$. (Here $x$ is counted from 1; the algorithm below uses the zero-based convention $q = \lfloor N/M \rfloor$, $r = N \bmod M$.)

Note that $r$ can be of a varying number of bits: it is always "b" bits for a Rice code, and switches between "b" - 1 and "b" bits for a Golomb code (i.e., when "M" is not a power of 2). Let $b = \lceil \log_2(M) \rceil$. If $0 \leq r < 2^b - M$, then use "b" - 1 bits to encode "r". If $2^b - M \leq r < M$, then use "b" bits to encode "r". Clearly, $b = \log_2(M)$ if "M" is a power of 2, and we can encode all values of "r" with "b" bits.

The parameter "M" is a function of the corresponding Bernoulli process, which is parameterized by $p=P\left(X=0\right)$ the probability of success in a given Bernoulli Trial. $M$ and $p$ are related by these inequalities:

: $\left(1-p\right)^M + \left(1-p\right)^\left\{M+1\right\} leq 1 < \left(1-p\right)^\left\{M-1\right\} + \left(1-p\right)^M.$

The Golomb code for this distribution is equivalent to the Huffman code for the same probabilities, if it were possible to compute a Huffman code for the infinite set of source values.
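
The smallest $M$ satisfying the left-hand inequality can be found by direct search. A sketch (the function name is illustrative):

```cpp
#include <cmath>

// Smallest integer M with (1-p)^M + (1-p)^(M+1) <= 1,
// where p = P(X = 0) is the Bernoulli success probability.
int golombParameter(double p) {
    double q = 1.0 - p;
    int M = 1;
    while (std::pow(q, M) + std::pow(q, M + 1) > 1.0)
        ++M;
    return M;
}
```

For p = 1/2 this yields M = 1 (plain unary coding); smaller p (longer runs of failures) yields a larger M.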

Simple algorithm

Note that the algorithm below describes the Rice-Golomb encoding, in which the remainder code uses simple truncated binary encoding, also called "Rice coding". Other variable-length binary encodings, such as arithmetic or Huffman encodings, are possible for the remainder codes if the statistical distribution of remainders is not flat, notably when not all possible remainders after the division are used. In this algorithm, if the "M" parameter is a power of 2, it becomes equivalent to the simpler Rice encoding.

# Fix the parameter "M" to an integer value.
# For "N", the number to be encoded, find
## quotient = "q" = int ["N"/"M"]
## remainder = "r" = "N"%"M"
# Generate Codeword
## The code format: quotient code followed by remainder code, where
## Quotient Code (in unary coding)
### Write a "q"-length string of 1 bits
### Write a 0 bit
## Remainder Code (in truncated binary encoding)
### If "M" is power of 2, code remainder as binary format. So $log_2\left(M\right)$ bits are needed. (Rice code)
### If "M" is not a power of 2, set $b = lceillog_2\left(M\right) ceil$
#### If $r < 2^b-M$ code "r" as plain binary using "b"-1 bits.
#### If $r ge 2^b-M$ code the number $r+2^b-M$ in plain binary representation using "b" bits.

Example

Set "M" = 10. Thus $b = lceillog_2\left(10\right) ceil = 4$. The cutoff is $2^b-M = 16-10 = 6$

For example, with a Rice-Golomb encoding of parameter "M" = 10, the decimal number 42 would first be split into "q" = 4, "r" = 2, and would be encoded as qcode("q"),rcode("r") = qcode(4),rcode(2) = 11110,010 (you don't need to encode the separating comma in the output stream, because the 0 at the end of the "q" code already marks where "q" ends and "r" begins; both the qcode and rcode are self-delimiting).
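
The steps above, including the truncated-binary remainder, can be collected into a single function that returns the codeword as a bit string (a sketch for non-negative inputs; the function name is illustrative):

```cpp
#include <cmath>
#include <string>

// Rice-Golomb codeword of N (N >= 0) with parameter M, as a string of bits.
std::string golombCodeword(int N, int M) {
    int q = N / M, r = N % M;
    int b = (int)std::ceil(std::log2((double)M));
    std::string out(q, '1');          // unary quotient: q ones...
    out += '0';                       // ...terminated by a zero
    int cutoff = (1 << b) - M;        // truncated binary threshold
    int bits = b, value = r;
    if (r < cutoff) bits = b - 1;     // short (b-1 bit) codes for small r
    else value = r + cutoff;          // shifted b-bit codes otherwise
    for (int i = bits - 1; i >= 0; --i)
        out += ((value >> i) & 1) ? '1' : '0';
    return out;
}
```

With M = 10 this reproduces the example: golombCodeword(42, 10) gives "11110010".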

Example code

Note: this basic code assumes that the M parameter is a power of 2; it does not implement the case where truncated binary encoding of the division remainder is preferable (when M is not a power of 2, as in the previous example).

Encoding

```cpp
void golombEncode(char* source, char* dest, int M) {
    IntReader intreader(source);
    BitWriter bitwriter(dest);
    while (intreader.hasLeft()) {
        int num = intreader.getInt();
        int q = num / M;
        for (int i = 0; i < q; i++)
            bitwriter.putBit(true);   // write q ones
        bitwriter.putBit(false);      // write one zero
        int v = 1;
        for (int i = 0; i < log2(M); i++) {  // remainder bits, LSB first
            bitwriter.putBit(v & num);
            v = v << 1;
        }
    }
    bitwriter.close();
    intreader.close();
}
```

Decoding

```cpp
void golombDecode(char* source, char* dest, int M) {
    BitReader bitreader(source);
    IntWriter intwriter(dest);
    while (bitreader.hasLeft()) {
        int q = 0;
        while (bitreader.getBit())
            q++;                      // potentially dangerous with malformed files
        int nr = 0;
        for (int a = 0; a < log2(M); a++)  // read the log2(M) remainder bits
            if (bitreader.getBit())
                nr += 1 << a;
        nr += q * M;                  // combine the remainder and the multiple of M
        intwriter.putInt(nr);         // write out the value
    }
    bitreader.close();
    intwriter.close();
}
```

Use for run-length encoding

Given an alphabet of two symbols, or a set of two events, "P" and "Q", with probabilities "p" and (1 - "p") respectively, where "p" ≥ 1/2, Golomb coding can be used to encode runs of zero or more "P"s separated by single "Q"s. In this application, the best setting of the parameter "M" is the nearest integer to $\frac{-1}{\log_2 p}$. When "p" = 1/2, "M" = 1, and the Golomb code corresponds to unary ("n" ≥ 0 ones followed by a zero codes for "n" "P"s followed by a "Q").
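
This parameter choice can be sketched directly (the function name is illustrative):

```cpp
#include <cmath>

// Best run-length parameter: nearest integer to -1/log2(p),
// where p >= 1/2 is the probability of the more frequent symbol P.
int runLengthParameter(double p) {
    return (int)std::lround(-1.0 / std::log2(p));
}
```

For p = 1/2 this gives M = 1; as p approaches 1 (longer and longer runs of "P"), M grows accordingly.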

Applications

The FLAC audio codec uses Rice coding to represent linear prediction residues, because these residues follow a near-geometric distribution, with small residues more frequent than large ones, and can therefore be encoded in fewer bits using a predefined Rice-Golomb encoding, without requiring more complex variable-length encodings such as Huffman or arithmetic coding. However, this assumption fails for a simple sinusoidal signal, because the differential residues themselves form a sinusoidal signal whose values do not follow a geometric distribution: the highest and lowest residue values occur with similarly high frequency, and only the intermediate positive and negative residues occur less often.

Apple's ALAC audio codec uses modified Rice coding after its Adaptive FIR filter.

Rice coding is also used for residual coding in MPEG-4 ALS.

Rice coding is also used in FELICS.

The Golomb-Rice coder is used in the entropy coding stage of Rice-algorithm-based lossless image codecs. One such experiment yields the compression ratio graph given below. In those codecs, the progressive spatial differential data yield an alternating sequence of positive and negative values around 0, which are remapped to non-negative integers (by doubling the absolute value, and subtracting one if the input is negative); Rice-Golomb coding is then applied with a varying divisor, which remains small.

Note that in those results the Rice coding may create very long sequences of one-bits for the quotient. For practical reasons it is often necessary to limit the total run-length of one-bits, so a modified version of the Rice-Golomb encoding replaces a long string of one-bits by encoding its length with a recursive Rice-Golomb encoding; this requires reserving some values in addition to the initial divisor "k" to allow the necessary distinction.

References

* Golomb, S. W. (1966). [http://urchin.earth.li/~twic/Golombs_Original_Paper/ Run-Length Encodings]. IEEE Transactions on Information Theory, IT-12(3):399-401.
* R. F. Rice and R. Plaunt, "Adaptive Variable-Length Coding for Efficient Compression of Spacecraft Television Data," IEEE Transactions on Communications, vol. 16(9), pp. 889-897, Dec. 1971.
* R. F. Rice, "Some Practical Universal Noiseless Coding Techniques," Jet Propulsion Laboratory, Pasadena, California, JPL Publication 79-22, Mar. 1979.
* Witten, Ian; Moffat, Alistair; Bell, Timothy. "Managing Gigabytes: Compressing and Indexing Documents and Images." Second Edition. Morgan Kaufmann Publishers, San Francisco, CA, 1999. ISBN 1-55860-570-3.
* David Salomon. "Data Compression." ISBN 0-387-95045-1.
