- Incremental encoding
Incremental encoding, also known as front compression, back compression, or front coding, is a type of
delta encoding compression algorithm whereby common prefixes or suffixes and their lengths are recorded so that they need not be duplicated. This algorithm is particularly well-suited for compressing sorted data, e.g., a list ofword s from adictionary .For example:
The encoding used to store the common prefix length itself varies from application to application. Typical techniques are storing the value as a single byte;
delta encoding , which store only the change in the common prefix length; and various universal codes. It may be combined with other generallossless data compression techniques such asentropy encoding anddictionary coder s to compress the remaining suffixes.Applications
Incremental encoding is widely used in information retrieval to compress the lexicons used in search indexes; these list all the words found in all the documents and a pointer for each one to a list of locations. Typically, it compresses these indexes by about 40%. [Ian H. Witten, Alistair Moffat, Timothy C. Bell. Managing Gigabytes. Second edition. Academic Press. ISBN 1-55860-5703. Section 4.1: Accessing the lexicon, subsection Front coding, pp.159–161.]
As one example, incremental encoding is used as a starting point by the
GNU locate utility, in an index of filenames and directories. TheGNU locate utility further usesbigram encoding to further shorten popular filepath prefixes.References
Wikimedia Foundation. 2010.