Delta encoding

Delta encoding

Delta encoding is a way of storing or transmitting data in the form of differences between sequential data rather than complete files; more generally this is known as data differencing. Delta encoding is sometimes called delta compression, particularly where archival histories of changes are required (e.g., in software projects).

The differences are recorded in discrete files called "deltas" or "diffs", after the Unix file comparison utility, diff. Because changes are often small – for example, changing a few words in a large document, or changing a few records in a large table – delta encoding greatly reduces data redundancy. Collections of unique deltas are substantially more space-efficient than their non-encoded equivalents.

From a logical point of view the difference between two data values is the information required to obtain one value from the other – see relative entropy. The difference between identical values (under some equivalence) is often called 0 or the neutral element. A good delta should be minimal,[says who?] or ambiguous unless one element of a pair is present.[citation needed]

Contents

Simple example

Perhaps the simplest example is storing values of bytes as differences (deltas) between sequential values, rather than the values themselves. So, instead of 2, 4, 6, 9, 7, we would store 2, 2, 2, 3, −2. This is not very useful when used alone, but it can help further compression of data in which sequential values occur often. IFF 8SVX sound format applies this encoding to raw sound data before applying compression to it. Unfortunately, not even all 8-bit sound samples compress better when delta encoded, and the usability of delta encoding is even smaller for 16-bit and better samples. Therefore, compression algorithms often choose to delta encode only when the compression is better than without. However, in video compression delta frames can considerably reduce frame size, and are used in virtually every video compression codec.

Definition

A delta can be defined in 2 ways, symmetric delta and directed delta. A symmetric delta can be expressed as:

Δ(v1, v2) = (v1 \ v2) U (v2 \ v1),

where v1 and v2 represent two successive versions.

A directed delta, also called a change, is a sequence of (elementary) change operations which, when applied to one version v1, yields another version v2 (note the correspondence to transaction logs in databases).

Variants

A variation of delta encoding which encodes differences between the prefixes or suffixes of strings is called incremental encoding. It is particularly effective for sorted lists with small differences between strings, such as a list of words from a dictionary.

Implementation issues

The nature of the data to be encoded influences the effectiveness of a particular compression algorithm. For example, in sending updates to the Google Chrome browser, a specialized algorithm is used based on knowledge of the archive and executable format.[1][2]

Delta encoding performs best when data has small or constant variation; for an unsorted data set, there may be little to no compression possible with this method.

In delta encoded transmission over a network where only a single copy of the file is available at each end of the communication channel, special error control codes are used to detect which parts of the file have changed since its previous version. For example, rsync uses a rolling checksum algorithm based on Mark Adler's adler-32 checksum.

Sample C code

The following C code performs a simple form of delta encoding and decoding:

void
delta_encode (char *buffer, const unsigned int length)
{
  char delta = 0;
  char original;
  unsigned int i;
  for (i = 0; i < length; ++i)
    {
      original = buffer[i];
      buffer[i] -= delta;
      delta = original;
    }
}
 
void
delta_decode (char *buffer, const unsigned int length)
{
  char delta = 0;
  unsigned int i;
  for (i = 0; i < length; ++i)
    {
      buffer[i] += delta;
      delta = buffer[i];
    }
}

Examples

Delta encoding in HTTP

Another instance of use of delta encoding is RFC 3229, "Delta encoding in HTTP", which proposes that HTTP servers should be able to send updated Web pages in the form of differences between versions (deltas), which should decrease Internet traffic, as most pages change slowly over time, rather than being completely rewritten repeatedly:

This document describes how delta encoding can be supported as a compatible extension to HTTP/1.1.

Many HTTP (Hypertext Transport Protocol) requests cause the retrieval of slightly modified instances of resources for which the client already has a cache entry. Research has shown that such modifying updates are frequent, and that the modifications are typically much smaller than the actual entity. In such cases, HTTP would make more efficient use of network bandwidth if it could transfer a minimal description of the changes, rather than the entire new instance of the resource.

Online backup

Many of the online backup services adopt this methodology, often known simply as deltas, in order to give their users previous versions of the same file from previous backups. This reduces associated costs, not only in the amount of data that has to be stored as differing versions (as the whole of each changed version of a file has to be offered for users to access), but also those costs in the uploading (and sometimes the downloading) of each file that has been updated (by just the smaller delta having to be used, rather than the whole file).

VCDIFF

One general format for delta encoding is VCDIFF, described in RFC 3284. Free software implementations include Xdelta and open-vcdiff.

GDIFF

Generic Diff Format (GDIFF) is another delta encoding. It is submitted to W3C in 1997.[3] In many cases, VCDIFF has better compression rate than GDIFF.

Diff

Diff is a file comparison program, which is mainly used for text files.

bsdiff

Bsdiff is a binary diff program using suffix sorting.

See also

References

External links


Wikimedia Foundation. 2010.

Игры ⚽ Поможем решить контрольную работу

Look at other dictionaries:

  • Delta encoding — Saltar a navegación, búsqueda Delta encoding también conocido como Delta compression o differential compression , es una técnica de compresión de archivos que se sale de los paradigmas convencionales de compresión, ésta se basa en la compresión… …   Wikipedia Español

  • Delta — commonly refers to: Delta (letter), Δ or δ in the Greek alphabet, also used as a mathematical symbol River delta, a landform at the mouth of a river Delta Air Lines, a major U.S. airline Delta may also refer to: Contents 1 Places …   Wikipedia

  • Delta-sigma modulation — Delta sigma (ΔΣ; or sigma delta, ΣΔ) modulation is a method for encoding high resolution or analog signals into lower resolution digital signals. The conversion is done using error feedback, where the difference between the two signals is… …   Wikipedia

  • Incremental encoding — Incremental encoding, also known as front compression, back compression, or front coding, is a type of delta encoding compression algorithm whereby common prefixes or suffixes and their lengths are recorded so that they need not be duplicated.… …   Wikipedia

  • Run-length encoding — (RLE) is a very simple form of data compression in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. This is …   Wikipedia

  • Binary delta compression — (BDC) is a technology used in software deployment for distributing patches. Explanation Downloading large amounts of data over the internet for software updates can induce high network traffic problems, especially when a network of computers is… …   Wikipedia

  • Byte pair encoding — or digram coding[1] is a simple form of data compression in which the most common pair of consecutive bytes of data is replaced with a byte that does not occur within that data. A table of the replacements is required to rebuild the original data …   Wikipedia

  • Entropy encoding — In information theory an entropy encoding is a lossless data compression scheme that is independent of the specific characteristics of the medium. One of the main types of entropy coding creates and assigns a unique prefix free code to each… …   Wikipedia

  • Run-length encoding — Le run length encoding, appelé en français le codage par plages, est un algorithme de compression de données en informatique. Le système s applique essentiellement à des documents scannés en noir et blanc : au lieu de coder un bit par point …   Wikipédia en Français

  • Elias delta coding — Elias delta code is a universal code encoding the positive integers. To code a number: #Write it in binary. #Count the bits and write down that number of bits in binary (X). #Use the binary digit written in step 1 again, remove the leading bit… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”