- Reed–Solomon error correction
Reed-Solomon error correction is an error-correcting code that works by oversampling a polynomial constructed from the data. The polynomial is evaluated at several points, and these values are sent or recorded. Sampling the polynomial more often than is necessary makes the polynomial over-determined. As long as it receives "many" of the points correctly, the receiver can recover the original polynomial even in the presence of a "few" bad points.
Reed-Solomon codes are used in a wide variety of commercial applications, most prominently in CDs, DVDs and Blu-ray Discs, in data transmission technologies such as DSL and WiMAX, and in broadcast systems such as DVB and ATSC.
Overview
Reed-Solomon codes are block codes. This means that a fixed block of input data is processed into a fixed block of output data. In the case of the most commonly used R-S code, (255, 223), 223 Reed-Solomon input symbols (each eight bits long) are encoded into 255 output symbols.
* Most R-S ECC schemes are systematic. This means that some portion of the output codeword contains the input data in its original form.
* A Reed-Solomon symbol size of eight bits was chosen because the decoders for larger symbol sizes would be difficult to implement with current technology. This design choice forces the longest codeword length to be 255 symbols.
* The standard (255, 223) Reed-Solomon code is capable of correcting up to 16 Reed-Solomon symbol errors in each codeword. Since each symbol is actually eight bits, this means that the code can correct up to 16 short bursts of error due to the inner convolutional decoder.
The Reed-Solomon code, like the convolutional code, is a transparent code. This means that if the channel symbols have been inverted somewhere along the line, the decoders will still operate. The result will be the complement of the original data. However, the Reed-Solomon code loses its transparency if virtual zero fill is used. For this reason it is mandatory that the sense of the data (i.e., true or complemented) be resolved before Reed-Solomon decoding.
In the case of the Voyager program, R-S codes reach near optimal performance when concatenated with the (7, 1/2) convolutional (Viterbi) inner code. Since two check symbols are required for each error to be corrected, this results in a total of 32 check symbols and 223 information symbols per codeword.
In addition, the Reed-Solomon codewords can be interleaved on a symbol basis before being convolutionally encoded. Since this separates the symbols in a codeword, it becomes less likely that a burst from the Viterbi decoder disturbs more than one Reed-Solomon symbol in any one codeword.
Definition
Overview
The key idea behind a Reed-Solomon code is that the data to be encoded is first visualized as a polynomial. The code relies on a theorem from algebra that states that any k distinct points "uniquely" determine a polynomial of degree at most k-1.
The sender determines a degree k-1 polynomial, over a finite field, that represents the k data points. The polynomial is then "encoded" by its evaluation at various points, and these values are what is actually sent. During transmission, some of these values may become corrupted. Therefore, more than k points are actually sent. As long as sufficient values are received correctly, the receiver can deduce what the original polynomial was, and hence decode the original data.
In the same sense that one can correct a curve by interpolating past a gap, a Reed-Solomon code can bridge a series of errors in a block of data to recover the coefficients of the polynomial that drew the original curve.
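The idea of sending more than k evaluations and rebuilding the polynomial from any k correct ones can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the prime field GF(257) (integers modulo 257) rather than the GF(2^m) fields used in practice, the helper names `poly_eval`, `poly_mul` and `interpolate` are hypothetical, and it handles only the case where the surviving points are known to be correct. It needs Python 3.8 or later for the modular inverse `pow(x, -1, P)`.

```python
P = 257  # assumption: a small prime, so integers modulo P form the field GF(257)

def poly_eval(coeffs, x):
    """Evaluate a polynomial (lowest-degree coefficient first) at x, modulo P."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % P
    return result

def poly_mul(a, b):
    """Multiply two polynomials with coefficients modulo P."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P
    return out

def interpolate(points):
    """Lagrange interpolation: coefficients of the unique polynomial of
    degree < len(points) passing through the given (x, y) pairs."""
    coeffs = [0] * len(points)
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                basis = poly_mul(basis, [(-xj) % P, 1])   # multiply by (x - xj)
                denom = denom * (xi - xj) % P
        scale = yi * pow(denom, -1, P) % P
        for d, b in enumerate(basis):
            coeffs[d] = (coeffs[d] + scale * b) % P
    return coeffs

data = [72, 105, 33]                 # k = 3 data symbols, used as coefficients
xs = list(range(1, 8))               # n = 7 distinct evaluation points
sent = [(x, poly_eval(data, x)) for x in xs]

received = [sent[1], sent[4], sent[6]]   # any 3 points that arrived intact
recovered = interpolate(received)        # equals data
```

Any other choice of three intact points gives the same result, which is the over-determination the text describes. Locating points that are wrong, rather than merely missing, needs the decoding machinery discussed later in the article.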
Mathematical formulation
Given a finite field F and polynomial ring F[x], let n and k be chosen such that $1 \leq k \leq n \leq |F|$. Pick n distinct elements of F, denoted $\{x_1, x_2, \ldots, x_n\}$. Then, the codebook C is created from the tuples of values obtained by evaluating every polynomial (over F) of degree less than k at each $x_i$; that is,
$\mathbf{C} = \left\{ \left( f(x_1), f(x_2), \ldots, f(x_n) \right) \mid f \in F[x],\ \deg(f) < k \right\}.$
C is an [n, k, n-k+1] code; in other words, it is a linear code of length n (over F) with dimension k and minimum Hamming distance n-k+1.
A "Reed-Solomon code" is a code of the above form, subject to the additional requirement that the set $\{x_1, x_2, \ldots, x_n\}$ must be the set of "all" non-zero elements of the field F (and therefore n = |F| - 1).
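For a very small field the codebook can be enumerated outright and the stated parameters checked by brute force. The sketch below assumes the prime field GF(7), so that n = 6 and (as an arbitrary choice) k = 2; the names `evaluate` and `hamming` are illustrative only.

```python
from itertools import product

P = 7                       # assumption: GF(7), the integers modulo 7
xs = [1, 2, 3, 4, 5, 6]     # all non-zero field elements, so n = |F| - 1 = 6
n, k = len(xs), 2

def evaluate(coeffs, x):
    """Value at x of the polynomial with the given coefficients, modulo P."""
    return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P

# One codeword per polynomial of degree < k, i.e. per k-tuple of coefficients.
codebook = [tuple(evaluate(f, x) for x in xs)
            for f in product(range(P), repeat=k)]

def hamming(a, b):
    """Number of positions in which two words differ."""
    return sum(s != t for s, t in zip(a, b))

min_distance = min(hamming(a, b)
                   for i, a in enumerate(codebook)
                   for b in codebook[i + 1:])
# min_distance equals n - k + 1 = 5 for this code
```

The result follows from the fact that two distinct polynomials of degree less than k can agree on at most k-1 points, so their codewords differ in at least n-k+1 positions.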
Remarks
For practical uses of Reed-Solomon codes, it is common to use a finite field F with 2^m elements. In this case, each symbol can be represented as an m-bit value. The sender sends the data points as encoded blocks, and the number of symbols in the encoded block is n = 2^m - 1. Thus a Reed-Solomon code operating on 8-bit symbols has n = 2^8 - 1 = 255 symbols per block. (This is a very popular value because of the prevalence of byte-oriented computer systems.) The number k, with k < n, of "data" symbols in the block is a design parameter. A commonly used code encodes k = 223 eight-bit data symbols plus 32 eight-bit parity symbols in an n = 255-symbol block; this is denoted as an (n, k) = (255, 223) code, and is capable of correcting up to 16 symbol errors per block.
The set $\{x_1, \ldots, x_n\}$ of non-zero elements of a finite field can be written as $\{1, \alpha, \alpha^2, \ldots, \alpha^{n-1}\}$, where $\alpha$ is a primitive nth root of unity. It is customary to encode the values of a Reed-Solomon code in this order. Since $\alpha^n = 1$, and since for every polynomial p(x), the function $p(\alpha x)$ is also a polynomial of the same degree, it then follows that a Reed-Solomon code is cyclic.
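The cyclic property can be checked numerically on a toy code. The sketch assumes GF(7) with the primitive element 3 (its powers modulo 7 are 1, 3, 2, 6, 4, 5) and k = 3; `encode` and `shift_left` are hypothetical helper names.

```python
from itertools import product

P, ALPHA = 7, 3             # assumption: GF(7); 3 is a primitive 6th root of unity
n, k = P - 1, 3
points = [pow(ALPHA, i, P) for i in range(n)]   # 1, alpha, ..., alpha^(n-1)

def encode(coeffs):
    """Evaluate the polynomial at 1, alpha, alpha^2, ... in that order."""
    return tuple(sum(c * pow(x, d, P) for d, c in enumerate(coeffs)) % P
                 for x in points)

codebook = {encode(f) for f in product(range(P), repeat=k)}

def shift_left(word):
    """Cyclic shift by one position."""
    return word[1:] + word[:1]

# Every cyclic shift of a codeword is again a codeword.
is_cyclic = all(shift_left(w) in codebook for w in codebook)

# The shifted word is the encoding of f(alpha * x): coefficient d is scaled by alpha^d.
f = (5, 1, 2)
scaled = tuple(c * pow(ALPHA, d, P) % P for d, c in enumerate(f))
```

Here `encode(scaled)` coincides with `shift_left(encode(f))`, which is the argument in the text made concrete: replacing x by αx moves every evaluation one place along the sequence.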
Reed-Solomon codes as BCH codes
Reed-Solomon codes are a special case of a larger class of codes called BCH codes. An efficient error correction algorithm for BCH codes (and therefore Reed-Solomon codes) was discovered by Berlekamp in 1968.
To see that Reed-Solomon codes are special BCH codes, it is useful to give the following alternate definition of Reed-Solomon codes. [R. Lidl and G. Pilz. Applied Abstract Algebra, 2nd edition. Wiley, 1999, p.226.]
Given a finite field F of size q, let n = q-1 and let $\alpha$ be a primitive nth root of unity in F. Also let $1 \leq k \leq n$ be given. The "Reed-Solomon code" for these parameters has code word $(f_0, f_1, \ldots, f_{n-1})$ if and only if $\alpha, \alpha^2, \ldots, \alpha^{n-k}$ are roots of the polynomial
$p(x) = f_0 + f_1 x + \cdots + f_{n-1} x^{n-1}.$
With this definition, it is immediately seen that a Reed-Solomon code is a polynomial code, and in particular a BCH code. The generator polynomial g(x) is the minimal polynomial with roots $\alpha, \alpha^2, \ldots, \alpha^{n-k}$, and the code words are exactly the polynomials that are divisible by g(x).
Equivalence of the two formulations
At first sight, the above two definitions of Reed-Solomon codes seem very different. In the first definition, code words are "values" of polynomials, whereas in the second, they are "coefficients". Moreover, the polynomials in the first definition are required to be of small degree, whereas those in the second definition are required to have specific roots.
The equivalence of the two definitions is proved using the discrete Fourier transform. This transform, which exists in all finite fields as well as the complex numbers, establishes a duality between the coefficients of polynomials and their values. This duality can be approximately summarized as follows: Let p(x) and q(x) be two polynomials of degree less than n. If the "values" of p(x) are the "coefficients" of q(x), then (up to a scalar factor and reordering), the "values" of q(x) are the "coefficients" of p(x). For this to make sense, the values must be taken at locations $x = \alpha^i$, for i = 0, ..., n-1, where $\alpha$ is a primitive nth root of unity.
To be more precise, let
$p(x) = v_0 + v_1 x + v_2 x^2 + \cdots + v_{n-1} x^{n-1}$
and
$q(x) = f_0 + f_1 x + f_2 x^2 + \cdots + f_{n-1} x^{n-1},$
and assume p(x) and q(x) are related by the discrete Fourier transform. Then the coefficients and values of p(x) and q(x) are related as follows: for all i = 0, ..., n-1, $f_i = p(\alpha^i)$ and $v_i = \frac{1}{n} q(\alpha^{n-i})$.
Using these facts, we have:
* $(f_0, \ldots, f_{n-1})$ is a code word of the Reed-Solomon code according to the first definition
* if and only if p(x) is of degree less than k (because $f_0, \ldots, f_{n-1}$ are the values of p(x)),
* if and only if $v_i = 0$ for i = k, ..., n-1,
* if and only if $q(\alpha^i) = 0$ for i = 1, ..., n-k (because $q(\alpha^i) = n v_{n-i}$),
* if and only if $(f_0, \ldots, f_{n-1})$ is a code word of the Reed-Solomon code according to the second definition.
This shows that the two definitions are equivalent.
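The chain of equivalences can be traced on a small numerical example. The sketch below assumes GF(7) with α = 3, n = 6 and k = 3, and takes the generator polynomial to be the product of (x - α^j) for j = 1, ..., n-k; the helper names are hypothetical. It builds a code word by the first definition (values of a low-degree polynomial), then checks the second definition (required roots, divisibility by g(x)) and the inverse transform.

```python
P, ALPHA = 7, 3          # assumption: GF(7); 3 is a primitive 6th root of unity
n, k = 6, 3

def poly_eval(coeffs, x):
    """Evaluate a polynomial (lowest-degree coefficient first) at x, modulo P."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % P
    return result

def poly_mul(a, b):
    """Multiply two polynomials with coefficients modulo P."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P
    return out

def poly_mod(num, den):
    """Remainder of num divided by den over GF(P); den must be monic."""
    rem = list(num)
    for shift in range(len(rem) - len(den), -1, -1):
        factor = rem[shift + len(den) - 1]
        for i, d in enumerate(den):
            rem[shift + i] = (rem[shift + i] - factor * d) % P
    return rem[:len(den) - 1]

# First definition: f_i = p(alpha^i) with deg p < k, i.e. v_i = 0 for i >= k.
v = [4, 0, 5] + [0] * (n - k)
f = [poly_eval(v, pow(ALPHA, i, P)) for i in range(n)]

# Second definition: q(x) = f_0 + f_1 x + ... vanishes at alpha^1 .. alpha^(n-k).
vanishes = [poly_eval(f, pow(ALPHA, j, P)) for j in range(1, n - k + 1)]

# Equivalently, q(x) is divisible by g(x) = (x - alpha)(x - alpha^2)...(x - alpha^(n-k)).
g = [1]
for j in range(1, n - k + 1):
    g = poly_mul(g, [(-pow(ALPHA, j, P)) % P, 1])
remainder = poly_mod(f, g)

# Inverse relation: v_i = (1/n) q(alpha^(n-i)).
inv_n = pow(n, -1, P)
v_back = [inv_n * poly_eval(f, pow(ALPHA, (n - i) % n, P)) % P for i in range(n)]
```

For this example `f` is `[2, 0, 3, 2, 0, 3]`, every entry of `vanishes` and `remainder` is zero, and `v_back` reproduces `v`.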
Properties of Reed-Solomon codes
The error-correcting ability of any Reed-Solomon code is determined by n - k, the measure of redundancy in the block. If the locations of the errored symbols are not known in advance, then a Reed–Solomon code can correct up to (n-k)/2 erroneous symbols, i.e., it can correct half as many errors as there are redundant symbols added to the block. Sometimes error locations are known in advance (e.g., “side information” in demodulator signal-to-noise ratios); these are called erasures. A Reed–Solomon code (like any linear code) is able to correct twice as many erasures as errors, and any combination of errors and erasures can be corrected as long as the inequality 2E + S ≤ n - k is satisfied, where E is the number of errors and S is the number of erasures in the block.
The properties of Reed-Solomon codes make them especially well-suited to applications where errors occur in bursts. This is because it does not matter to the code how many bits in a symbol are in error; if multiple bits in a symbol are corrupted, it only counts as a single error. Conversely, if a data stream is not characterized by error bursts or drop-outs but by random single bit errors, a Reed-Solomon code is usually a poor choice.
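The error and erasure budget is simple arithmetic, shown here for the (255, 223) code. The function names are illustrative, and the check assumes the standard bound 2E + S ≤ n - k, under which 16 errors (2 × 16 = 32 = n - k) are exactly correctable.

```python
def max_correctable_errors(n, k):
    """Largest number of symbol errors correctable when locations are unknown."""
    return (n - k) // 2

def is_correctable(n, k, errors, erasures):
    """True if E errors plus S erasures fit the budget 2E + S <= n - k."""
    return 2 * errors + erasures <= n - k

t = max_correctable_errors(255, 223)          # 16
all_errors = is_correctable(255, 223, 16, 0)  # True: 32 <= 32
all_erased = is_correctable(255, 223, 0, 32)  # True: 32 <= 32
mixed_ok = is_correctable(255, 223, 10, 12)   # True: 20 + 12 = 32
mixed_bad = is_correctable(255, 223, 10, 13)  # False: 20 + 13 = 33
```

Each unknown error costs two redundant symbols (one to find it, one to fix it), while an erasure, whose position is already known, costs only one.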
Designers are not required to use the "natural" sizes of Reed-Solomon code blocks. A technique known as “shortening” can produce a smaller code of any desired size from a larger code. For example, the widely used (255,223) code can be converted to a (160,128) code by padding the unused portion of the block (usually the beginning) with 95 zero symbols and not transmitting them. At the decoder, the same portion of the block is loaded locally with zeroes. The compact disc is an example of an application of shortened Reed-Solomon codes.
In 1999 Madhu Sudan and Venkatesan Guruswami, at MIT, published “Improved Decoding of Reed-Solomon and Algebraic-Geometry Codes”, introducing an algorithm that allowed for the correction of errors beyond half the minimum distance of the code. It applies to Reed–Solomon codes and more generally to algebraic geometry codes. This algorithm produces a list of codewords (it is a list-decoding algorithm) and is based on interpolation and factorization of polynomials over GF(2^m) and its extensions.
History
The code was invented in 1960 by Irving S. Reed and Gustave Solomon, who were then members of MIT Lincoln Laboratory. Their seminal article was entitled "Polynomial Codes over Certain Finite Fields." When it was written, digital technology was not advanced enough to implement the concept. The first application, in 1982, of RS codes in mass-produced products was the compact disc, where two interleaved RS codes are used. An efficient decoding algorithm for large-distance RS codes was developed by Elwyn Berlekamp and James Massey in 1969. Today RS codes are used in hard disk drives, DVDs, telecommunication, and digital broadcast protocols.
Applications
Data storage
Reed-Solomon coding is very widely used in mass storage systems to correct the burst errors associated with media defects.
Reed-Solomon coding is a key component of the compact disc. It was the first use of strong error correction coding in a mass-produced consumer product, and DAT and DVD use similar schemes. In the CD, two layers of Reed-Solomon coding separated by a 28-way convolutional interleaver yield a scheme called Cross-Interleaved Reed Solomon Coding (CIRC). The first element of a CIRC decoder is a relatively weak inner (32,28) Reed-Solomon code, shortened from a (255,251) code with 8-bit symbols. This code can correct up to 2 byte errors per 32-byte block. More importantly, it flags as erasures any uncorrectable blocks, i.e., blocks with more than 2 byte errors. The decoded 28-byte blocks, with erasure indications, are then spread by the deinterleaver to different blocks of the (28,24) outer code. Thanks to the deinterleaving, an erased 28-byte block from the inner code becomes a single erased byte in each of 28 outer code blocks. The outer code easily corrects this, since it can handle up to 4 such erasures per block.
The result is a CIRC that can completely correct error bursts up to 4000 bits, or about 2.5 mm on the disc surface. This code is so strong that most CD playback errors are almost certainly caused by tracking errors that cause the laser to jump track, not by uncorrectable error bursts. [K.A.S. Immink, "Reed-Solomon Codes and the Compact Disc" in S.B. Wicker and V.K. Bhargava, Eds., "Reed-Solomon Codes and Their Applications", IEEE Press, 1994.]
Another product which incorporates Reed–Solomon coding is Nintendo's e-Reader. This is a video-game delivery system which uses a two-dimensional "barcode" printed on trading cards. The cards are scanned using a device which attaches to Nintendo's Game Boy Advance game system.
Reed-Solomon error correction is also used in Parchive files, which are commonly posted accompanying multimedia files on USENET. The distributed online storage service [http://wua.la/en Wuala] also makes use of Reed-Solomon when breaking up files.
Data transmission
Specialized forms of Reed-Solomon codes, specifically Cauchy-RS and Vandermonde-RS, can be used to overcome the unreliable nature of data transmission over erasure channels. The encoding process assumes an RS(N,K) code, which results in N codewords of length N symbols, each storing K symbols of data, being generated and then sent over an erasure channel.
Any combination of K codewords received at the other end is enough to reconstruct all of the N codewords. The code rate is generally set to 1/2 unless the channel's erasure likelihood can be adequately modelled and is seen to be less. Consequently N is usually 2K, meaning that at least half of all the codewords sent must be received in order to reconstruct all of the codewords sent.
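The "any K of N" property can be illustrated with a toy Vandermonde-style erasure code. This is a sketch of one reading of the scheme above, not a description of any particular protocol: it works on single symbols rather than whole packets, uses the prime field GF(257) instead of GF(2^8), places row i of the Vandermonde matrix at the point x = i + 1, and all function names are hypothetical. It needs Python 3.8 or later for `pow(x, -1, P)`.

```python
P = 257  # assumption: prime field GF(257); real systems typically use GF(2^8)

def encode(data, n):
    """Multiply the K data symbols by an n x K Vandermonde matrix over GF(P).
    Row i uses the point x = i + 1, so output i is sum(data[j] * x**j)."""
    return [sum(d * pow(x, j, P) for j, d in enumerate(data)) % P
            for x in range(1, n + 1)]

def solve_mod(matrix, rhs):
    """Gauss-Jordan elimination over GF(P) for a square, invertible system."""
    size = len(rhs)
    rows = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)
        rows[col] = [v * inv % P for v in rows[col]]
        for r in range(size):
            if r != col and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % P
                           for a, b in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

def decode(received, k):
    """Recover the data from any k (index, symbol) pairs that survived."""
    chosen = received[:k]
    matrix = [[pow(idx + 1, j, P) for j in range(k)] for idx, _ in chosen]
    return solve_mod(matrix, [sym for _, sym in chosen])

data = [10, 20, 30, 40]                  # K = 4
sent = encode(data, 8)                   # N = 2K = 8, code rate 1/2
survivors = [(i, s) for i, s in enumerate(sent) if i in (1, 2, 5, 7)]
recovered = decode(survivors, 4)         # equals data
```

Because the rows of a Vandermonde matrix built from distinct points are linearly independent, any K surviving rows form an invertible system, so which half of the symbols is lost does not matter.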
Reed-Solomon codes are also used in xDSL systems and CCSDS's Space Communications Protocol Specifications as a form of Forward Error Correction.
Mail encoding
Paper bar codes such as PostBar, MaxiCode, Datamatrix and QR Code use Reed–Solomon error correction to allow correct reading even if a portion of the bar code is damaged.
Satellite transmission
One significant application of Reed–Solomon coding was to encode the digital pictures sent back by the Voyager space probe.
Voyager introduced Reed–Solomon coding in conjunction with ML convolutional codes, a practice that has since become very widespread in deep space and satellite (e.g., direct digital broadcasting) communications.
Viterbi decoders tend to produce errors in short bursts. Correcting these burst errors is a job best done by short or simplified Reed-Solomon codes.
Modern versions of concatenated Reed-Solomon/Viterbi-decoded convolutional coding were and are used on the Mars Pathfinder, Galileo, Mars Exploration Rover and Cassini missions, where they perform within about 1–1.5 dB of the ultimate limit imposed by the Shannon capacity.
These concatenated codes are now being replaced by more powerful turbo codes where the transmitted data does not need to be decoded immediately.
Sketch of the error correction algorithm
The following is a sketch of the main idea behind the error correction algorithm for Reed-Solomon codes.
By definition, a code word of a Reed-Solomon code is given by the sequence of values of a low-degree polynomial over a finite field. A key fact for the error correction algorithm is that the "values" and the "coefficients" of a polynomial are related by the discrete Fourier transform.
The purpose of a Fourier transform is to convert a signal from a time domain to a frequency domain or vice versa. In the case of the Fourier transform over a finite field, the frequency domain signal corresponds to the coefficients of a polynomial, and the time domain signal corresponds to the values of the same polynomial.
As shown in Figures 1 and 2, an isolated value in the frequency domain corresponds to a smooth wave in the time domain. The wavelength depends on the location of the isolated value. Conversely, as shown in Figures 3 and 4, an isolated value in the time domain corresponds to a smooth wave in the frequency domain.
In a Reed-Solomon code, the frequency domain is divided into two regions as shown in Figure 5: a left (low-frequency) region of length k, and a right (high-frequency) region of length n-k. A data word is then embedded into the left region (corresponding to the k coefficients of a polynomial of degree at most k-1), while the right region is filled with zeros. The result is Fourier transformed into the time domain, yielding a code word that is composed only of low frequencies. In the absence of errors, a code word can be decoded by reverse Fourier transforming it back into the frequency domain.
Now consider a code word containing a single error, as shown in Figure 6. The effect of this error in the frequency domain is a smooth, single-frequency wave in the right region, called the "syndrome" of the error. The error location can be determined by determining the frequency of the syndrome signal.
Similarly, if two or more errors are introduced in the code word, the syndrome will be a signal composed of two or more frequencies, as shown in Figure 7. As long as it is possible to determine the frequencies of which the syndrome is composed, it is possible to determine the error locations. Notice that the error "locations" depend only on the "frequencies" of these waves, whereas the error "magnitudes" depend on their amplitudes and phase.
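The single-error case can be worked through numerically. The sketch below is an illustration under stated assumptions, not a full decoder: it uses the prime field GF(17) with α = 3 (a primitive 16th root of unity modulo 17), n = 16 and k = 12, handles exactly one error, and the names `forward` and `inverse` are hypothetical. With one error of magnitude e at position j, the syndrome entry at frequency m is (1/n)·e·α^(-jm), so the ratio of consecutive syndrome entries is α^(-j) and reveals the location.

```python
P, ALPHA = 17, 3        # assumption: GF(17); 3 is a primitive 16th root of unity
n, k = 16, 12

def forward(coeffs):
    """Frequency domain -> time domain: evaluate the polynomial at alpha^i."""
    return [sum(c * pow(ALPHA, i * m, P) for m, c in enumerate(coeffs)) % P
            for i in range(n)]

def inverse(values):
    """Time domain -> frequency domain: recover the n coefficients."""
    inv_n = pow(n, -1, P)
    inv_alpha = pow(ALPHA, -1, P)
    return [inv_n * sum(v * pow(inv_alpha, i * m, P)
                        for i, v in enumerate(values)) % P
            for m in range(n)]

data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]          # k low-frequency coefficients
codeword = forward(data + [0] * (n - k))             # high frequencies are zero

received = list(codeword)
received[6] = (received[6] + 11) % P                 # one error: position 6, value 11

syndrome = inverse(received)[k:]                     # the n-k high-frequency entries

# A single error gives a single-frequency wave: consecutive entries have ratio alpha^(-j).
inv_alpha = pow(ALPHA, -1, P)
ratio = syndrome[1] * pow(syndrome[0], -1, P) % P
location = next(j for j in range(n) if pow(inv_alpha, j, P) == ratio)

# The amplitude gives the error value: syndrome[0] = (1/n) * e * alpha^(-j*k).
magnitude = n * syndrome[0] * pow(ALPHA, location * k, P) % P

corrected = list(received)
corrected[location] = (corrected[location] - magnitude) % P
decoded = inverse(corrected)[:k]                     # equals data
```

The location comes from the frequency of the syndrome and the magnitude from its amplitude, as described above. With several errors the syndrome is a sum of such waves, and separating them is the harder problem discussed next.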
The problem of determining the error locations has therefore been reduced to the problem of finding, given a sequence of n-k values, the smallest set of elementary waves into which these values can be decomposed. It is known from digital signal processing that this problem is equivalent to finding the roots of the minimal polynomial of the sequence, or equivalently, of finding the shortest linear feedback shift register (LFSR) for the sequence. The latter problem can either be solved inefficiently by solving a system of linear equations, or more efficiently by the Berlekamp-Massey algorithm.
See also
* Forward error correction
* BCH code
* Low-density parity-check code
* Chien search
* Datamatrix
References
* F. J. MacWilliams and N. J. A. Sloane, "The Theory of Error-Correcting Codes", New York: North-Holland Publishing Company, 1977.
* Irving S. Reed and Xuemin Chen, "Error-Control Coding for Data Networks", Boston: Kluwer Academic Publishers, 1999.
External links
* [http://www.schifra.com Schifra Open Source C++ Reed-Solomon Codec]
* [http://www.radionetworkprocessor.com/reed-solomon.html A collection of links to books, online articles and source code]
* [http://rscode.sourceforge.net/ Henry Minsky's RSCode library, Reed-Solomon encoder/decoder]
* [http://www.cs.utk.edu/%7Eplank/plank/papers/SPE-9-97.html A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like Systems]
* [http://runtime-basic.net/Projekt:ReedSolomon A free tool for testing the Reed-Solomon Algorithm (German)]
* An application note from 4i2i on some [http://www.4i2i.com/reed_solomon_codes.htm specific implementations]
* A Tutorial from [http://www.highlandcomm.com Highland Communications Technologies] on [http://www.highlandcomm.com/reed_solomon_codes.htm Reed-Solomon codes.]
* A thesis on [http://sidewords.files.wordpress.com/2007/12/thesis.pdf Algebraic soft-decoding of Reed-Solomon codes] . It explains the basics as well.
* [http://dept.ee.wits.ac.za/~versfeld/research_resources/sourcecode/Errors_And_Erasures_Test.zip Matlab implementation of errors-and-erasures decoding]
Wikimedia Foundation. 2010.