 Computation of CRC

Computation of a cyclic redundancy check is derived from the mathematics of polynomial division, modulo two. In practice, it resembles long division of the binary message string, with a fixed number of zeroes appended, by the "generator polynomial" string except that exclusive OR operations replace subtractions. Division of this type is efficiently realised in hardware by a modified shift register, and in software by a series of equivalent algorithms, starting with simple code close to the mathematics and becoming faster (and arguably more obfuscated^{[1]}) through bytewise parallelism and spacetime tradeoffs.
Various CRC standards extend the polynomial division algorithm by specifying an initial shift register value, a final exclusive OR step and, most critically, a bit ordering (endianness). As a result, the code seen in practice deviates confusingly from "pure" division,^{[1]} and the register may shift left or right.
Contents
Example
For a discussion of polynomial division modulo two, see Mathematics of CRC.As an example of implementing polynomial division in hardware, suppose that we are trying to compute an 8bit CRC of an 8bit message made of the ASCII character "W", which is binary 01010111_{2}, decimal 87_{10}, or hexadecimal 57_{16}. For illustration, we will use the CRC8ATM (HEC) polynomial x^{8} + x^{2} + x + 1. Writing the first bit transmitted (the coefficient of the highest power of x) on the left, this corresponds to the 9bit string "100000111".
The byte value 57_{16} can be transmitted in two different orders, depending on the bit ordering convention used. Each one generates a different message polynomial M(x). Msbitfirst, this is x^{6} + x^{4} + x^{2} + x + 1 = 01010111, while lsbitfirst, it is x^{7} + x^{6} + x^{5} + x^{3} + x = 11101010. These can then be multiplied by x^{8} to produce two 16bit message polynomials x^{8}M(x).
Computing the remainder then consists of subtracting multiples of the generator polynomial G(x). This is just like decimal long division, but even simpler because the only possible multiples at each step are 0 and 1, and the subtractions borrow "from infinity" instead of reducing the upper digits. Because we do not care about the quotient, there is no need to record it.
Mostsignificant bit first Leastsignificant bit first 0 1 0 1 0 1 1 1 0 0 0 0 0 0 0 0 − 0 0 0 0 0 0 0 0 = 0 1 0 1 0 1 1 1 0 0 0 0 0 0 0 0 − 1 0 0 0 0 0 1 1 1 = 0 0 0 1 0 1 1 0 1 1 0 0 0 0 0 0 − 0 0 0 0 0 0 0 0 0 = 0 0 0 1 0 1 1 0 1 1 0 0 0 0 0 0 − 1 0 0 0 0 0 1 1 1 = 0 0 0 0 0 1 1 0 1 0 1 1 0 0 0 0 − 0 0 0 0 0 0 0 0 0 = 0 0 0 0 0 1 1 0 1 0 1 1 0 0 0 0 − 1 0 0 0 0 0 1 1 1 = 0 0 0 0 0 0 1 0 1 0 1 0 1 1 0 0 − 1 0 0 0 0 0 1 1 1 = 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 − 0 0 0 0 0 0 0 0 0 = 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 1 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 − 1 0 0 0 0 0 1 1 1 = 0 1 1 0 1 0 0 1 1 0 0 0 0 0 0 0 − 1 0 0 0 0 0 1 1 1 = 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 − 1 0 0 0 0 0 1 1 1 = 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 − 0 0 0 0 0 0 0 0 0 = 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 − 1 0 0 0 0 0 1 1 1 = 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 − 0 0 0 0 0 0 0 0 0 = 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 − 0 0 0 0 0 0 0 0 0 = 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 − 0 0 0 0 0 0 0 0 0 = 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 Observe that after each subtraction, the bits are divided into three groups: at the beginning, a group which is all zero; at the end, a group which is unchanged from the original; and a shaded group in the middle which is "interesting". The "interesting" group is 8 bits long, matching the degree of the polynomial. Every step, the appropriate multiple of the polynomial is subtracted to make the zero group becomes one bit longer, and the unchanged group becomes one bit shorter, until only the final remainder is left.
In the msbitfirst example, the remainder polynomial is x^{7} + x^{5} + x. Converting to a hexadecimal number using the convention that the highest power of x is the msbit; this is A2_{16}. In the lsbitfirst, the remainder is x^{7} + x^{4} + x^{3}. Converting to hexadecimal using the convention that the highest power of x is the lsbit, this is 19_{16}.
Implementation
Writing out the full message at each step, as done in the example above, is very tedious. Efficient implementations use an nbit shift register to hold only the interesting bits. Multiplying the polynomial by x is equivalent to shifting the register by one place, as the coefficients do not change in value but only move up to the next term of the polynomial.
Here is a first draft of some pseudocode for computing an nbit CRC. It uses a contrived composite data type for polynomials, where
x
is not an integer variable, but a constructor generating a Polynomial object that can be added, multiplied and exponentiated. Toxor
two polynomials is to add them, modulo two; that is, to exclusive OR the coefficients of each matching term from both polynomials.function crc(bit array bitString[1..len], int len) { remainderPolynomial := polynomialForm(bitString[1..n]) // First n bits of the message // A popular variant complements remainderPolynomial here for i from 1 to len { remainderPolynomial := remainderPolynomial * x + bitString[i+n] * x^{0} // Define bitString[k]=0 for k>len if coefficient of x^{n} of remainderPolynomial = 1 { remainderPolynomial := remainderPolynomial xor generatorPolynomial } } // A popular variant complements remainderPolynomial here return remainderPolynomial }
 Code fragment 1: Simple polynomial division
Note that this example code avoids the need to specify a bitordering convention by not using bytes; the input
bitString
is already in the form of a bit array, and theremainderPolynomial
is manipulated in terms of polynomial operations; the multiplication by x could be a left or right shift, and the addition ofbitString[i+n]
is done to the x^{0} coefficient, which could be the right or left end of the register.This code has two disadvantages. First, it actually requires an n+1bit register to hold the
remainderPolynomial
so that the x^{n} coefficient can be tested. More significantly, it requires thebitString
to be padded with n zero bits.The first problem can be solved by testing the x^{n − 1} coefficient of the
remainderPolynomial
before it is multiplied by x.The second problem could be solved by doing the last n iterations differently, but there is a more subtle optimization which is used universally, in both hardware and software implementations.
Because the XOR operation used to subtract the generator polynomial from the message is commutative and associative, it does not matter in what order the various inputs are combined into the
remainderPolynomial
. And specifically, a given bit of thebitString
does not need to be added to theremainderPolynomial
until the very last instant when it is tested to determine whether toxor
with thegeneratorPolynomial
.This eliminates the need to preload the
remainderPolynomial
with the first n bits of the message, as well:function crc(bit array bitString[1..len], int len) { remainderPolynomial := 0 // A popular variant complements remainderPolynomial here for i from 1 to len { remainderPolynomial := remainderPolynomial xor (bitstring[i] * x^{n1}) if (coefficient of x^{n1} of remainderPolynomial) = 1 { remainderPolynomial := (remainderPolynomial * x) xor generatorPolynomial } else { remainderPolynomial := (remainderPolynomial * x) } } // A popular variant complements remainderPolynomial here return remainderPolynomial }
 Code fragment 2: Polynomial division with deferred message XORing
This is the standard bitatatime hardware CRC implementation, and is well worthy of study; once you understand why this computes exactly the same result as the first version, the remaining optimizations are quite straightforward. If
remainderPolynomial
is only n bits long, then the x^{n} coefficients of it and ofgeneratorPolynomial
are simply discarded. This is the reason that you will usually see CRC polynomials written in binary with the leading coefficient omitted.In software, it is convenient to note that while one may delay the
xor
of each bit until the very last moment, it is also possible to do it earlier. It is usually convenient to perform thexor
a byte at a time, even in a bitatatime implementation like this:function crc(byte array string[1..len], int len) { remainderPolynomial := 0 // A popular variant complements remainderPolynomial here for i from 1 to len { remainderPolynomial := remainderPolynomial xor polynomialForm(string[i]) * x^{n8} for j from 1 to 8 { // Assuming 8 bits per byte if coefficient of x^{n1} of remainderPolynomial = 1 { remainderPolynomial := (remainderPolynomial * x) xor generatorPolynomial } else { remainderPolynomial := (remainderPolynomial * x) } } } // A popular variant complements remainderPolynomial here return remainderPolynomial }
 Code fragment 3: Polynomial division with bytewise message XORing
This is usually the most compact software implementation, used in microcontrollers when space is at a premium over speed.
Bit ordering (Endianness)
When implemented in bit serial hardware, the generator polynomial uniquely describes the bit assignment; the first bit transmitted is always the coefficient of the highest power of x, and the last n bits transmitted are the CRC remainder R(x), starting with the coefficient of x^{n − 1} and ending with the coefficient of x^{0}, a.k.a. the coefficient of 1.
However, when bits are processed a byte at a time, such as when using parallel transmission, byte framing as in 8B/10B encoding or RS232style asynchronous serial communication, or when implementing a CRC in software, it is necessary to specify the bit ordering (endianness) of the data; which bit in each byte is considered "first" and will be the coefficient of the higher power of x.
If the data is destined for serial communication, it is best to use the bit ordering the data will ultimately be sent in. This is because a CRC's ability to detect burst errors is based on proximity in the message polynomial M(x); if adjacent polynomial terms are not transmitted sequentially, a physical error burst of one length may be seen as a longer burst due to the rearrangement of bits.
For example, both IEEE 802 (ethernet) and RS232 (serial port) standards specify leastsignificant bit first (littleendian) transmission, so a software CRC implementation to protect data sent across such a link should map the least significant bits in each byte to coefficients of the highest powers of x. On the other hand, floppy disks and most hard drives write the most significant bit of each byte first.
The lsbitfirst CRC is slightly simpler to implement in software, so is somewhat more commonly seen, but many programmers find the msbitfirst bit ordering easier to follow. Thus, for example, the XMODEMCRC extension, an early use of CRCs in software, uses an msbitfirst CRC.
So far, the pseudocode has avoided specifying the ordering of bits within bytes by describing shifts in the pseudocode as multiplications by x and writing explicit conversions from binary to polynomial form. In practice, the CRC is held in a standard binary register using a particular bitordering convention. In msbitfirst form, the most significant binary bits will be sent first and so contain the higherorder polynomial coefficients, while in lsbitfirst form, the leastsignificant binary bits contain the higherorder coefficients. The above pseudocode can be written in both forms. For concreteness, this uses the 16bit CRC16CCITT polynomial x^{16} + x^{12} + x^{5} + 1:
// Most significant bit first (bigendian) // x^16+x^12+x^5+1 = (1) 0001 0000 0010 0001 = 0x1021 function crc(byte array string[1..len], int len) { rem := 0 // A popular variant complements rem here for i from 1 to len { rem := rem xor (string[i] leftShift (n8)) // n = 16 in this example for j from 1 to 8 { // Assuming 8 bits per byte if rem and 0x8000 { // if leftmost (most significant) bit is set rem := (rem leftShift 1) xor 0x1021 } else { rem := rem leftShift 1 } rem := rem and 0xffff // Trim remainder to 16 bits } } // A popular variant complements rem here return rem }
 Code fragment 4: Shift register based division, MSB first
// Least significant bit first (littleendian) // x^16+x^12+x^5+1 = 1000 0100 0000 1000 (1) = 0x8408 function crc(byte array string[1..len], int len) { rem := 0 // A popular variant complements rem here for i from 1 to len { rem := rem xor string[i] for j from 1 to 8 { // Assuming 8 bits per byte if rem and 0x0001 { // if rightmost (least significant) bit is set rem := (rem rightShift 1) xor 0x8408 } else { rem := rem rightShift 1 } } } // A popular variant complements rem here return rem }
 Code fragment 5: Shift register based division, LSB first
Note that the lsbitfirst form avoids the need to shift
string[i]
before thexor
. In either case, be sure to transmit the bytes of the CRC in the order that matches your chosen bitordering convention.Parallel computation
Another common optimization uses a lookup table indexed by highest order coefficients of
rem
to perform the inner loop over 8 bits in fewer steps. A 256entry lookup table is a particularly common choice, although using a 16entry table twice per byte is very compact and still faster than the bit at a time version. This replaces the inner loop overj
with// Msbitfirst rem = (rem leftShift 8) xor big_endian_table[(leftmost 8 bits of rem) rightShift (n8)] // Lsbitfirst rem = (rem rightShift 8) xor little_endian_table[rightmost 8 bits of rem]
 Code fragment 6: Cores of table based division
One of the most commonly encountered CRC algorithms is known as CRC32, used by (among others) Ethernet, FDDI, ZIP and other archive formats, and PNG image format. Its polynomial can be written msbitfirst as 0x04C11DB7, or lsbitfirst as 0xEDB88320. The W3C webpage on PNG includes an appendix with a short and simple tabledriven implementation in C of CRC32. You will note that the code corresponds to the lsbitfirst byteatatime pseudocode presented here, and the table is generated using the bitatatime code.
Parallel computation without table
Parallel update for a byte or a word at a time can also be done explicitly, without a table.^{[2]} For each bit an equation is solved after 8 bits have been shifted in. The following tables list the equations for some commonly used polynomials, using following symbols:
c_{i} CRC bit 7…0 (or 15…0) before update r_{i} CRC bit 7…0 (or 15…0) after update d_{i} input data bit 7…0 e_{i} = d_{i} + c_{i} e_{p} = e_{7} + e_{6} + … + e_{1} + e_{0} (parity bit) s_{i} = d_{i} + c_{i+8} s_{p} = s_{7} + s_{6} + … + s_{1} + s_{0} (parity bit) Bitwise update equations for some CRC8 polynomials after 8 bits have been shifted in Polynomial: (x^{7} + x^{3} + 1) × x (leftshifted CRC7CCITT) x^{8} + x^{5} + x^{4} + 1 (CRC8Dallas/Maxim) Coefficients: 0x12 = (0x09 << 1) (MSBF/normal) 0x8c (LSBF/reverse) r_{0 } ← r_{1 } ← r_{2 } ← r_{3 } ← r_{4 } ← r_{5 } ← r_{6 } ← r_{7 } ←
0 e_{0} + e_{4} + e_{7} e_{1} + e_{5} e_{2} + e_{6} e_{3} + e_{7} _{ } + e_{0} + e_{4} + e_{7} e_{4} _{ } _{ } + e_{1} + e_{5} e_{5} _{ } _{ } + e_{2} + e_{6} e_{6} _{ } _{ } + e_{3} + e_{7}
e_{0} _{ } _{ } + e_{4} + e_{1} + e_{0} _{ } + e_{5} + e_{2} + e_{1} e_{1} _{ } _{ } + e_{5} + e_{2} + e_{1} _{ } + e_{6} + e_{3} + e_{2} + e_{0} e_{2} _{ } _{ } + e_{6} + e_{3} + e_{2} + e_{0} + e_{7} + e_{4} + e_{3} + e_{1} e_{3} + e_{0} _{ } + e_{7} + e_{4} + e_{3} + e_{1} e_{4} + e_{1} + e_{0} e_{5} + e_{2} + e_{1} e_{6} + e_{3} + e_{2} + e_{0} e_{7} + e_{4} + e_{3} + e_{1}
C code
fragments:uint8_t c, d, e, f, r; … e = c ^ d; f = e ^ (e >> 4) ^ (e >> 7); r = (f << 1) ^ (f << 4);
uint8_t c, d, e, f, r; … e = c ^ d; f = e ^ (e << 3) ^ (e << 4) ^ (e << 6); r = f ^ (f >> 4) ^ (f >> 5);
Bitwise update equations for some CRC16 polynomials after 8 bits have been shifted in Polynomial: x^{16} + x^{12} + x^{5} + 1 (CRC16CCITT) x^{16} + x^{15} + x^{2} + 1 (CRC16ANSI) Coefficients: 0x1021 (MSBF/normal) 0x8408 (LSBF/reverse) 0x8005 (MSBF/normal) 0xa001 (LSBF/reverse) r_{0 } ← r_{1 } ← r_{2 } ← r_{3 } ← r_{4 } ← r_{5 } ← r_{6 } ← r_{7 } ← r_{8 } ← r_{9 } ← r_{10} ← r_{11} ← r_{12} ← r_{13} ← r_{14} ← r_{15} ←
s_{4} + s_{0} s_{5} + s_{1} s_{6} + s_{2} s_{7} + s_{3} _{ } s_{4} _{ } s_{5} + s_{4} + s_{0} _{ } s_{6} + s_{5} + s_{1} _{ } s_{7} + s_{6} + s_{2} c_{0} _{ } + s_{7} + s_{3} c_{1} _{ } _{ } + s_{4} c_{2} _{ } _{ } + s_{5} c_{3} _{ } _{ } + s_{6} c_{4} _{ } _{ } + s_{7} + s_{4} + s_{0} c_{5} _{ } _{ } _{ } + s_{5} + s_{1} c_{6} _{ } _{ } _{ } + s_{6} + s_{2} c_{7} _{ } _{ } _{ } + s_{7} + s_{3}
c_{8 } _{ } _{ } + e_{4} + e_{0} c_{9 } _{ } _{ } + e_{5} + e_{1} c_{10} _{ } _{ } + e_{6} + e_{2} c_{11} _{ } + e_{0} + e_{7} + e_{3} c_{12} _{ } + e_{1} c_{13} _{ } + e_{2} c_{14} _{ } + e_{3} c_{15} _{ } + e_{4} + e_{0} _{ } e_{0} + e_{5} + e_{1} _{ } e_{1} + e_{6} + e_{2} _{ } e_{2} + e_{7} + e_{3} _{ } e_{3} _{ } e_{4} + e_{0} _{ } e_{5} + e_{1} _{ } e_{6} + e_{2} _{ } e_{7} + e_{3}
_{ } _{ } s_{p} _{ } _{ } s_{0} + s_{p} _{ } _{ } s_{1} + s_{0} _{ } _{ } s_{2} + s_{1} _{ } _{ } s_{3} + s_{2} _{ } _{ } s_{4} + s_{3} _{ } _{ } s_{5} + s_{4} _{ } _{ } s_{6} + s_{5} c_{0} _{ } + s_{7} + s_{6} c_{1} _{ } _{ } + s_{7} c_{2} c_{3} c_{4} c_{5} c_{6} c_{7} + s_{p}
c_{8 } _{ } _{ } + e_{p} c_{9 } c_{10} c_{11} c_{12} c_{13} c_{14} + e_{0} c_{15} + e_{1} + e_{0} _{ } e_{2} + e_{1} _{ } e_{3} + e_{2} _{ } e_{4} + e_{3} _{ } e_{5} + e_{4} _{ } e_{6} + e_{5} _{ } e_{7} + e_{6} _{ } e_{p} + e_{7} _{ } _{ } e_{p}
C code
fragments:uint8_t d, s, t; uint16_t c, r; … s = d ^ (c >> 8); t = s ^ (s >> 4); r = (c << 8) ^ t ^ (t << 5) ^ (t << 12);
uint8_t d, e, f; uint16_t c, r; … e = c ^ d; f = e ^ (e << 4); r = (c >> 8) ^ (f << 8) ^ (f << 3) ^ (f >> 4);
uint8_t d, s, p; uint16_t c, r, t; … s = d ^ (c >> 8); p = s ^ (s >> 4); p = p ^ (p >> 2); p = p ^ (p >> 1); p = p & 1; t = p  (s << 1); r = (c << 8) ^ (t << 15) ^ t ^ (t << 1);
uint8_t d, e, p; uint16_t c, r, f; … e = c ^ d; p = e ^ (e >> 4); p = p ^ (p >> 2); p = p ^ (p >> 1); p = p & 1; f = e  (p << 8); r = (c >> 8) ^ (f << 6) ^ (f << 7) ^ (f >> 8);
Twostep computation
As the CRC32 polynomial has a large number of terms, when computing the remainder a byte at a time each bit depends on several bits of the previous iteration. In byteparallel hardware implementations this calls for either multipleinput or cascaded XOR gates which increases propagation delay.
To maximise computation speed, an intermediate remainder can be calculated by passing the message through a 123bit shift register. The polynomial is a carefully selected multiple of the standard polynomial such that the terms (feedback taps) are widely spaced, and no bit of the remainder is XORed more than once per byte iteration. Thus only twoinput XOR gates, the fastest possible, are needed. Finally the intermediate remainder is divided by the standard polynomial in a second shift register to yield the CRC32 remainder.^{[3]}
Onepass checking
When appending a CRC to a message, it is possible to detach the transmitted CRC, recompute it, and verify the recomputed value against the transmitted one. However, a simpler technique is commonly used in hardware.
When the CRC is transmitted with the correct bit order (most significant terms first), a receiver can compute an overall CRC, over the message and the CRC, and if the CRC is correct, the result will be zero. This possibility is the reason that most network protocols that include a CRC do so before the ending delimiter; it is not necessary to know whether the end of the packet is imminent to check the CRC.
CRC variants
In practice, most standards specify presetting the register to allones and inverting the CRC before transmission. This has no effect on the ability of a CRC to detect changed bits, but gives it the ability to notice bits that are added to the message.
Preset to −1
The basic mathematics of a CRC accepts (considers as correctly transmitted) messages which, when interpreted as a polynomial, are a multiple of the CRC polynomial. If some leading 0 bits are prepended to such a message, they will not change its interpretation as a polynomial. This is equivalent to the fact that 0001 and 1 are the same number.
But if the message being transmitted does care about leading 0 bits, the inability of the basic CRC algorithm to detect such a change is undesirable. If it is possible that a transmission error could add such bits, a simple solution is to start with the
rem
shift register set to some nonzero value; for convenience, the allones value is typically used. This is mathematically equivalent to complementing the first n bits of the message.This does not affect CRC generation and checking in any way, as long as both generator and checker use the same initial value. Any nonzero initial value will do, and a few standards specify unusual values,^{[4]} but the allones value (−1 in twos complement binary) is by far the most common.
Postinvert
The same sort of error can occur at the end of a message. Appending 0 bits to a message is equivalent to multiplying its polynomial by x, and if it was previously a multiple of the CRC polynomial, the result of that multiplication will be, as well. This is equivalent to the fact that, since 726 is a multiple of 11, so is 7260.
A similar solution can be applied at the end of the message, inverting the CRC register before it is appended to the message. Again, any nonzero change will do; inverting all the bits is simply the most common.
This has an effect on onepass CRC checking; instead of producing a result of zero when the message is correct, it produces a fixed nonzero result. (To be precise, the result is the noninverted CRC of the inversion pattern.) This is still straightforward to verify.
See also
General category
 Error correcting code
 List of checksum algorithms
 Parity
NonCRC checksums
References
 ^ ^{a} ^{b} Williams, Ross N. (19960924). "A Painless Guide to CRC Error Detection Algorithms V3.00". http://www.repairfaq.org/filipg/LINK/F_crc_v3.html. Retrieved 20080207.
 ^ Jon Buller (19960315). "<31498ED0.7C0A@nortel.com> Re: 8051 and CRCCCITT". comp.arch.embedded. (Web link). Retrieved 20090727.
 ^ Glaise, René J. (19970120). "A twostep computation of cyclic redundancy code CRC32 for ATM networks". IBM Journal of Research and Development (Armonk, NY: IBM) 41 (6): 705. doi:10.1147/rd.416.0705. http://www.research.ibm.com/journal/rd/416/glaise.html. Retrieved 20090213.
 ^ E.g. lowfrequency RFID TMS37157 data sheet, Texas Instruments, November 2009, p. 39, http://www.ti.com/lit/gpn/tms37157, retrieved 20110415, "The CRC Generator is initialized with the value 0x3791 as shown in Figure 50."
External links
 64Bit CRC  Bitwise XOR LongDivision To Bytewise TableLookup: Comprehensive documentation of CRCMath, Procedure, and CCode.
Categories: Checksum algorithms
 Finite fields
Wikimedia Foundation. 2010.