Tornado code
In computer science, tornado codes are a class of erasure codes that support error correction and have fast encoding and decoding algorithms. Software-based implementations of tornado codes are about 100 times faster on small lengths and about 10,000 times faster on larger lengths than software-based Reed-Solomon erasure codes, while having only slightly worse overhead. Tornado codes are fixed-rate, near-optimal erasure correcting codes that use sparse bipartite graphs to trade encoding and decoding speed for reception overhead. Since the introduction of Tornado codes, many other similar codes have emerged, most notably Online codes, LT codes and Raptor codes.

Rough Overview
Tornado codes work via xor (exclusive or). Xor operates on binary values, 1s and 0s. A xor B is 1 if A and B have different values and 0 if they have the same value. If you are given (A xor B) and A, you can determine the value for B, since A xor B xor A is equal to B. Similarly, if you are given (A xor B) and B, you can determine the value for A. This extends to multiple values: given (A xor B xor C xor D) and any 3 of the values, the missing value can be recovered.
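The recovery trick above can be sketched in a few lines of Python (the values here are arbitrary examples, not part of any actual codec):

```python
# Given the parity A ^ B ^ C ^ D and any three of the values,
# the missing one is the xor of the parity with the known values.
A, B, C, D = 0b1010, 0b0110, 0b1111, 0b0001
parity = A ^ B ^ C ^ D

# Suppose C was lost in transit; xor the parity with the surviving values.
recovered_C = parity ^ A ^ B ^ D
assert recovered_C == C
```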
The xor of a number of binary variables is called their "parity" and this is often used in error detection and correction. Tornado codes use it for error correction. They use another checksum (like CRC-32 or MD5) for error detection.
The Tornado code algorithm starts with the sender breaking an input file or message into equal-sized blocks of bytes. Let's call these blocks A[1] through A[N]. The sender records the index of each block and computes a checksum for the block and its index. (These will be used to determine whether a block has been damaged during transmission and therefore needs to be recovered.) The sender also calculates some parity blocks, B[1] through B[K]. Each of these parity blocks holds the parity for a subset of the input blocks A[1] through A[N]. The sizes and composition of these subsets are key to the speed and success of this algorithm. For each parity block, the sender records the indices of the input blocks and a checksum for the parity block and its input indices.
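A minimal Python sketch of this encoding step might look like the following. The block size, the number of parity blocks, the uniformly random subsets, and the use of MD5 are all illustrative assumptions here, not the actual Tornado construction:

```python
import hashlib
import random

def xor_blocks(blocks):
    """Xor a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)

def encode(data, block_size=4, n_parity=3, seed=0):
    """Split data into blocks and build parity blocks over random subsets."""
    rng = random.Random(seed)
    # Pad so the data divides evenly into blocks.
    if len(data) % block_size:
        data += b"\x00" * (block_size - len(data) % block_size)
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    parities = []
    for _ in range(n_parity):
        # Illustrative: a uniformly random subset, not Tornado's degree distribution.
        subset = sorted(rng.sample(range(len(blocks)), rng.randint(1, len(blocks))))
        parity = xor_blocks([blocks[i] for i in subset])
        # Checksum over the parity block and its input indices, as described above.
        checksum = hashlib.md5(parity + bytes(subset)).hexdigest()
        parities.append((subset, parity, checksum))
    return blocks, parities
```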
The sender now sends the input and parity blocks (with their indices and checksums) to the receiver. During this transmission, some of the blocks may be corrupted.
The receiver uses the checksums to identify the bad blocks and discards them. The receiver is now left with a subset of the input blocks and some parity blocks. As long as the receiver has received N + C blocks (where C is some constant), it is highly probable that the receiver can recover the file. Each parity block is associated with a subset of input blocks, and for most parity blocks several input blocks may be missing from that subset. However, given the sizes of the random subsets, it is highly likely that there exists one parity block that is missing only one of its input blocks. Using the xor operation described above, that missing input block can be recovered. Once it is recovered, a parity block that was previously missing two input blocks may now be missing just one, and that one can now be recovered. This process continues, with input blocks being recovered and more parity blocks becoming usable, until the entire input file or message is recovered.
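The iterative recovery described above is often called a peeling decoder and can be sketched as follows. Representing blocks as small integers and the names `peel_decode` and `received` are illustrative choices, not part of the original algorithm's specification:

```python
def peel_decode(received, parities, n):
    """Recover missing input blocks by repeatedly solving parity equations
    that have exactly one unknown. `received` maps index -> value for the
    blocks that survived; `parities` is a list of (subset, parity_value)."""
    recovered = dict(received)
    progress = True
    while len(recovered) < n and progress:
        progress = False
        for subset, parity in parities:
            missing = [i for i in subset if i not in recovered]
            if len(missing) == 1:  # exactly one unknown: solvable by xor
                value = parity
                for i in subset:
                    if i in recovered:
                        value ^= recovered[i]
                recovered[missing[0]] = value
                progress = True
    return recovered

# Example: inputs 5, 9, 12; block 1 (value 9) lost in transit.
parities = [([0, 1], 5 ^ 9), ([1, 2], 9 ^ 12)]
received = {0: 5, 2: 12}
assert peel_decode(received, parities, 3) == {0: 5, 1: 9, 2: 12}
```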
The beauty of the algorithm comes from determining the sizes and composition of the subsets. On average the subsets are small, making the parity blocks fast to create and fast to use in recovery. Occasionally, however, a subset is large, covering most of the input blocks, so that any missing block can be recovered.
Exact Algorithm
"Need details on the exact formula for computing the size of subsets, given a value for C and expected loss rate. Also need big O() notation for algorithm and comparison for best of Reed-Solomon implementation. Source code is always great."
The subsets for each parity block are generated in a semi-random fashion. The algorithm assigns to each input block the number of parity blocks it will be present in, and to each parity block the number of input blocks it will contain. Then parity blocks with remaining slots are randomly matched with input blocks with remaining slots. The result is called a random degree-constrained bipartite graph. This method generates parity blocks whose subsets of input blocks follow a specific distribution that allows the input file or message to be recovered with very high probability.
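Building such a degree-constrained bipartite graph can be sketched as below. The degrees used here are arbitrary examples; real Tornado codes draw them from carefully designed distributions:

```python
import random

def random_bipartite(input_degrees, parity_degrees, seed=0):
    """Pair edge slots on the input side with edge slots on the parity side
    uniformly at random, respecting each node's assigned degree."""
    rng = random.Random(seed)
    # One slot per unit of degree on each side.
    left_slots = [i for i, d in enumerate(input_degrees) for _ in range(d)]
    right_slots = [j for j, d in enumerate(parity_degrees) for _ in range(d)]
    assert len(left_slots) == len(right_slots), "total degrees must match"
    rng.shuffle(left_slots)
    rng.shuffle(right_slots)
    # Each parity block's subset is the set of input blocks it is paired with.
    # (Duplicate pairings simply collapse in the set; a production
    # implementation would avoid or re-draw them.)
    subsets = [set() for _ in parity_degrees]
    for i, j in zip(left_slots, right_slots):
        subsets[j].add(i)
    return subsets
```

For example, `random_bipartite([2, 2, 2], [3, 3])` places each of three input blocks in two parity subsets, with each of two parity blocks covering three input slots.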
Patent Issues
Tornado codes and many similar codes (LT codes, Online codes) are patented in the United States of America and cannot be used without the patent holder's permission.
Citations
Michael Luby created Tornado codes and has good descriptions on his website. [http://www.icsi.berkeley.edu/~luby/]
A readable description from CMU (PostScript): [http://www.cs.cmu.edu/afs/cs.cmu.edu/project/pscico-guyb/realworld/www/tornado.ps]
Probable patent citation from Michael Luby's CV: "Message Encoding and Transmission System and Method for Multilevel Data Redundancy", Andres Albanese, Michael Luby, Johannes Blömer and Jeff Edmonds, U.S. Patent No. 5,617,541, issued April 1, 1997. Serial Number 08/361,802; 12-21-94, Assignment recorded February 27, 1995, Reel 7364, Frames 685-689.