LZWL
LZWL is a syllable-based variant of the character-based LZW compression algorithm. LZWL can work with syllables obtained by any algorithm for decomposing words into syllables, and it can be applied to whole words as well.
Syllables
According to the Compact Oxford English Dictionary, a syllable is defined as: ‘A unit of pronunciation having one vowel sound, with or without surrounding consonants, and forming all or part of a word.’
Because the decomposition into syllables is used here for data compression, words do not always have to be decomposed into syllables correctly.
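To illustrate why a linguistically incorrect decomposition is still usable, here is a minimal sketch of a deliberately crude splitter. It is not one of the decomposition algorithms from the literature: the function name `naive_syllables`, the vowel set, and the splitting rule are all assumptions made for this example. The only property that matters for lossless compression is that joining the pieces gives back the original word.

```python
import re

# Assumption: a crude vowel set for English; "y" is always treated as a vowel.
VOWELS = "aeiouyAEIOUY"
# One chunk = optional non-vowels followed by at least one vowel.
_CHUNK = re.compile(rf"[^{VOWELS}]*[{VOWELS}]+")


def naive_syllables(word):
    """Split a word into syllable-like pieces (hypothetical heuristic)."""
    parts = _CHUNK.findall(word)
    if not parts:
        # No vowel at all: keep the whole word as a single piece.
        return [word] if word else []
    # Trailing consonants that follow the last vowel group join the last piece.
    tail = word[sum(len(p) for p in parts):]
    parts[-1] += tail
    return parts


pieces = naive_syllables("compression")
print(pieces)           # ['co', 'mpre', 'ssion']
print("".join(pieces))  # compression
```

The split `co-mpre-ssion` is wrong as a pronunciation guide, but it is deterministic and reversible, which is all the compressor needs.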
Algorithm
In the character-based version, the initialization step fills the dictionary with all characters of the alphabet. Each subsequent step searches for the longest string S that is in the dictionary and matches a prefix of the not-yet-coded part of the input. The number of phrase S is sent to the output. A new phrase, created by concatenating S with the character that follows S in the file, is then added to the dictionary. The input position is moved forward by the length of S.
Decoding has only one special case to handle: the decoder can receive the number of a phrase that is not yet in its dictionary. In this case the phrase can be reconstructed by concatenating the previously decoded phrase with its own first character.
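The character-based steps above can be sketched as follows. This is a minimal illustration, not a production codec: the function names `lzw_encode` and `lzw_decode` are chosen for this example, phrase numbers are kept as plain Python integers rather than packed into bits, and every input character is assumed to be in `alphabet`.

```python
def lzw_encode(text, alphabet):
    """Return the list of phrase numbers for text."""
    # Initialization: one phrase per character of the alphabet.
    dictionary = {ch: i for i, ch in enumerate(alphabet)}
    codes = []
    pos = 0
    while pos < len(text):
        # Find the longest dictionary phrase S matching the uncoded prefix.
        s = text[pos]
        while pos + len(s) < len(text) and s + text[pos + len(s)] in dictionary:
            s += text[pos + len(s)]
        codes.append(dictionary[s])
        nxt = pos + len(s)
        if nxt < len(text):
            # New phrase: S plus the character that follows S.
            dictionary[s + text[nxt]] = len(dictionary)
        pos = nxt
    return codes


def lzw_decode(codes, alphabet):
    """Rebuild the text from phrase numbers."""
    phrases = list(alphabet)
    out = []
    prev = None
    for code in codes:
        if code < len(phrases):
            cur = phrases[code]
        elif code == len(phrases) and prev is not None:
            # Special case: the phrase is not in the dictionary yet.
            cur = prev + prev[0]
        else:
            raise ValueError(f"invalid phrase number {code}")
        if prev is not None:
            # Mirror the encoder: previous phrase plus first character of this one.
            phrases.append(prev + cur[0])
        out.append(cur)
        prev = cur
    return "".join(out)


codes = lzw_encode("abababab", "ab")
print(codes)                    # [0, 1, 2, 4, 1]
print(lzw_decode(codes, "ab"))  # abababab
```

In this run the decoder receives phrase number 4 before it has added that phrase, which is exactly the special case described above.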
The syllable-based version works over an alphabet of syllables. In the initialization step, the dictionary is filled with the empty syllable and with small syllables from a database of frequent syllables. Finding the string S and coding its number works as in the character-based version, except that S is a string of syllables. The number of phrase S is encoded to the output. The string S can be the empty syllable.
If S is the empty syllable, one syllable K is read from the file and encoded by the methods for coding new syllables, and K is then added to the dictionary. The input position is moved forward by the length of S or, when S is the empty syllable, by the length of K.
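The syllable-based steps can be sketched as follows, under several assumptions. The names `lzwl_encode`, `lzwl_decode` and the `frequent` parameter (standing in for the database of frequent syllables) are hypothetical. A new syllable K is simply written into the output stream as a literal string, in place of the methods for coding new syllables, which are not specified here. The sketch also assumes that a new phrase is added only when two consecutive matched phrases are both non-empty, and that it consists of the earlier phrase followed by the first syllable of the later one.

```python
EMPTY = ()  # the empty syllable; it always has phrase number 0


def _initial_phrases(frequent):
    """Empty syllable first, then each frequent syllable once."""
    phrases = [EMPTY]
    for syl in frequent:
        if (syl,) not in phrases:
            phrases.append((syl,))
    return phrases


def lzwl_encode(syllables, frequent=()):
    """Encode a list of syllables into phrase numbers and literal syllables."""
    dictionary = {p: i for i, p in enumerate(_initial_phrases(frequent))}
    out = []
    prev = EMPTY
    pos = 0
    while pos < len(syllables):
        # Longest phrase S (a tuple of syllables) matching the input; may be EMPTY.
        s = EMPTY
        while pos + len(s) < len(syllables):
            longer = s + (syllables[pos + len(s)],)
            if longer not in dictionary:
                break
            s = longer
        out.append(dictionary[s])
        if prev and s:
            # Assumed rule: earlier phrase + first syllable of the later phrase.
            dictionary.setdefault(prev + s[:1], len(dictionary))
        if s:
            pos += len(s)
        else:
            k = syllables[pos]   # new syllable K
            out.append(k)        # placeholder for "coding new syllables"
            dictionary[(k,)] = len(dictionary)
            pos += 1
        prev = s
    return out


def lzwl_decode(stream, frequent=()):
    """Inverse of lzwl_encode; every number it reads is already known."""
    phrases = _initial_phrases(frequent)
    known = set(phrases)
    out = []
    prev = EMPTY
    items = iter(stream)
    for code in items:
        s = phrases[code]
        if prev and s and prev + s[:1] not in known:
            phrases.append(prev + s[:1])
            known.add(prev + s[:1])
        if s:
            out.extend(s)
        else:
            k = next(items)      # the literal syllable follows phrase number 0
            out.append(k)
            phrases.append((k,))
            known.add((k,))
        prev = s
    return out


text = ["ba", "na", "na", "ba", "na", "na"]
stream = lzwl_encode(text)
print(stream)               # [0, 'ba', 0, 'na', 2, 1, 2, 2]
print(lzwl_decode(stream))  # ['ba', 'na', 'na', 'ba', 'na', 'na']
```

Because each phrase is added one step late, the decoder has always added a phrase before the encoder can refer to it, so the special case of character-based LZW decoding does not arise in this sketch.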
Adding a phrase to the dictionary differs from the character-based version. Let S1 denote the phrase found in the next step. If S and S1 are both non-empty, a new phrase is added to the dictionary; it is created by concatenating S with the first syllable of S1. This solution has two advantages. First, no strings are created from syllables that appear only once. Second, the decoder can never receive the number of a phrase that is not in its dictionary.
External links
* [http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-176/paper5.pdf Detailed description]
Wikimedia Foundation. 2010.