- UTF-1
UTF-1 is a way of transforming ISO 10646/Unicode into a stream of bytes. Because of its design, it is not possible to resynchronise if decoding starts in the middle of a character (which, among other things, makes truncation hard), and simple byte-oriented search routines cannot be reliably used with it. UTF-1 is also fairly slow, because encoding and decoding require division by 190. Due to these issues, UTF-1 never gained wide acceptance and has been almost totally replaced by UTF-8.

Design
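The byte layout summarised in this section can be sketched in Python. This is a minimal, non-authoritative sketch. The range boundaries (U+00A0, U+0100, U+4016, U+38E2E) and the digit-to-byte mapping `_t` (digits 0 to 93 map to 0x21 - 0x7E, digits 94 to 189 map to 0xA0 - 0xFF) are assumptions taken from the retired ISO-IR 178 layout rather than from this article. The function names `_t`, `_u`, `utf1_encode_code_point`, `utf1_encode` and `utf1_decode` are hypothetical. Because trail bytes are base-190 digits, they never fall in the protected ranges 0x00 - 0x20 and 0x7F - 0x9F.

```python
# Minimal UTF-1 sketch. Range boundaries and the digit-to-byte mapping are
# ASSUMED from the retired ISO-IR 178 layout; all names are hypothetical.

def _t(z: int) -> int:
    """Map a base-190 digit (0..189) to a trail byte, skipping the 66
    protected octets 0x00-0x20 and 0x7F-0x9F."""
    if z < 0x5E:            # digits 0..93   -> bytes 0x21..0x7E
        return z + 0x21
    return z + 0x42         # digits 94..189 -> bytes 0xA0..0xFF


def _u(b: int) -> int:
    """Inverse of _t: map a trail byte back to its base-190 digit."""
    if 0x21 <= b <= 0x7E:
        return b - 0x21
    if 0xA0 <= b <= 0xFF:
        return b - 0x42
    raise ValueError(f"0x{b:02X} cannot be a UTF-1 trail byte")


def utf1_encode_code_point(x: int) -> bytes:
    """Encode one 31-bit UCS code point as 1, 2, 3 or 5 octets."""
    if not 0 <= x <= 0x7FFFFFFF:
        raise ValueError("code point outside the 31-bit UCS range")
    if x < 0xA0:            # U+0000..U+009F: the octet itself
        return bytes([x])
    if x < 0x100:           # U+00A0..U+00FF: 0xA0, then the octet itself
        return bytes([0xA0, x])
    if x < 0x4016:          # U+0100..U+4015: lead 0xA1..0xF5 + 1 digit
        y = x - 0x100
        return bytes([0xA1 + y // 190, _t(y % 190)])
    if x < 0x38E2E:         # U+4016..U+38E2D: lead 0xF6..0xFB + 2 digits
        y = x - 0x4016
        return bytes([0xF6 + y // 190**2, _t(y // 190 % 190), _t(y % 190)])
    y = x - 0x38E2E         # U+38E2E and above: lead 0xFC.. + 4 digits
    return bytes([0xFC + y // 190**4,
                  _t(y // 190**3 % 190), _t(y // 190**2 % 190),
                  _t(y // 190 % 190), _t(y % 190)])


def utf1_encode(text: str) -> bytes:
    return b"".join(utf1_encode_code_point(ord(ch)) for ch in text)


def utf1_decode(data: bytes) -> str:
    """Decode well-formed UTF-1 (no error recovery). Returns a Python str,
    so only code points up to U+10FFFF are supported here."""
    cps, i = [], 0
    while i < len(data):
        b = data[i]
        if b < 0xA0:                    # single octet
            cps.append(b)
            i += 1
        elif b == 0xA0:                 # 0xA0 + literal octet
            cps.append(data[i + 1])
            i += 2
        elif b <= 0xF5:                 # two octets
            cps.append(0x100 + (b - 0xA1) * 190 + _u(data[i + 1]))
            i += 2
        elif b <= 0xFB:                 # three octets
            cps.append(0x4016 + (b - 0xF6) * 190**2
                       + _u(data[i + 1]) * 190 + _u(data[i + 2]))
            i += 3
        else:                           # five octets (Horner evaluation)
            y = b - 0xFC
            for k in range(1, 5):
                y = y * 190 + _u(data[i + k])
            cps.append(0x38E2E + y)
            i += 5
    return "".join(map(chr, cps))


encoded = utf1_encode("\u0100")   # LATIN CAPITAL LETTER A WITH MACRON
print(encoded.hex(" "))          # a1 21
print(b"!" in encoded)           # True: trail byte 0x21 is also ASCII "!"
print(utf1_decode(encoded[1:]))  # prints "!": wrong character, no error
```

The last three lines illustrate the problems described in the introduction: a naive byte search for "!" (0x21) matches inside the encoding of U+0100, and a decoder that starts at the second octet silently yields "!" instead of detecting that it began mid-character. The division and remainder by 190 in every multi-octet branch is where the extra cost over UTF-8's bit shifts comes from.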
UTF-1 is a multi-byte encoding like UTF-8: a single Unicode code point can be encoded in one, two, three, or five octets. While the ASCII range is encoded as one octet, as in UTF-8, the ASCII octets 0x21 - 0x7E (decimal 33 - 126) are also used as trail bytes in UTF-1 multi-byte sequences; therefore UTF-1 is unsuited for many Internet protocols, including MIME.

UTF-1 does not use the C0 and C1 control codes or the space character in multi-byte sequences: any octet 0x00 - 0x20 (decimal 0 - 32) or 0x7F - 0x9F (decimal 127 - 159) stands for the corresponding code point U+0000 - U+0020 or U+007F - U+009F, respectively. This design, with 66 "protected" octets, was intended to be compatible with ISO 2022. The UTF-1 encoding scheme uses "modulo 190" arithmetic (256 - 66 = 190), and it was designed to encode the complete 31 bits of the original Universal Character Set (UCS-4).

For comparison, UTF-8 "protects" all 128 ASCII octets and needs two bits in the trail bytes of multi-byte sequences for this purpose, resulting in "modulo 64" (base 64) arithmetic (2^(8 - 2) = 64). BOCU-1 "protects" only the 13 octets 0x00, 0x07 - 0x0F, 0x1A - 0x1B, and 0x20 (space), covering the minimal set required for MIME compatibility, resulting in "modulo 243" arithmetic (256 - 13 = 243).

See also
* Comparison of Unicode encodings
* Universal Character Set

References
* [http://www.itscj.ipsj.or.jp/ISO-IR/178.pdf ISO IR 178] (PDF, 256 KB, the retired UTF-1 specification)
Wikimedia Foundation. 2010.