UTF-EBCDIC
UTF-EBCDIC is a character encoding used to represent Unicode characters. It is meant to be EBCDIC-friendly, so that legacy EBCDIC applications on mainframes can process the characters without much difficulty. Its advantages for existing EBCDIC-based systems are similar to UTF-8's advantages for existing ASCII-based systems. Details on UTF-EBCDIC are defined in Unicode Technical Report #16.

To produce the UTF-EBCDIC encoded version of a series of Unicode code points, an encoding based on UTF-8 (known in the specification as UTF-8-Mod) is applied first. The main difference between this encoding and UTF-8 is that it allows Unicode code points U+0080 through U+009F (the C1 control codes) to be represented as a single byte, so that they can later be mapped to the corresponding EBCDIC control codes. To achieve this, 101xxxxx is used instead of 10xxxxxx as the format for the trailing bytes of a multi-byte sequence. Because a trailing byte can then hold only 5 bits rather than 6, UTF-EBCDIC generally produces larger output than UTF-8 for the same input data; for example, code points U+0400 through U+07FF take three bytes instead of two.

This transformation leaves the data in an ASCII-based format, so a reversible byte-to-byte transform is then applied, using a lookup table, to bring the result as close to normal EBCDIC code pages as feasible. Both steps can easily be reversed to recover the Unicode code points.
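The first step, UTF-8-Mod, can be sketched in Python as follows. This is a minimal illustration, not the reference implementation from the technical report: the function names `utf8_mod_encode` and `utf8_mod_decode` are hypothetical, the lead bytes are assumed to keep UTF-8's `110`, `1110`, `11110` and `111110` prefixes (worth checking against TR16), and the second step (the byte-to-byte lookup table into EBCDIC) is left out because its full table is given in TR16 rather than here.

```python
def utf8_mod_encode(code_points):
    """Encode Unicode code points as UTF-8-Mod bytes (sketch; lead-byte
    prefixes are assumed to follow UTF-8's 110/1110/11110/111110 pattern)."""
    out = bytearray()
    for cp in code_points:
        if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"not a Unicode scalar value: {cp:#x}")
        if cp < 0xA0:
            # U+0000..U+009F, including the C1 controls, take a single byte.
            out.append(cp)
            continue
        # Pick the shortest form; each trailing byte carries 5 payload bits.
        if cp < 0x400:
            count, lead = 2, 0xC0  # 110yyyyy + 1 trailing byte
        elif cp < 0x4000:
            count, lead = 3, 0xE0  # 1110yyyy + 2 trailing bytes
        elif cp < 0x40000:
            count, lead = 4, 0xF0  # 11110yyy + 3 trailing bytes
        else:
            count, lead = 5, 0xF8  # 111110yy + 4 trailing bytes
        trailing = []
        for _ in range(count - 1):
            trailing.append(0xA0 | (cp & 0x1F))  # 101xxxxx
            cp >>= 5
        out.append(lead | cp)           # leftover high bits go in the lead byte
        out.extend(reversed(trailing))  # most significant trailing byte first
    return bytes(out)


def utf8_mod_decode(data):
    """Reverse utf8_mod_encode (sketch; overlong forms are not rejected)."""
    code_points = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0xA0:
            code_points.append(b)  # single-byte U+0000..U+009F
            i += 1
            continue
        if 0xC0 <= b <= 0xDF:
            count, cp = 2, b & 0x1F
        elif 0xE0 <= b <= 0xEF:
            count, cp = 3, b & 0x0F
        elif 0xF0 <= b <= 0xF7:
            count, cp = 4, b & 0x07
        elif 0xF8 <= b <= 0xFB:
            count, cp = 5, b & 0x03
        else:
            raise ValueError(f"invalid lead byte {b:#04x} at offset {i}")
        if i + count > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for t in data[i + 1:i + count]:
            if (t & 0xE0) != 0xA0:
                raise ValueError(f"invalid trailing byte {t:#04x}")
            cp = (cp << 5) | (t & 0x1F)  # append 5 more payload bits
        code_points.append(cp)
        i += count
    return code_points
```

For example, `utf8_mod_encode([0x416]).hex()` returns `'e1a0b6'`: the Cyrillic letter U+0416 needs three bytes here but only two in UTF-8, while the C1 control U+0085 shrinks from two bytes in UTF-8 to a single byte.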
This encoding form is rarely used, even on the EBCDIC-based mainframes for which it was designed.
IBM EBCDIC-based mainframe operating systems, like z/OS, usually use UTF-16 for complete Unicode support. For example, DB2 UDB, COBOL, PL/I, Java and the IBM XML toolkit support UTF-16 on IBM mainframes.

See also
* UTF-1
* BOCU-1

External links
* http://www.unicode.org/reports/tr16/ Unicode Technical Report #16: the definition of UTF-EBCDIC
Wikimedia Foundation. 2010.