Code page 930


Code page 930 (abbreviated as CP930, also known as Japanese EBCDIC) is a code page created by IBM for the representation of Japanese text. It is based on EBCDIC and is commonly used on IBM OS/390 and AS/400 systems. It encodes halfwidth katakana, fullwidth katakana, hiragana, and kanji.

Technical detail

CP930 uses 1 byte to encode halfwidth katakana and 2 bytes to encode all other Japanese characters. If the text uses only halfwidth katakana mixed with Latin characters, which was the norm until the 1980s, CP930 can be treated as a pure 8-bit encoding. Otherwise it is a mixed single-byte/double-byte encoding in which a Shift-Out byte (0x0E) marks the start of a double-byte run and a Shift-In byte (0x0F) marks its end. A name of four kanji is therefore typically encoded as 10 bytes: 1 byte for Shift-Out, 8 bytes for the four characters, and 1 byte for Shift-In.
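The byte arithmetic above can be sketched in a few lines. This is a minimal illustration, not a real CP930 encoder: it only counts bytes, and the helper names `is_single_byte` and `mixed_byte_length` are hypothetical. It assumes, as a simplification, that printable ASCII and halfwidth katakana are single-byte and everything else is double-byte; CP930's actual single-byte repertoire differs in detail.

```python
def is_single_byte(ch: str) -> bool:
    # Simplifying assumption: printable ASCII and halfwidth katakana
    # (U+FF61..U+FF9F) are single-byte; everything else is double-byte.
    # CP930's real single-byte repertoire differs in detail.
    cp = ord(ch)
    return 0x20 <= cp <= 0x7E or 0xFF61 <= cp <= 0xFF9F


def mixed_byte_length(text: str) -> int:
    """Bytes needed for text in an SO/SI mixed encoding, counting shift codes."""
    length = 0
    in_dbcs = False
    for ch in text:
        single = is_single_byte(ch)
        if single and in_dbcs:
            length += 1  # Shift-In (0x0F) closes the double-byte run
            in_dbcs = False
        elif not single and not in_dbcs:
            length += 1  # Shift-Out (0x0E) opens a double-byte run
            in_dbcs = True
        length += 1 if single else 2
    if in_dbcs:
        length += 1  # a trailing double-byte run still needs its Shift-In
    return length


print(mixed_byte_length("山田太郎"))  # 10: SO + 4 x 2 bytes + SI
print(mixed_byte_length("ｱｲｳ"))      # 3: halfwidth katakana, no shift codes
```

Each switch between modes costs one extra byte, so text that alternates frequently between single-byte and double-byte characters grows noticeably beyond the plain per-character count.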

Practical considerations

CP930 itself and the way it is typically used have a number of idiosyncrasies that make working with it hard in practice (see also EBCDIC for idiosyncrasies of the EBCDIC standard itself):
* Because of the Shift-Out and Shift-In codes, parsing a byte sequence that starts in the middle is hard: a decoder cannot tell whether it is in single-byte or double-byte mode without finding the most recent shift code. On the positive side, the Shift-Out (0x0E) and Shift-In (0x0F) bytes are a reliable way of spotting CP930 even when it has been run through an incorrect code page conversion resulting in mojibake.
* Although CP930 allows mixed halfwidth and fullwidth text, many database schemas strictly distinguish between columns containing only single-byte halfwidth katakana and columns containing only double-byte fullwidth characters. This convention makes it easier for software developers to predict how many characters fit in a column of a given size in bytes, and vice versa.
* The downside is that, for consistency, Latin text in such a fullwidth column has to be entered as, or converted into, fullwidth alphanumeric characters so that it is encoded as double-byte characters. This matters for database searches, because a search for halfwidth "ABC" does not match fullwidth "ＡＢＣ".
* When a database column is implicitly defined as pure fullwidth text, the Shift-Out and Shift-In codes are often omitted, which strictly speaking results in incorrect encoding. Code page converters may or may not handle these missing shift codes correctly.
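The first point above, that the shift state is only known after scanning from the start and that the shift bytes make mixed data easy to recognize, can be illustrated with a small scanner. This is a hypothetical sketch that assumes well-formed input and, as in IBM host double-byte encodings, that 0x0E and 0x0F never occur inside a double-byte character; it splits raw bytes into single-byte ("S") and double-byte ("D") runs without decoding them.

```python
SO = 0x0E  # Shift-Out: following bytes are double-byte pairs
SI = 0x0F  # Shift-In: following bytes are single bytes


def split_runs(data: bytes) -> list[tuple[str, bytes]]:
    """Split SO/SI mixed bytes into ('S', ...) and ('D', ...) runs.

    The mode of any byte depends on the last shift code before it,
    so the scan has to start at the beginning of the field.
    """
    runs: list[tuple[str, bytes]] = []
    mode = "S"  # text starts in single-byte mode
    start = 0
    for i, b in enumerate(data):
        if b in (SO, SI):
            if i > start:
                runs.append((mode, data[start:i]))
            mode = "D" if b == SO else "S"
            start = i + 1
    if start < len(data):
        runs.append((mode, data[start:]))
    for run_mode, chunk in runs:
        if run_mode == "D" and len(chunk) % 2:
            raise ValueError("odd-length double-byte run")
    return runs


def has_balanced_shifts(data: bytes) -> bool:
    """Heuristic: SO and SI are present and strictly alternate, starting with SO."""
    expected = SO
    seen = False
    for b in data:
        if b in (SO, SI):
            if b != expected:
                return False
            expected = SI if b == SO else SO
            seen = True
    return seen and expected == SO
```

The byte values in any sample input (for example `b"\xC1\x0E\x45\x41\x45\x42\x0F\xC2"`) are placeholders, not specific CP930 characters. `has_balanced_shifts` is only a heuristic for spotting SO/SI-framed data in raw bytes; it does not prove the data is CP930.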
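For the fullwidth-column convention described above, one possible approach is to widen Latin text on the Unicode side before it is converted to CP930, and to normalize both sides of a search. This sketch assumes the converter in use maps Unicode fullwidth forms (U+FF01..U+FF5E, U+3000) to double-byte characters; the function names `to_fullwidth` and `search_key` are hypothetical. `unicodedata.normalize` with `"NFKC"` is the standard-library call used for folding.

```python
import unicodedata


def to_fullwidth(text: str) -> str:
    """Map printable ASCII to fullwidth forms; leave other characters unchanged."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp == 0x20:
            out.append("\u3000")  # ASCII space -> ideographic space
        elif 0x21 <= cp <= 0x7E:
            out.append(chr(cp + 0xFEE0))  # '!'..'~' -> U+FF01..U+FF5E
        else:
            out.append(ch)
    return "".join(out)


def search_key(text: str) -> str:
    """Fold fullwidth and halfwidth variants together before comparing."""
    return unicodedata.normalize("NFKC", text)


stored = to_fullwidth("IBM 930")
print(search_key(stored) == search_key("IBM 930"))  # True
```

NFKC is a broad fold: besides mapping fullwidth Latin back to ASCII, it also turns halfwidth katakana into fullwidth katakana, which is usually what a search wants but should be checked against the column's conventions.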
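When a converter is strict about the missing shift codes described in the last point, one workaround is to restore them before conversion. The helper below is hypothetical and assumes the column holds only double-byte characters stored without Shift-Out/Shift-In; it adds the framing and leaves already-framed or empty values unchanged, and the framed bytes would then be passed to whatever converter is in use.

```python
SO = b"\x0e"  # Shift-Out
SI = b"\x0f"  # Shift-In


def frame_dbcs_field(raw: bytes) -> bytes:
    """Hypothetical helper: add SO/SI around a DBCS-only value stored without them."""
    if not raw:
        return raw  # nothing to frame
    if raw.startswith(SO) and raw.endswith(SI):
        return raw  # already framed, leave unchanged
    if len(raw) % 2:
        raise ValueError("a DBCS-only value must have an even byte length")
    return SO + raw + SI
```

Rejecting odd lengths catches values that were truncated or are not actually double-byte only, instead of silently producing a misaligned field.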

Wikimedia Foundation. 2010.
