Shift JIS

Shift JIS (also SJIS, MIME name Shift_JIS) is a character encoding for the Japanese language, originally developed by the Japanese company ASCII Corporation in conjunction with Microsoft and standardized as JIS X 0208 Appendix 1. It is based on the character sets defined in the JIS standards JIS X 0201:1997 (for the single-byte characters) and JIS X 0208:1997 (for the double-byte characters). The lead bytes for the double-byte characters are "shifted" around the 63 halfwidth katakana characters in the single-byte range 0xA1 to 0xDF, which map to the halfwidth katakana found in JIS X 0201. The single-byte characters 0x00 to 0x7F match the ASCII encoding, except for a yen sign at 0x5C and an overline at 0x7E in place of ASCII's backslash and tilde respectively. On the web, 0x5C still functions as the JavaScript escape character.
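As a rough illustration of the single-byte layout described above, the following Python sketch maps one byte to a Unicode character under the JIS X 0201 reading. The function name `decode_single_byte` is hypothetical, and the Unicode targets (U+00A5, U+203E and the block starting at U+FF61) are assumptions of this sketch; real converters differ in how they treat 0x5C and 0x7E.

```python
from typing import Optional


def decode_single_byte(b: int) -> Optional[str]:
    """Map one single-byte Shift JIS value to a Unicode character.

    Returns None for bytes that are not single-byte characters
    (for example, lead bytes of double-byte characters).
    """
    if b == 0x5C:
        return "\u00a5"  # yen sign in place of the ASCII backslash
    if b == 0x7E:
        return "\u203e"  # overline in place of the ASCII tilde
    if 0x00 <= b <= 0x7F:
        return chr(b)    # everything else below 0x80 matches ASCII
    if 0xA1 <= b <= 0xDF:
        # Halfwidth katakana; assumed to map in order onto U+FF61 onwards.
        return chr(0xFF61 + (b - 0xA1))
    return None


if __name__ == "__main__":
    for value in (0x41, 0x5C, 0x7E, 0xB1, 0x81):
        print(hex(value), repr(decode_single_byte(value)))
```

Returning `None` rather than raising keeps the sketch usable as a quick test for "is this byte a complete character on its own".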

Shift JIS requires an 8-bit medium for transmission. It is fully backwards compatible with the legacy JIS X 0201 single-byte encoding, meaning that it supports halfwidth katakana and that any valid JIS X 0201 string is also a valid Shift JIS string. However, Shift JIS only guarantees that the first byte of a double-byte character has its high bit set (0x80 or above); the second byte can be either high or low. This makes reliable Shift JIS detection difficult. The competing 8-bit format EUC-JP, by contrast, does not support single-byte halfwidth katakana but allows a much cleaner and more direct conversion to and from JIS X 0208 code points, as all bytes with the high bit set are part of a double-byte character and all bytes below 0x80 are single-byte characters.
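The ambiguity of the second byte can be seen in a small Python sketch. It assumes the commonly used lead-byte ranges 0x81 to 0x9F and 0xE0 to 0xFC (the upper part of which is extension space), and the helper names `is_lead_byte` and `split_chars` are hypothetical. The katakana ソ is encoded as 0x83 0x5C, so its second byte looks exactly like a single-byte 0x5C unless the stream is scanned from a known character boundary.

```python
from typing import List


def is_lead_byte(b: int) -> bool:
    """True if b starts a double-byte character (assumed ranges)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def split_chars(data: bytes) -> List[bytes]:
    """Split a Shift JIS byte string into per-character byte slices.

    Scans left to right from the start, which is the only position
    where the character boundary is known for certain.
    """
    chars = []
    i = 0
    while i < len(data):
        # A lead byte consumes the following byte as its trail byte.
        step = 2 if is_lead_byte(data[i]) and i + 1 < len(data) else 1
        chars.append(data[i:i + step])
        i += step
    return chars


if __name__ == "__main__":
    encoded = "ソA".encode("shift_jis")
    print(split_chars(encoded))
    # A naive byte search finds 0x5C even though the text has no yen sign
    # or backslash: it is the trail byte of ソ.
    print(0x5C in encoded)
```

A search that works on the slices returned by `split_chars` rather than on raw bytes avoids this kind of false match.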

For a double-byte JIS sequence <math>j_1 j_2</math>, the transformation to the corresponding Shift JIS bytes <math>s_1 s_2</math> is:

:<math>33 \le j_1 \le 94 \Rightarrow s_1 = \left\lfloor \frac{j_1 + 1}{2} \right\rfloor + 112,</math>
:<math>95 \le j_1 \le 126 \Rightarrow s_1 = \left\lfloor \frac{j_1 + 1}{2} \right\rfloor + 176,</math>
:<math>j_1 \mbox{ is odd } \Rightarrow s_2 = j_2 + 31 + \begin{cases} 1 & \mbox{if } j_2 \ge 96 \\ 0 & \mbox{otherwise} \end{cases},</math>
:<math>j_1 \mbox{ is even } \Rightarrow s_2 = j_2 + 126.</math>
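The transformation above can be sketched directly in Python. The function name `jis_to_sjis` is hypothetical, and the range check on the inputs is an addition of this sketch rather than part of the formula.

```python
from typing import Tuple


def jis_to_sjis(j1: int, j2: int) -> Tuple[int, int]:
    """Convert a double-byte JIS pair (j1, j2) to Shift JIS bytes (s1, s2)."""
    if not (33 <= j1 <= 126 and 33 <= j2 <= 126):
        raise ValueError("JIS bytes must lie in 33..126 (0x21..0x7E)")

    # Two JIS rows share one Shift JIS lead byte; the offset depends on
    # which half of the row range j1 falls in.
    s1 = (j1 + 1) // 2 + (112 if j1 <= 94 else 176)

    if j1 % 2 == 1:
        # Odd rows use the lower trail-byte range and skip 0x7F.
        s2 = j2 + 31 + (1 if j2 >= 96 else 0)
    else:
        # Even rows use the upper trail-byte range.
        s2 = j2 + 126
    return s1, s2


if __name__ == "__main__":
    s1, s2 = jis_to_sjis(0x24, 0x22)  # JIS code for hiragana あ
    print(hex(s1), hex(s2))
    print(bytes([s1, s2]).decode("shift_jis"))
```

Feeding the result to Python's built-in `shift_jis` codec, as in the last line, is a convenient way to spot-check individual code points.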

Many different versions of Shift JIS exist. There are two areas for expansion. Firstly, JIS X 0208 does not fill the whole 94×94 space encoded for it in Shift JIS, so there is room for more characters here; these are really extensions to JIS X 0208 rather than to Shift JIS itself. The most popular extension of this kind is the Windows-31J encoding (otherwise known as Code page 932) popularized by Microsoft, although Microsoft itself does not recognize the Windows-31J name and instead calls that variation "shift_jis". Secondly, Shift JIS has more encoding space than is needed for JIS X 0201 and JIS X 0208, and this space can be, and is, used for yet more characters. For example, the space with lead bytes 0xF5 to 0xF9 is used by Japanese mobile phone operators for pictographs for use in e-mail (KDDI goes further and defines hundreds more in the space with lead bytes 0xF3 and 0xF4).
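Python's standard library ships separate `shift_jis` and `cp932` codecs, which gives a quick way to observe the first kind of extension. The sketch below assumes those two codec names; the helper `can_encode` is hypothetical. It uses the circled digit ① (U+2460), which Code page 932 places in a row that JIS X 0208 leaves unassigned.

```python
def can_encode(text: str, codec: str) -> bool:
    """Report whether text is representable in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    circled_one = "\u2460"  # ①, an extension character in Code page 932
    for codec in ("shift_jis", "cp932"):
        print(codec, can_encode(circled_one, codec))
```

Text that round-trips through one of these codecs may therefore fail with the other, which is one practical source of the confusion between the variants.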

Beyond even this there have been numerous minor variations made on Shift JIS, with individual characters here and there altered. Most of these extensions and variants have no IANA registration, so there is much scope for confusion if the extensions are used. Microsoft Code Page 932 is registered separately from Shift JIS.

IBM CCSID 943 has the same extensions as Code Page 932.

As with most code pages and encodings, it is recommended that Unicode be used instead.

Shift JIS byte map

[Chart not reproduced here: it gives the detailed meaning of each byte in a Shift JIS encoded stream.]

See also

* Japanese language and computers
* Mojibake
* Shift JIS art

External links

* [http://lfw.org/text/jp.html Ping: Japanese text encoding] (Note the algorithm given on this page appears to be wrong, although the figures are correct.)
* [http://www.rikai.com/library/kanjitables/kanji_codes.sjis.shtml Shift-JIS] A table of the non-ASCII part of the codeset.
* [http://mail.apps.ietf.org/ietf/charsets/msg00616.html Proposal for clarification of the difference between Shift JIS and Windows-31J at IANA]
* [http://www.iana.org/assignments/character-sets IANA assignments for character sets]
* [http://www.microsoft.com/globaldev/reference/dbcs/932.htm Microsoft's definition of Code Page 932]
* [http://publibn.boulder.ibm.com/doc_link/en_US/a_doc_lib/aixprggd/genprogc/codeset_over.htm#HDRMGC0DAN IBM Code Page description page] Includes a brief description of where all the IBM 943 extensions came from.
* [http://www.pitt.edu/~ctnst3/cjk/jis.c C source code for JIS/SJIS/EUC character set transformation]
* Forms of Shift-JIS in ICU (International Components for Unicode)
** [http://demo.icu-project.org/icu-bin/convexp?conv=ibm-942 ibm-942 (sjis78)]
** [http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943 ibm-943 (contains the U+00A5 ↔ 0x5C mapping)]
** [http://demo.icu-project.org/icu-bin/convexp?conv=Shift_JIS Shift JIS (contains the U+005C ↔ 0x5C mapping)]
* [http://wakaba-web.hp.infoseek.co.jp/table/sjis-0208-1997-std.txt Mapping table between Shift JIS and Unicode]
* [http://www.ibiblio.org/pub/academic/communications/papers/Virtual-Communities-in-Japan Page that mentions ASCII Corporation as co-developer of Shift-JIS]
* [http://examples.oreilly.com/cjkvinfo/errata/cjkv-errata-1-2.txt Errata for Lunde's CJKV, correcting "developed by Microsoft Corporation" to "co-developed by ASCII Corporation and Microsoft Corporation"]


Wikimedia Foundation. 2010.
