Japanese language and computers

Adapting computers to the Japanese language raises many issues, some unique to Japanese and others common to any language with a very large number of characters. English needs so few characters that each one can be encoded in a single byte. Japanese has far more than 256 characters, so a single byte is not enough, and Japanese text is instead encoded using two or more bytes per character, in a so-called "double-byte" or "multi-byte" encoding. Some problems relate to transliteration and romanization, some to character encoding, and some to the input of Japanese text.
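As a rough illustration of the byte counts involved, the following sketch uses Python's built-in codecs. The helper name `encoded_length` and the sample characters are chosen here for illustration only.

```python
def encoded_length(text, codec):
    """Return how many bytes `text` occupies in the given codec."""
    return len(text.encode(codec))


for sample in ("A", "日"):
    for codec in ("ascii", "shift_jis", "euc_jp", "utf-8"):
        try:
            print(sample, codec, encoded_length(sample, codec))
        except UnicodeEncodeError:
            # ASCII has no code for kanji at all.
            print(sample, codec, "not encodable")
```

The Latin letter takes one byte in every codec listed, while the kanji takes two bytes in Shift JIS and EUC-JP and three in UTF-8.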

Character encodings

There are several standard methods of encoding Japanese characters for use on a computer, including JIS, Shift-JIS, EUC, and Unicode. While mapping the set of kana is a simple matter, kanji have proven more difficult. Despite efforts, no single encoding scheme has become the de facto standard, and multiple encoding standards are still in use today.

For example, most Japanese e-mails use JIS encoding and most web pages use Shift-JIS, yet mobile phones in Japan usually use some form of Extended Unix Code (EUC). If a program fails to determine which encoding scheme is in use, the result can be "mojibake" (misconverted characters; literally "transformed characters", from "moji" (文字), meaning "character", and the stem of "bakeru" (化ける), meaning "to change form"), which leaves the text unreadable.
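A minimal sketch of how mojibake arises, and of one naive way a program might try to determine the encoding, using Python's built-in codecs. The function `guess_encoding` and its candidate order are assumptions made for this example; it is a trial-decoding heuristic, not a reliable detector, since short byte strings can decode without error under more than one encoding.

```python
original = "文字化け"

# Bytes written as Shift JIS but read as EUC-JP come out garbled.
raw = original.encode("shift_jis")
garbled = raw.decode("euc_jp", errors="replace")
print(garbled)


def guess_encoding(data, candidates=("iso2022_jp", "utf-8", "euc_jp", "shift_jis")):
    """Return the first candidate codec that decodes `data` without error.

    Hypothetical helper: a clean decode does not prove the guess is right.
    """
    for codec in candidates:
        try:
            data.decode(codec)
        except UnicodeDecodeError:
            continue
        return codec
    return None


print(guess_encoding(raw))
print(guess_encoding(original.encode("utf-8")))
```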

To understand how this state of affairs arose, it helps to know a little of the history of the encodings. The first encoding to become widely used was JIS X 0201, a single-byte encoding that covers only the standard 7-bit ASCII characters plus half-width katakana extensions. It was widely used in systems that had neither the processing power nor the storage to handle kanji (including DOS and old embedded equipment such as cash registers). The development of kanji encodings marked the beginning of the split. Shift JIS was designed to be completely backward compatible with JIS X 0201, and is therefore used in Windows (for backward compatibility with DOS) and in much embedded electronic equipment. However, Shift JIS has the unfortunate property that it often breaks parsers not specifically designed to handle it. EUC, on the other hand, is not backward compatible with JIS X 0201 but is handled much better by parsers written for 7-bit ASCII, which is why EUC encodings are used on UNIX, where much of the file-handling code was historically written only for English encodings. Further complications arise because the original Internet e-mail standards support only 7-bit transfer protocols; the JIS encoding was therefore developed for sending and receiving e-mail.
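The properties described above can be observed with Python's built-in codecs. The sketch below assumes that "JIS encoding" corresponds to Python's `iso2022_jp` codec; the helper `naive_split` and the sample string are invented for illustration. The kanji 表 is a well-known case whose second Shift JIS byte equals the ASCII backslash (0x5C).

```python
def naive_split(data):
    """Split on the ASCII backslash byte, as a byte-oriented parser might."""
    return data.split(b"\\")


name = "表\\data"  # one kanji, a backslash separator, then ASCII text

sjis = name.encode("shift_jis")
euc = name.encode("euc_jp")
jis = name.encode("iso2022_jp")

# Shift JIS: the trailing byte of the kanji is 0x5C, so the split goes wrong.
print(naive_split(sjis))

# EUC-JP: every byte of the kanji is 0x80 or above, so the split is correct.
print(naive_split(euc))

# ISO-2022-JP: escape sequences keep every byte within 7 bits.
print(all(b < 0x80 for b in jis))

# Half-width katakana (here U+FF71) stay single-byte in Shift JIS.
print("\uff71".encode("shift_jis"))
```

The Shift JIS split yields three pieces instead of two because the parser mistakes half of a kanji for a separator.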

Not all required characters may be included in a character set standard such as JIS, so gaiji (外字, "external characters") are sometimes used to supplement the character set. Gaiji may come in the form of external font packs, where normal characters have been replaced with new characters, or the new characters have been added to unused character positions. However, gaiji are not practical in Internet environments, since the font set must be transferred along with the text in order to use them. As a result, such characters are written with similar or simpler characters in their place, or the text may need to be written using a larger character set (such as Unicode) that supports the required character.

Unicode is intended to solve encoding problems for all the languages of the world. For Japanese, the kanji have been unified with Chinese characters: a character considered to be the same in both Japanese and Chinese has been given a single code number in Unicode, even if the two forms look a little different. This process, called Han unification, has caused controversy. The previous encodings in Japan, Taiwan, China, and Korea each handled only one language, whereas Unicode is meant to handle all of them. There has been resistance to Unicode in Japan, where it is sometimes regarded as an American rather than a Japanese invention. The handling of kanji and other Chinese characters was, however, designed by a committee with Japanese, Korean, Taiwanese, and Chinese members. Unicode adoption is growing slowly because it is better supported by US-made software, but most Japanese web pages still use Shift-JIS. Wikipedia uses Unicode.
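To make the "one code number" point concrete, the short sketch below inspects a single unified character with Python's standard `unicodedata` module. The choice of 直 is an assumption of this example (it is often cited as a character drawn slightly differently in Japanese and Chinese typefaces); it is not taken from the text above.

```python
import unicodedata

ch = "直"  # sample character chosen for this example

# The same code point is used whichever language the text is written in.
code_point = f"U+{ord(ch):04X}"
print(code_point, unicodedata.name(ch))

# The encoded bytes carry no language information either.
print(ch.encode("utf-8").hex())
```

Because the code point alone does not say which language is meant, the choice of glyph is left to the font or to language information supplied outside the text.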

Text input

Written Japanese uses several different scripts: kanji (Chinese characters), two sets of "kana" (phonetic syllabaries), and roman letters. While kana and roman letters can be typed directly into a computer, entering kanji is more complicated, as there are far more kanji than there are keys on most keyboards. To input kanji on modern computers, the user usually enters the reading of the kanji first; an input method editor (IME), sometimes also known as a front-end processor, then shows a list of candidate kanji that match phonetically and lets the user choose the correct one. More advanced IMEs work by phrase rather than by word, increasing the likelihood that the desired characters appear as the first option. Readings can be entered either via romanization ("rōmaji nyūryoku", ローマ字入力) or via direct kana input ("kana nyūryoku", かな入力). Direct kana input is not commonly used, but is widely supported.
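The reading-then-choose workflow can be sketched with a toy lookup table. Everything here is hypothetical: the dictionary `CANDIDATES`, its entries and ordering, and the function names are invented for illustration and do not reflect how any real IME stores or ranks its data.

```python
# Hypothetical toy dictionary: kana reading -> candidate spellings, best first.
CANDIDATES = {
    "かんじ": ["漢字", "感じ", "幹事"],
    "はし": ["橋", "箸", "端"],
}


def candidates_for(reading):
    """List candidate spellings, ending with the unconverted reading itself."""
    return CANDIDATES.get(reading, []) + [reading]


def convert(reading, choice=0):
    """Return the candidate the user picked (the first one by default)."""
    return candidates_for(reading)[choice]


print(candidates_for("かんじ"))
print(convert("はし", choice=1))
print(convert("ねこ"))  # unknown reading: falls back to the kana as typed
```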

There are two main systems for the romanization of Japanese, known as "Kunrei-shiki" and "Hepburn"; "keyboard romaji" (also known as "wāpuro rōmaji" or "word processor romaji") generally allows a loose combination of both. IME implementations may even handle keys for letters unused in any romanization scheme, such as "L", converting them to the most appropriate equivalent. With kana input, each key on the keyboard corresponds directly to one kana. The JIS keyboard system is the national standard, but some people use alternatives such as the Oyayubi shift system.
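A loose mix of the two romanization systems can be accepted by mapping both spellings to the same kana. The sketch below is a deliberately tiny, hypothetical table with a greedy longest-match loop; it covers only a few syllables and ignores long vowels, doubled consonants, and other rules a real IME would need.

```python
# Toy table: Hepburn ("shi", "chi", "tsu", "fu") and Kunrei-shiki
# ("si", "ti", "tu", "hu") spellings map to the same kana.
ROMAJI_TO_KANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "si": "し", "su": "す", "se": "せ", "so": "そ",
    "ta": "た", "chi": "ち", "ti": "ち", "tsu": "つ", "tu": "つ",
    "te": "て", "to": "と",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "ha": "は", "hi": "ひ", "fu": "ふ", "hu": "ふ", "he": "へ", "ho": "ほ",
}


def romaji_to_kana(text):
    """Convert romaji to hiragana, trying the longest spelling first."""
    out = []
    i = 0
    while i < len(text):
        for size in (3, 2, 1):
            chunk = text[i:i + size]
            if chunk in ROMAJI_TO_KANA:
                out.append(ROMAJI_TO_KANA[chunk])
                i += len(chunk)
                break
        else:
            # No table entry: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(romaji_to_kana("sushi"), romaji_to_kana("susi"))
print(romaji_to_kana("tsuchi"), romaji_to_kana("tuti"))
```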

Direction of text

Japanese has two directions of writing, called "yokogaki" and "tategaki". The "yokogaki" style runs horizontally, as in English, while the "tategaki" style consists of columns of text written downwards, stacked from right to left.
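The "tategaki" arrangement can be illustrated by rearranging horizontal lines into a character grid. This is only a plain-text sketch under simple assumptions: the function name `to_tategaki` is invented here, every character is treated as one cell, and short columns are padded with an ideographic space (U+3000).

```python
def to_tategaki(lines, pad="\u3000"):
    """Lay out horizontal lines as vertical columns, first line rightmost."""
    height = max((len(line) for line in lines), default=0)
    padded = [line.ljust(height, pad) for line in lines]
    columns = padded[::-1]  # the first line becomes the rightmost column
    # Each output row takes one character from every column, left to right.
    return ["".join(col[row] for col in columns) for row in range(height)]


for row in to_tategaki(["あいう", "かき"]):
    print(row)
```

Read from the top of the rightmost column downwards, then the next column to the left, the output reproduces the original lines in order.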

At present, handling of downward text is incomplete. For example, HTML has no support for "tategaki", and Japanese users must use HTML tables to simulate it. However, CSS level 3 includes a property "writing-mode" which can render "tategaki" when given the value "tb-rl" (i.e., top to bottom, right to left; later drafts of the specification name this value "vertical-rl"). Word processors and DTP software have more complete support for it.

See also

*Japanese writing system
*Japanese language
*CJK characters
*Korean language and computers

External links

* [http://shittoku.com/default_cat.asp?data_id=32 Japanese Owned computer companies in United States]
* [http://web.archive.org/web/20060527013315/http://www.cs.mcgill.ca/~aelias4/encodings.html A complete introduction to Japanese character encodings]
* [http://examples.oreilly.com/cjkvinfo/doc/cjk.inf Chinese, Japanese, and Korean character set standards and encoding systems]
* [http://lfw.org/text/jp.html Japanese text encoding]
* [http://www.geocities.jp/ep3797/japanese_fonts.html A collection of free Japanese typefaces]
* [http://www.hesjapanese.com How to install japanese font]

Wikimedia Foundation. 2010.
