CJK characters

CJK is a collective term for Chinese, Japanese, and Korean, which constitute the main East Asian languages. The term is used in the field of software and communications internationalization.

The term CJKV means CJK plus Vietnamese, which was written with Chinese characters (Hán tự) and Chữ Nôm before the adoption of Quốc Ngữ.

These languages share one characteristic: their writing systems rely wholly or partly on Chinese characters, called hànzì in Chinese, kanji in Japanese, and hanja in Korean. Chinese is written in Chinese characters only; general literacy requires about 4,000 characters, although reasonably complete coverage takes up to 40,000. Japanese uses fewer characters (general literacy in Japan can be expected with about 2,000) together with two syllabaries. The use of Chinese characters in Korea is becoming increasingly rare, although their idiosyncratic use in proper names requires knowledge (and therefore availability) of many more characters.

The number of characters required for complete coverage of all these languages' needs cannot fit in the 256-character code space of 8-bit character encodings; it requires at least a 16-bit fixed-width encoding or a multi-byte variable-length encoding. The 16-bit fixed-width encodings, such as Unicode up to and including version 2.0, are now deprecated for two reasons: more characters must be encoded than the 65,536 values of a 16-bit code space can accommodate (Unicode 5.0 has some 70,000 Han characters), and the Chinese government requires that software in China support the GB18030 character set.
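The code-space arithmetic above can be checked with a short Python sketch. It is only an illustration: the two sample characters and the helper name `encoded_sizes` are assumptions chosen for this example, not part of any standard.

```python
# Sizes of the two fixed-width code spaces discussed above.
EIGHT_BIT_SPACE = 2 ** 8      # 256 code values
SIXTEEN_BIT_SPACE = 2 ** 16   # 65,536 code values


def encoded_sizes(ch: str) -> dict:
    """Return how many bytes one character occupies in each Unicode encoding form."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # The "-be" variants avoid counting a byte-order mark.
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }


bmp_char = "\u4e2d"         # U+4E2D, inside the first 65,536 code points
ext_b_char = "\U00020000"   # U+20000, a Han character beyond the 16-bit range

for ch in (bmp_char, ext_b_char):
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

The second character does not fit in a single 16-bit unit, so UTF-16 has to spend two units (four bytes) on it. This is why a fixed-width 16-bit encoding is no longer sufficient.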

Although CJK encodings have common character sets, the encodings often used to represent them have been developed separately by different East Asian governments and software companies, and are mutually incompatible. Unicode has attempted, with some controversy, to unify the character sets in a process known as Han unification.
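The incompatibility can be seen by writing the same text with one legacy encoding and reading it back with another. The following Python sketch uses the standard library codecs `gb2312` and `big5`; the helper name `reinterpret` and the sample string are assumptions made for illustration.

```python
def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one codec, then decode the resulting bytes with another."""
    raw = text.encode(written_as)
    # errors="replace" substitutes U+FFFD for byte sequences the second codec rejects.
    return raw.decode(read_as, errors="replace")


sample = "中文"
gb_bytes = sample.encode("gb2312")
big5_bytes = sample.encode("big5")

# The same two characters map to different byte sequences in each encoding.
print(gb_bytes.hex(), big5_bytes.hex())

# Reading the bytes with the matching codec recovers the text ...
print(reinterpret(sample, "gb2312", "gb2312"))
# ... while reading them with the other codec yields unrelated characters.
print(reinterpret(sample, "gb2312", "big5"))
```

Because the byte values alone do not say which encoding produced them, text exchanged between systems needs the encoding declared out of band, or a common encoding such as Unicode.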

CJK character encodings should consist minimally of Han characters plus language-specific phonetic scripts such as pinyin, bopomofo, hiragana, katakana, and hangul.

CJK character encodings include:
*Big5
*EUC-JP
*EUC-KR
*GB18030 (the mandated standard in the People's Republic of China)
*GB2312
*ISO 2022-JP
*KS C 5861
*Shift-JIS
*Unicode
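Most of the encodings listed above ship as codecs in the Python standard library, so their differing repertoires can be probed directly. The sketch below is illustrative only: the codec list uses Python's own codec names (with `utf_8` standing in for Unicode), and the sample strings and helper names `can_encode` and `coverage_table` are assumptions.

```python
# Python codec names for encodings in the list above.
CODECS = ["big5", "euc_jp", "euc_kr", "gb18030", "gb2312",
          "iso2022_jp", "shift_jis", "utf_8"]

# Short samples of Han characters and language-specific phonetic scripts.
SAMPLES = {
    "simplified Han": "汉字",
    "traditional Han": "漢字",
    "kana": "かな",
    "hangul": "한글",
}


def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the codec's repertoire."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def coverage_table(samples: dict, codecs: list) -> dict:
    """Map each sample label to the codecs that can represent it."""
    return {
        label: [codec for codec in codecs if can_encode(text, codec)]
        for label, text in samples.items()
    }


for label, supported in coverage_table(SAMPLES, CODECS).items():
    print(f"{label}: {', '.join(supported)}")
```

The national encodings each cover the scripts their own language needs, while GB18030 and the Unicode encoding forms can represent all four samples.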

The CJK character sets take up the bulk of the assigned Unicode code space. There is much controversy among Japanese experts on Chinese characters about the desirability and technical merit of the Han unification process used to map multiple Chinese and Japanese character sets into a single set of unified characters.
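The share of the code space taken by CJK characters can be estimated with Python's `unicodedata` module. This is a rough sketch under stated assumptions: it counts only code points that have a character name, treats Han ideographs and Hangul syllables as "CJK", and its result depends on the Unicode version bundled with the interpreter. The function name `cjk_share` is hypothetical.

```python
import unicodedata

# Name prefixes treated as CJK for this estimate (an assumption of the sketch).
CJK_PREFIXES = (
    "CJK UNIFIED IDEOGRAPH",
    "CJK COMPATIBILITY IDEOGRAPH",
    "HANGUL SYLLABLE",
)


def cjk_share() -> tuple:
    """Count named code points and how many of them are Han or Hangul syllables."""
    total = 0
    cjk = 0
    for cp in range(0x110000):
        # Unassigned code points, surrogates and control characters have no name.
        name = unicodedata.name(chr(cp), "")
        if not name:
            continue
        total += 1
        if name.startswith(CJK_PREFIXES):
            cjk += 1
    return cjk, total


cjk_count, named_count = cjk_share()
print(f"{cjk_count} of {named_count} named code points "
      f"({cjk_count / named_count:.0%}) are Han ideographs or Hangul syllables")
```

The exact figures vary between Unicode versions, so no specific percentage is quoted here.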

Chinese and Japanese can be written both left-to-right and top-to-bottom, but are usually treated as left-to-right scripts when discussing encoding issues.

See also

*Chinese character encoding
*Han unification
*Chinese input methods for computers
*Japanese language and computers
*Korean language and computers
*Input method editor
*Variable-width encoding
*Complex Text Layout languages (CTL)
*CJK strokes
*Horizontal and vertical writing in East Asian scripts
*Graphics tablet

References

*DeFrancis, John. "". Honolulu: University of Hawaii Press, 1990. ISBN 0-8248-1068-6.
*Hannas, William C. "Asia's Orthographic Dilemma". Honolulu: University of Hawaii Press, 1997. ISBN 0-8248-1892-X (paperback); ISBN 0-8248-1842-3 (hardcover).
*Lemberg, Werner. "The CJK package for LaTeX2ε: Multilingual support beyond babel". TUGboat, Volume 18 (1997), No. 3, Proceedings of the 1997 Annual Meeting.
*Lunde, Ken. "CJKV Information Processing". Sebastopol, Calif.: O'Reilly & Associates, 1998. ISBN 1-56592-224-7.

External links

* [http://www.linfo.org/cjkv.html CJKV: A Brief Introduction]
* [http://tug.org/TUGboat/Articles/tb18-3/cjkintro600.pdf Lemberg CJK article from above, TUGboat 18-3]


Wikimedia Foundation. 2010.
