GB 2312

GB 2312

GB2312 is the registered internet name for a key official character set of the People's Republic of China, used for simplified Chinese characters. GB abbreviates Guojia Biaozhun (国家标准), which means "national standard" in Chinese.

GB2312 (1980) has been superseded by GBK and GB18030, which include additional characters, but GB2312 is nonetheless still in widespread use.

While GB2312 covers 99.75% of the characters used for Chinese input, historical texts and many names remain out of scope. GB2312 includes 6,763 Chinese characters (on two levels: the first is arranged by reading, the second by radical then number of strokes), along with symbols and punctuation, Japanese kana, the Greek and Cyrillic alphabets, Zhuyin, and a double-byte set of Pinyin letters with tone marks.

There is an analog character set closely related to GB2312, with traditional character forms replacing simplified forms, known as GB/T 12345. GB-encoded fonts often come in pairs, one with the GB 2312 (jianti) character set and the other with the GB/T 12345 (fanti) character set.

Characters

Characters in GB2312 are arranged in a 94x94 grid (as in ISO_2022), and the two-byte codepoint of each character is expressed in the kuten (or quwei) form, which specifies a row (ku or qu) and the position of the character within the row (ten or wei).

The rows (numbered from 1 to 94) contain characters as follows:
* 01-09, comprising punctuation and other special characters.
* 16-55, the first plane for Chinese characters, arranged according to Pinyin.
* 56-87, the second plane for Chinese characters, arranged according to radical and strokes.

The rows 10-15 and 88-94 are unassigned.

Encodings of GB2312

EUC-CN

EUC-CN is often used as the character encoding (i.e. for external storage) in programs that deal with GB2312, thus maintainingcompatibility with ASCII. Two bytes are used to represent every character not found in ASCII. The value of the firstbyte is from 0xA1-0xF7 (161-247), while the value of the second byte is from 0xA1-0xFE (161-254). Hence, like UTF-8, it is possibleto check if a byte is part of a two-byte construct when using EUC-CN.

Compared to UTF-8, GB2312 (whether native or encoded in EUC-CN) is also more storage efficient, since Chinese characters are limited to a maximum of two bytes each, while UTF-8 uses at least three bytes.

To map the code points to bytes, add 160 (0xA0) to the 1000's and 100's value of the code point to form the high byte, and add 160 (0xA0) to the 10's and 1's value of the code point to form the low byte.

For example, if you have the GB2312 code point 4566 ("foreign,"), the high byte will come from 45 (4500), and the low byte will come from 66 (0066). For the high byte, add 45 to 160, giving 205 or 0xCD. For the low byte do the same, add 66 to 160, giving 226 or 0xE2. So, the full encoding is 0xCDE2.

HZ

HZ is another encoding of GB2312 that is used mostly for Usenet postings.

See also

*Guobiao code
*CJK
*Chinese character encoding
*Unicode
*GB18030
*GBK
*Big5 - standard used in Taiwan and Hong Kong

External links

* [http://demo.icu-project.org/icu-bin/convexp?conv=gb2312 Graphical View of GB2312 in ICU's Converter Explorer]
* [http://developers.sun.com/dev/gadc/technicalpublications/articles/gb18030.html Evolution of GBK and GB2312 into GB18030]


Wikimedia Foundation. 2010.

Игры ⚽ Нужно сделать НИР?

Look at other dictionaries:

  • 2312 год до н. э. — 2312 год до н. э. Годы 2316 до н. э. · 2315 до н. э. · 2314 до н. э. · 2313 до н. э. 2312 до н. э. 2311 до н. э. · 2310 до н. э. · 2309 до н. э. · 2308… …   Википедия

  • 2312 Duboshin — Duboshin Discovery and designation Discovered by Chernykh, N. Discovery site Nauchnyj Discovery date April 1, 1976 Designations MPC designation …   Wikipedia

  • (2312) Duboshin — Asteroid (2312) Duboshin Eigenschaften des Orbits (Animation) Orbittyp Hauptgürtel Große Halbachse 3,9730 AE …   Deutsch Wikipedia

  • 2312 v. Chr. — Portal Geschichte | Portal Biografien | Aktuelle Ereignisse | Jahreskalender ◄ | 4. Jt. v. Chr. | 3. Jahrtausend v. Chr. | 2. Jt. v. Chr. | ► ◄ | 26. Jh. v. Chr. | 25. Jh. v. Chr. | 24. Jahrhundert v. Chr. | 23. Jh. v. Chr. | 22. Jh. v. Chr …   Deutsch Wikipedia

  • 2312 до н. э. — …   Википедия

  • 2312 г до н.э. — Вмешательство в войну между Лугальзагеси и Лагашем Аккада. Разгром и пленение Лугалбзагеси. В клетке для собак Лугальзагеси отправлен в Ниппур …   Хронология всемирной истории: словарь

  • 2312 — матем. • Запись римскими цифрами: MMCCCXII …   Словарь обозначений

  • NGC 2312 — Звезда История исследования Открыватель Джон Гершель Дата открытия 18 января 1828 Наблюдательные данные (Эпоха J2 …   Википедия

  • Богдан-2312 — Общие данные …   Википедия

  • ДСТУ 2312-93 — Регенерація травильних розчинів. Здобування міді. Технічні вимоги та порядок проведення робіт [br] НД чинний: від 1995 01 01 Зміни: Технічний комітет: Мова: +Ru Метод прийняття: Кількість сторінок: 17+16 Код НД згідно з ДК 004: 25.220.01 …   Покажчик національних стандартів

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”