KOI8-U

KOI8-U is an 8-bit character encoding designed to cover Ukrainian, which uses the Cyrillic alphabet. It is based on KOI8-R, which covers Russian and Bulgarian, but replaces eight graphic characters with the four Ukrainian letters Ґ, Є, І, and Ї, each in upper and lower case.

KOI8 remains much more commonly used than ISO 8859-5, which never really caught on. Another common Cyrillic character encoding is Windows-1251. Both may eventually give way to Unicode.

In Russian, KOI8 stands for "Kod Obmena Informatsiey, 8 bit" (Код Обмена Информацией, 8 бит) which means "Code for Information Exchange, 8 bit".

The KOI8 character sets have the property that the Russian Cyrillic letters are in pseudo-Roman order rather than the natural Cyrillic alphabetical order as in ISO 8859-5. Although this may seem unnatural, it has the useful property that if the 8th bit is stripped, the text can still be read (or at least deciphered) in case-reversed transliteration on an ordinary ASCII terminal. For instance, "Русский Текст" in KOI8-U becomes "rUSSKIJ tEKST" ("Russian Text") if the 8th bit is stripped.
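The bit-stripping property can be checked with a short Python sketch. It assumes Python's bundled `koi8_u` codec; the helper name `strip_high_bit` is made up for this illustration.

```python
def strip_high_bit(text: str, encoding: str = "koi8_u") -> str:
    """Encode text, clear the 8th bit of every byte, and read the result as ASCII."""
    raw = text.encode(encoding)
    # Masking with 0x7F drops the top bit, so every byte lands in the ASCII range.
    stripped = bytes(b & 0x7F for b in raw)
    return stripped.decode("ascii")


print(strip_high_bit("Русский Текст"))  # rUSSKIJ tEKST
```

Upper-case Cyrillic letters sit in rows Ex and Fx, which map onto the lower-case ASCII rows 6x and 7x once the top bit is cleared. That is why the case comes out reversed.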

Codepage layout

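KOI8-U
    x0   x1   x2   x3   x4   x5   x6   x7   x8   x9   xA   xB   xC   xD   xE   xF
0x  unused
1x  unused
2x  SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3x  0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4x  @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5x  P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6x  `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7x  p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~
8x  ─    │    ┌    ┐    └    ┘    ├    ┤    ┬    ┴    ┼    ▀    ▄    █    ▌    ▐
9x  ░    ▒    ▓    ⌠    ■    ∙    √    ≈    ≤    ≥    NBSP ⌡    °    ²    ·    ÷
Ax  ═    ║    ╒    ё    є    ╔    і    ї    ╗    ╘    ╙    ╚    ╛    ґ    ╝    ╞
Bx  ╟    ╠    ╡    Ё    Є    ╣    І    Ї    ╦    ╧    ╨    ╩    ╪    Ґ    ╬    ©
Cx  ю    а    б    ц    д    е    ф    г    х    и    й    к    л    м    н    о
Dx  п    я    р    с    т    у    ж    в    ь    ы    з    ш    э    щ    ч    ъ
Ex  Ю    А    Б    Ц    Д    Е    Ф    Г    Х    И    Й    К    Л    М    Н    О
Fx  П    Я    Р    С    Т    У    Ж    В    Ь    Ы    З    Ш    Э    Щ    Ч    Ъ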

In the table above, 20 is the regular SPACE character, and 9A is the NO-BREAK SPACE.
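As a rough cross-check, the upper half of the table (rows 8x to Fx) can be regenerated from a codec. The sketch below assumes Python's bundled `koi8_u` codec reflects the layout described here; `koi8u_upper_rows` is a hypothetical helper name.

```python
def koi8u_upper_rows() -> dict[str, str]:
    """Return the 16 characters of each row 8x..Fx as decoded by the koi8_u codec."""
    rows = {}
    for high in range(0x8, 0x10):
        # Bytes high*16 .. high*16+15 form one row of the code page.
        row_bytes = bytes(range(high * 16, high * 16 + 16))
        rows[f"{high:X}x"] = row_bytes.decode("koi8_u")
    return rows


rows = koi8u_upper_rows()
for label, chars in rows.items():
    # Show the no-break space by name so it stays visible in the output.
    print(label, " ".join("NBSP" if c == "\u00a0" else c for c in chars))
```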

KOI8-U differs from KOI8-R at eight positions: 0xA4, 0xA6, 0xA7, and 0xAD (lower case) and 0xB4, 0xB6, 0xB7, and 0xBD (upper case). These positions hold the extra letters that do not exist in Russian.
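The positions where the two encodings disagree can be found by decoding every byte value with both. This is a minimal sketch that assumes Python's bundled `koi8_r` and `koi8_u` codecs; `differing_positions` is a hypothetical helper, and other implementations may map a few positions differently.

```python
def differing_positions(enc_a: str = "koi8_r", enc_b: str = "koi8_u") -> list[int]:
    """List the byte values that decode to different characters in the two encodings."""
    all_bytes = bytes(range(256))
    decoded_a = all_bytes.decode(enc_a)
    decoded_b = all_bytes.decode(enc_b)
    # Both encodings are single-byte, so index i in each string corresponds to byte i.
    return [i for i in range(256) if decoded_a[i] != decoded_b[i]]


for pos in differing_positions():
    byte = bytes([pos])
    print(f"0x{pos:02X}: {byte.decode('koi8_r')} -> {byte.decode('koi8_u')}")
```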

Although RFC 2319 says that character 95 should be U+2219 (∙), it may also be U+2022 (•) to match the bullet character in Windows-1251.

Some references have a typo and incorrectly state that character B4 is U+0403, rather than the correct U+0404. This typo is present in Appendix A of RFC 2319 (but the table in the main text of the RFC gives the correct mapping).
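One way to see which mapping a given implementation uses for these two positions is to print the code points it produces. The sketch below inspects Python's bundled `koi8_u` codec only; `code_point` is a hypothetical helper, and other libraries may choose U+2022 for 95.

```python
def code_point(byte_value: int, encoding: str = "koi8_u") -> str:
    """Return the Unicode code point, as U+XXXX, that one byte decodes to."""
    char = bytes([byte_value]).decode(encoding)
    return f"U+{ord(char):04X}"


print("B4 ->", code_point(0xB4))
print("95 ->", code_point(0x95))
```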

See also

* Ukrainian alphabet

External links

* RFC 2319


Wikimedia Foundation. 2010.
