KOI8-U
KOI8-U is an 8-bit character encoding designed to cover Ukrainian, which uses the Cyrillic alphabet. It is based on KOI8-R, which covers Russian and Bulgarian, but replaces eight graphic characters with the four Ukrainian letters Ґ, Є, І, and Ї in both upper case and lower case.
KOI8 remains much more commonly used than ISO 8859-5, which never gained wide adoption. Another common Cyrillic character encoding is Windows-1251. Both may eventually give way to Unicode.
In Russian, KOI8 stands for "Kod Obmena Informatsiey, 8 bit" (Код Обмена Информацией, 8 бит), which means "Code for Information Exchange, 8 bit".
The KOI8 character sets have the property that the Russian Cyrillic letters are in pseudo-Roman order rather than the natural Cyrillic alphabetical order as in ISO 8859-5. Although this may seem unnatural, it has the useful property that if the 8th bit is stripped, the text can still be read (or at least deciphered) in case-reversed transliteration on an ordinary ASCII terminal. For instance, "Русский Текст" in KOI8-U becomes "rUSSKIJ tEKST" ("Russian Text") if the 8th bit is stripped.
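The bit-stripping behaviour can be checked with a short sketch. It assumes Python's standard `koi8_u` codec; the helper name `strip_high_bit` is invented here for illustration.

```python
def strip_high_bit(text: str) -> str:
    """Encode text as KOI8-U, clear bit 7 of every byte, and read the result as ASCII."""
    raw = text.encode("koi8_u")
    # Clearing the top bit maps 0xC0-0xFF onto 0x40-0x7F, the ASCII letter rows.
    stripped = bytes(b & 0x7F for b in raw)
    return stripped.decode("ascii")


print(strip_high_bit("Русский Текст"))  # rUSSKIJ tEKST
```

The property holds for the Russian letters in rows Cx to Fx. The four Ukrainian additions sit in rows Ax and Bx, so they strip to digits or punctuation instead of Latin letters (for example, 0xA4 becomes 0x24, `$`).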
Codepage layout
KOI8-U
     x0   x1   x2   x3   x4   x5   x6   x7   x8   x9   xA   xB   xC   xD   xE   xF
0x   (unused)
1x   (unused)
2x   SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3x   0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4x   @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5x   P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6x   `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7x   p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~
8x   ─    │    ┌    ┐    └    ┘    ├    ┤    ┬    ┴    ┼    ▀    ▄    █    ▌    ▐
9x   ░    ▒    ▓    ⌠    ■    ∙    √    ≈    ≤    ≥    NBSP ⌡    °    ²    ·    ÷
Ax   ═    ║    ╒    ё    є    ╔    і    ї    ╗    ╘    ╙    ╚    ╛    ґ    ╝    ╞
Bx   ╟    ╠    ╡    Ё    Є    ╣    І    Ї    ╦    ╧    ╨    ╩    ╪    Ґ    ╬    ©
Cx   ю    а    б    ц    д    е    ф    г    х    и    й    к    л    м    н    о
Dx   п    я    р    с    т    у    ж    в    ь    ы    з    ш    э    щ    ч    ъ
Ex   Ю    А    Б    Ц    Д    Е    Ф    Г    Х    И    Й    К    Л    М    Н    О
Fx   П    Я    Р    С    Т    У    Ж    В    Ь    Ы    З    Ш    Э    Щ    Ч    Ъ
In the table above, 20 is the regular SPACE character, and 9A is the NO-BREAK SPACE.
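Individual rows of the layout can be regenerated from a codec instead of being read off the table. The following is a minimal sketch that assumes Python's standard `koi8_u` codec; `koi8u_row` is a hypothetical helper name, not part of any library.

```python
def koi8u_row(high_nibble: int) -> list[str]:
    """Return the 16 characters of one table row, e.g. 0xC for row Cx."""
    row_bytes = bytes(high_nibble * 16 + low for low in range(16))
    return list(row_bytes.decode("koi8_u"))


# Row Cx holds the lower-case letters in pseudo-Roman order.
print(" ".join(koi8u_row(0xC)))

# Position 9A is the NO-BREAK SPACE, U+00A0.
print(hex(ord(bytes([0x9A]).decode("koi8_u"))))  # 0xa0
```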
KOI8-U differs from KOI8-R at positions 0xA4, 0xA6, 0xA7, and 0xAD (lower case) and 0xB4, 0xB6, 0xB7, and 0xBD (upper case), which hold the extra letters that do not exist in Russian.
Although RFC 2319 says that character 0x95 should be U+2219 (∙), it may also be U+2022 (•) to match the bullet character in Windows-1251.
Some references have a typo and incorrectly state that character 0xB4 is U+0403, rather than the correct U+0404. This typo is present in Appendix A of RFC 2319 (but the table in the main text of the RFC gives the correct mapping).
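The positions at which the two encodings differ can be listed by decoding every byte value with both codecs. This is a sketch that relies on the `koi8_r` and `koi8_u` tables shipped with Python, so its result reflects that implementation's mappings and not necessarily every reference table. The function name `differing_positions` is illustrative only.

```python
def differing_positions(codec_a: str, codec_b: str) -> list[int]:
    """Return the byte values that decode to different characters in the two codecs."""
    all_bytes = bytes(range(256))
    a = all_bytes.decode(codec_a)
    b = all_bytes.decode(codec_b)
    return [i for i in range(256) if a[i] != b[i]]


diff = differing_positions("koi8_r", "koi8_u")
for pos in diff:
    r = bytes([pos]).decode("koi8_r")
    u = bytes([pos]).decode("koi8_u")
    print(f"0x{pos:02X}: KOI8-R {r} (U+{ord(r):04X}) -> KOI8-U {u} (U+{ord(u):04X})")

# The mapping a given implementation uses for 0x95 or 0xB4 can be inspected the same way.
print(f"0x95 -> U+{ord(bytes([0x95]).decode('koi8_u')):04X}")
print(f"0xB4 -> U+{ord(bytes([0xB4]).decode('koi8_u')):04X}")
```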
See also
* Ukrainian alphabet
External links
* RFC 2319
Wikimedia Foundation. 2010.