CESU-8

"Compatibility Encoding Scheme for UTF-16: 8-Bit" (CESU-8) is a variant of UTF-8 that is described in Unicode Technical Report #26 [http://www.unicode.org/reports/tr26/] . A code point from the Basic Multilingual Plane (BMP), i.e. a code point in the range U+0000 to U+FFFF, is encoded in the same way as in UTF-8. A supplementary character, i.e. a code point in the range U+10000 to U+10FFFF, is first represented as a surrogate pair, as in UTF-16, and then each surrogate is encoded as if it were a code point, using the three-byte UTF-8 pattern. CESU-8 therefore needs six bytes (three per surrogate) for each supplementary character, whereas UTF-8 needs only four. Each CESU-8 byte sequence (1, 2, or 3 bytes) converts to exactly one UTF-16 code unit (2 bytes).
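The two-step procedure above (split into UTF-16 code units, then write each unit with the UTF-8 bit layout) can be sketched in Python as follows. This is a minimal illustration, not code from the technical report or any library; the function names `utf16_code_units`, `encode_code_unit` and `cesu8_encode` are hypothetical.

```python
from typing import List


def utf16_code_units(cp: int) -> List[int]:
    """Split a code point into its UTF-16 code units (one unit or a surrogate pair)."""
    if cp < 0x10000:
        return [cp]                      # BMP: the code point is its own code unit
    v = cp - 0x10000                     # 20-bit offset into the supplementary planes
    return [0xD800 + (v >> 10),          # high surrogate carries the top 10 bits
            0xDC00 + (v & 0x3FF)]        # low surrogate carries the bottom 10 bits


def encode_code_unit(u: int) -> bytes:
    """Encode one 16-bit value with the UTF-8 bit layout (1, 2 or 3 bytes)."""
    if u < 0x80:
        return bytes([u])
    if u < 0x800:
        return bytes([0xC0 | (u >> 6), 0x80 | (u & 0x3F)])
    return bytes([0xE0 | (u >> 12), 0x80 | ((u >> 6) & 0x3F), 0x80 | (u & 0x3F)])


def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8: UTF-16 code units, each written in UTF-8 form."""
    out = bytearray()
    for ch in text:
        for unit in utf16_code_units(ord(ch)):
            out += encode_code_unit(unit)
    return bytes(out)
```

For example, `cesu8_encode("\U00010400")` yields the six bytes `ED A0 81 ED B0 80` (the surrogates U+D801 and U+DC00), while `"\U00010400".encode("utf-8")` yields the four bytes `F0 90 90 80`. For BMP-only text the output is identical to UTF-8.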

CESU-8 is not an official part of the Unicode Standard, because Unicode Technical Reports are informative documents only. It is intended for internal processing within a system and is not recommended for open information exchange.

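CESU-8 is similar to Java's Modified UTF-8, which also encodes supplementary characters as surrogate pairs, but it does not use Modified UTF-8's special two-byte encoding (0xC0 0x80) of the NUL character (U+0000); in CESU-8, as in UTF-8, NUL is the single byte 0x00.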
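Because NUL is the only difference, converting between the two forms can be sketched as a byte substitution. The helper names below are hypothetical; the sketch assumes well-formed input. It relies on the fact that in CESU-8 every byte of a multi-byte sequence is 0x80 or above, so a 0x00 byte can only mean U+0000, and that in valid Modified UTF-8 the lead byte 0xC0 occurs only in the pair 0xC0 0x80.

```python
def cesu8_to_modified_utf8(data: bytes) -> bytes:
    """Replace each single-byte NUL with Modified UTF-8's two-byte form C0 80."""
    # Safe because 0x00 never occurs inside a multi-byte CESU-8 sequence.
    return data.replace(b"\x00", b"\xc0\x80")


def modified_utf8_to_cesu8(data: bytes) -> bytes:
    """Replace each C0 80 pair with a single NUL byte."""
    # Safe for valid Modified UTF-8: 0xC0 is used only as the lead byte of C0 80.
    return data.replace(b"\xc0\x80", b"\x00")
```

Supplementary characters need no conversion, since both encodings write them as two three-byte surrogate encodings.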

The CESU-8 encoding form is used in the Oracle database software. Oracle's UTF8 character set (a misnomer), available since version 8.0 of the database, is actually CESU-8. The character set AL32UTF8, introduced in version 9.0, is UTF-8 compliant.

Examples
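The following Python sketch decodes a few CESU-8 byte sequences and prints them next to the UTF-8 encoding of the same character. The sample characters are illustrative choices, and `cesu8_decode` is a hypothetical helper built on Python's standard `surrogatepass` error handler. It is deliberately lenient: it also accepts the four-byte UTF-8 form of supplementary characters, which a strict CESU-8 decoder would reject.

```python
def cesu8_decode(data: bytes) -> str:
    """Decode CESU-8 bytes to a str (lenient sketch)."""
    # Step 1: turn each 1-3 byte sequence into one 16-bit code unit. With
    # "surrogatepass", three-byte encodings of U+D800..U+DFFF come back as
    # lone surrogate characters instead of raising an error.
    units = data.decode("utf-8", "surrogatepass")
    # Step 2: reinterpret the code units as UTF-16; the UTF-16 decoder joins
    # each high/low surrogate pair into one supplementary character.
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


# Character -> its CESU-8 encoding, written as hex.
examples = {
    "\u0045": "45",                       # LATIN CAPITAL LETTER E
    "\u0205": "C8 85",                    # two-byte BMP character
    "\u20ac": "E2 82 AC",                 # EURO SIGN, three bytes
    "\U00010400": "ED A0 81 ED B0 80",    # supplementary: surrogates D801 DC00
}

for char, cesu_hex in examples.items():
    decoded = cesu8_decode(bytes.fromhex(cesu_hex))
    assert decoded == char
    utf8_hex = char.encode("utf-8").hex(" ").upper()
    print(f"U+{ord(char):04X}  CESU-8: {cesu_hex:<18}  UTF-8: {utf8_hex}")
```

Only the last row differs between the two encodings: six CESU-8 bytes against the four UTF-8 bytes `F0 90 90 80`.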

External links

* [http://www.unicode.org/reports/tr26/ Unicode Technical Report #26]
* [http://java.sun.com/j2se/1.5.0/docs/guide/jni/spec/types.html Modified UTF-8 overview]
* [http://demo.icu-project.org/icu-bin/convexp?conv=CESU-8 Graphical View of CESU-8 in ICU's Converter Explorer]


Wikimedia Foundation. 2010.
