Code point

Not to be confused with Point code.

In character encoding terminology, a code point or code position is any of the numerical values that make up the code space (or code page).^[1] For example, ASCII comprises 128 code points in the range 0 to 7F (hexadecimal), Extended ASCII comprises 256 code points in the range 0 to FF, and Unicode comprises 1,114,112 code points in the range 0 to 10FFFF. The Unicode code space is divided into seventeen planes (the Basic Multilingual Plane and 16 supplementary planes), each with 65,536 (= 2¹⁶) code points. Thus the total size of the Unicode code space is 17 × 65,536 = 1,114,112.
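The plane arithmetic above can be checked with a short Python sketch. The names `PLANE_SIZE`, `PLANE_COUNT`, and `plane_of` are illustrative assumptions introduced here, not part of any standard API.

```python
PLANE_SIZE = 2 ** 16   # 65,536 code points per plane
PLANE_COUNT = 17       # plane 0 (Basic Multilingual Plane) plus 16 supplementary planes

CODE_SPACE_SIZE = PLANE_COUNT * PLANE_SIZE
MAX_CODE_POINT = CODE_SPACE_SIZE - 1


def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) that contains the given code point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"{code_point:#x} is outside the Unicode code space")
    return code_point // PLANE_SIZE


print(CODE_SPACE_SIZE)       # 1114112
print(hex(MAX_CODE_POINT))   # 0x10ffff
print(plane_of(ord("A")))    # 0 (Basic Multilingual Plane)
print(plane_of(0x1F600))     # 1 (first supplementary plane)
```

Integer division by the plane size works because every plane holds exactly 2¹⁶ code points, so the plane number is simply the code point's value above its low 16 bits.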

Definition

The notion of a code point is an abstraction that separates two pairs of concepts:

the number itself from its encoding as a sequence of bits, and

the abstract character from any particular graphical representation (glyph).

These distinctions matter because one may wish to:

encode a given code space in different ways, or

display a character using different glyphs.

For Unicode, the particular sequence of bits is called a code value (the current Unicode glossary prefers the term code unit). In the UCS-4 encoding, each code point is encoded as a 4-byte (octet) binary number, which is fixed-width and simple but wasteful of space. In the UTF-8 encoding, each code point is encoded as a sequence of 1 to 4 bytes, which is variable-width, hence more compact for most text but more complex to process, and backward compatible with ASCII.
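The difference between a code point and its encoded form can be seen with Python's built-in `ord` and `str.encode`. This is a minimal sketch; the sample characters are chosen here only for illustration, and Python's `utf-32-be` codec is used as a stand-in for a 4-byte fixed-width encoding such as UCS-4.

```python
ch = "\u20ac"            # EURO SIGN
code_point = ord(ch)     # the abstract number, independent of any encoding

utf32 = ch.encode("utf-32-be")   # fixed width: always 4 bytes
utf8 = ch.encode("utf-8")        # variable width: 1 to 4 bytes

print(f"U+{code_point:04X}")   # U+20AC
print(utf32.hex())             # 000020ac
print(utf8.hex())              # e282ac

# UTF-8 length grows with the code point's magnitude.
samples = "A\u00e9\u20ac\U0001F600"
utf8_lengths = [len(c.encode("utf-8")) for c in samples]
print(utf8_lengths)            # [1, 2, 3, 4]

# ASCII characters are encoded identically in ASCII and UTF-8.
print("A".encode("utf-8") == "A".encode("ascii"))   # True
```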

Code points are normally assigned to abstract characters. An abstract character is not a graphical glyph but a unit of textual data; its precise appearance depends on the font. However, code points may also be left reserved for future assignment (most of the Unicode code space is unassigned) or given other designated functions.
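Whether a code point is assigned can be inspected through its general category. The sketch below uses Python's standard `unicodedata` module; the helper `describe` is a hypothetical name introduced here, and the results depend on the Unicode version bundled with the interpreter (U+0378 is assumed to still be unassigned there).

```python
import unicodedata


def describe(code_point: int) -> str:
    """Summarize a code point as 'U+XXXX <general category> <name>'."""
    ch = chr(code_point)
    category = unicodedata.category(ch)        # e.g. 'Lu', 'Cn', 'Co', 'Cs'
    name = unicodedata.name(ch, "<no name>")   # default avoids ValueError
    return f"U+{code_point:04X} {category} {name}"


print(describe(0x0041))   # U+0041 Lu LATIN CAPITAL LETTER A
print(describe(0x0378))   # category Cn: not assigned to any character
print(describe(0xE000))   # category Co: private use
print(describe(0xD800))   # category Cs: surrogate, reserved for UTF-16
```

Categories `Co` and `Cs` are examples of code points with a designated function rather than an assigned abstract character.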

Unicode text

A Unicode text file is not necessarily a sequence of code points each stored in a 4-byte block. Instead, an encoding scheme serializes the sequence of code points into a sequence of bytes. Several such schemes exist, trading off space efficiency against ease of encoding, and some use a variable number of bytes per character. For example, UTF-8 is backward compatible with ASCII: characters in the ASCII range are encoded as the same single bytes. Encoding schemes also take endianness into account, and some are self-synchronizing codes, meaning that character boundaries can be found without reading from the beginning of the string.
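Both properties mentioned above, endianness and self-synchronization, can be illustrated briefly in Python. This is a sketch under stated assumptions: the helper `char_start` is a hypothetical name introduced here, and it assumes its input is well-formed UTF-8.

```python
text = "a\u20acb"   # 'a', EURO SIGN, 'b'

# Endianness: the same code points serialize to different byte orders.
utf16_be = text.encode("utf-16-be")
utf16_le = text.encode("utf-16-le")
print(utf16_be.hex())   # 006120ac0062
print(utf16_le.hex())   # 6100ac206200


def char_start(data: bytes, index: int) -> int:
    """Back up from index to the first byte of the UTF-8 sequence containing it.

    UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so a
    boundary can be found locally without scanning from the start.
    """
    while index > 0 and (data[index] & 0xC0) == 0x80:
        index -= 1
    return index


data = text.encode("utf-8")   # bytes: 61 e2 82 ac 62
print(char_start(data, 3))    # 1 (index 3 is inside the 3-byte euro sign)
print(char_start(data, 4))    # 4 ('b' starts its own sequence)
```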

Notes

^[1] Glossary of Unicode Terms

