Plane (Unicode)

Main article: Mapping of Unicode characters

In the Unicode standard, a plane is a contiguous group of 65,536 (= 2¹⁶) code points. Unicode code points are logically divided into 17 such planes. Planes are identified by the numbers 0 to 16 (decimal), which correspond to the possible values 00–10 (hexadecimal) of the first two digits in the six-digit format (hhhhhh). Six of these planes also have names.
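
As a minimal sketch of this numbering, the plane of a code point can be read off by discarding its low 16 bits. The function names `plane_of` and `six_digit` are illustrative only and do not come from any standard API.

```python
def plane_of(code_point: int) -> int:
    """Return the plane number (0-16) of a Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Each plane holds 2**16 code points, so the plane is whatever
    # remains after dropping the low 16 bits.
    return code_point >> 16


def six_digit(code_point: int) -> str:
    """Format a code point as six hex digits (hhhhhh); the first two are the plane."""
    return f"{code_point:06X}"


for cp in (0x0041, 0x1F600, 0x10FFFF):
    print(six_digit(cp), plane_of(cp))
# 000041 0
# 01F600 1
# 10FFFF 16
```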

Currently, about ten percent of the potential space is used. Furthermore, ranges of characters have been tentatively mapped out for every current and ancient writing system (script) the Unicode Consortium has been able to identify.[1] While Unicode may eventually need to use another of the 11 spare planes for ideographic characters, other planes would still remain. Even if previously unknown scripts with tens of thousands of characters are discovered, the limit of 1,114,112 code points is unlikely to be reached in the near future. The Unicode Consortium has stated that this limit will never be changed.[citation needed]

The odd-looking limit (it is not a power of 2) is due to the design of UTF-16. In UTF-16, a "surrogate pair" of two 16-bit words encodes 2²⁰ code points (16 planes), and single 16-bit words encode plane 0, giving 2²⁰ + 2¹⁶ = 1,114,112 code points in 17 planes. The limit is not due to UTF-8, which was designed with a limit of 2³¹ code points (32,768 planes) and can encode 2²¹ code points (32 planes) even if limited to 4 bytes.
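
The arithmetic behind these figures can be checked directly. The following is only a worked calculation; the variable names are chosen for illustration, and the figure of 1,024 values for each half of a surrogate pair follows from the surrogate ranges D800–DBFF and DC00–DFFF.

```python
PLANE_SIZE = 2**16              # code points per plane

single_words = 2**16            # plane 0: one 16-bit word per code point
surrogate_pairs = 2**10 * 2**10 # 1,024 high surrogates x 1,024 low surrogates = 2**20

utf16_limit = single_words + surrogate_pairs
print(utf16_limit)                  # 1114112
print(utf16_limit // PLANE_SIZE)    # 17 planes

# UTF-8 figures quoted in the text, expressed in planes.
print(2**31 // PLANE_SIZE)          # 32768
print(2**21 // PLANE_SIZE)          # 32
```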

Sometimes, the terms “astral plane” and “astral characters” are used informally to refer to the planes above the Basic Multilingual Plane (planes 1–16) and their characters.[2]
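
A short sketch of how such characters might be detected in practice, assuming Python 3.3 or later, where a string is a sequence of code points. The helper name `is_astral` is hypothetical.

```python
def is_astral(ch: str) -> bool:
    """True if the character lies above the Basic Multilingual Plane."""
    return ord(ch) > 0xFFFF


# "A" is in plane 0; U+1F600 is in the Emoticons block of plane 1.
text = "A\U0001F600"

print([is_astral(c) for c in text])          # [False, True]
print(len(text))                             # 2 code points
print(len(text.encode("utf-16-le")) // 2)    # 3 UTF-16 code units
```

The last line shows why the distinction matters: a character outside the BMP occupies two 16-bit code units in UTF-16, so code-unit counts and code-point counts differ.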

1 Overview
2 Basic Multilingual Plane
3 Supplementary Multilingual Plane
4 Supplementary Ideographic Plane
5 Tertiary Ideographic Plane
6 Unassigned planes
7 Supplementary Special-purpose Plane
8 Private Use Area planes
9 References

Overview

Unicode planes and code point (character) ranges
Group	Plane	Code point range	Name	Abbreviation
Basic	Plane 0	0000–FFFF	Basic Multilingual Plane	BMP
Supplementary	Plane 1	10000–1FFFF	Supplementary Multilingual Plane	SMP
Supplementary	Plane 2	20000–2FFFF	Supplementary Ideographic Plane	SIP
Supplementary	Planes 3–13	30000–DFFFF	Unassigned	(none)
Supplementary	Plane 14	E0000–EFFFF	Supplementary Special-purpose Plane	SSP
Supplementary	Planes 15–16	F0000–10FFFF	Private Use Area	S PUA A/B

Ranges listed within each plane:
BMP	0000–0FFF, 1000–1FFF, 2000–2FFF, 3000–3FFF, 4000–4FFF, 5000–5FFF, 6000–6FFF, 7000–7FFF, 8000–8FFF, 9000–9FFF, A000–AFFF, B000–BFFF, C000–CFFF, D000–DFFF, E000–EFFF, F000–FFFF
SMP	10000–10FFF, 11000–11FFF, 12000–12FFF, 13000–13FFF, 16000–16FFF, 1B000–1BFFF, 1D000–1DFFF, 1F000–1FFFF
SIP	20000–20FFF, 21000–21FFF, 22000–22FFF, 23000–23FFF, 24000–24FFF, 25000–25FFF, 26000–26FFF, 27000–27FFF, 28000–28FFF, 29000–29FFF, 2A000–2AFFF, 2B000–2BFFF, 2F000–2FFFF
SSP	E0000–E0FFF
S PUA A/B	Plane 15 (PUA-A): F0000–FFFFF; Plane 16 (PUA-B): 100000–10FFFF
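
The plane layout in the table above can be expressed as a small lookup. This is a sketch only: the `PLANES` list and the `plane_name` function are illustrative names, and the entries simply mirror the table rather than any official data file.

```python
# (first plane, last plane, name) as listed in the overview table.
PLANES = [
    (0, 0, "Basic Multilingual Plane"),
    (1, 1, "Supplementary Multilingual Plane"),
    (2, 2, "Supplementary Ideographic Plane"),
    (3, 13, "Unassigned"),
    (14, 14, "Supplementary Special-purpose Plane"),
    (15, 16, "Private Use Area"),
]


def plane_name(code_point: int) -> str:
    """Return the table's name for the plane containing code_point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    plane = code_point >> 16
    for first, last, name in PLANES:
        if first <= plane <= last:
            return name
    raise AssertionError("unreachable: planes 0-16 are all covered")


print(plane_name(0x4E00))    # Basic Multilingual Plane
print(plane_name(0x2A700))   # Supplementary Ideographic Plane
print(plane_name(0x10FFFF))  # Private Use Area
```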

Basic Multilingual Plane

A map of the Basic Multilingual Plane. Each numbered box represents 256 code points.

The first plane (plane 0), the Basic Multilingual Plane (BMP), is where most characters have been assigned so far. The BMP contains characters for almost all modern languages, and a large number of special characters. A primary objective for the BMP is to support the unification of prior character sets as well as characters for writing. Most of the allocated code points in the BMP are used to encode Chinese, Japanese, and Korean (CJK) characters.

The High Surrogate (U+D800..U+DBFF) and Low Surrogate (U+DC00..U+DFFF) code points are reserved for encoding non-BMP characters in UTF-16 by using a pair of 16-bit codes: one High Surrogate followed by one Low Surrogate. A surrogate code point on its own will never be assigned a character.
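
The pairing can be sketched as follows, using the standard UTF-16 arithmetic: subtract 0x10000 from the code point, then split the remaining 20 bits into two 10-bit halves. The function names `to_surrogates` and `from_surrogates` are illustrative, not part of any library.

```python
def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a non-BMP code point into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points in planes 1-16 use surrogate pairs")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Combine a (high, low) surrogate pair back into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x1F600)
print(f"{high:04X} {low:04X}")             # D83D DE00
print(f"{from_surrogates(high, low):X}")   # 1F600
```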

As of Unicode 6.0, the BMP comprises the following blocks:

C0 Controls and Basic Latin (Basic Latin) (0000–007F)
C1 Controls and Latin-1 Supplement (0080–00FF)
Latin Extended-A (0100–017F)
Latin Extended-B (0180–024F)
IPA Extensions (0250–02AF)
Spacing Modifier Letters (02B0–02FF)
Combining Diacritical Marks (0300–036F)
Greek and Coptic (0370–03FF)
Cyrillic (0400–04FF)
Cyrillic Supplement (0500–052F)
Armenian (0530–058F)
Hebrew (0590–05FF)
Arabic (0600–06FF)
Syriac (0700–074F)
Arabic Supplement (0750–077F)
Thaana (0780–07BF)
NKo (07C0–07FF)
Samaritan (0800–083F)
Mandaic (0840–085F)
Indic scripts:

Devanagari (0900–097F)
Bengali (0980–09FF)
Gurmukhi (0A00–0A7F)
Gujarati (0A80–0AFF)
Oriya (0B00–0B7F)
Tamil (0B80–0BFF)
Telugu (0C00–0C7F)
Kannada (0C80–0CFF)
Malayalam (0D00–0D7F)
Sinhala (0D80–0DFF)

Thai (0E00–0E7F)
Lao (0E80–0EFF)
Tibetan (0F00–0FFF)
Myanmar (1000–109F)
Georgian (10A0–10FF)
Hangul Jamo (1100–11FF)
Ethiopic (1200–137F)
Ethiopic Supplement (1380–139F)
Cherokee (13A0–13FF)
Unified Canadian Aboriginal Syllabics (1400–167F)
Ogham (1680–169F)
Runic (16A0–16FF)
Philippine scripts:

Tagalog (1700–171F)
Hanunoo (1720–173F)
Buhid (1740–175F)
Tagbanwa (1760–177F)

Khmer (1780–17FF)
Mongolian (1800–18AF)
Unified Canadian Aboriginal Syllabics Extended (18B0–18FF)
Limbu (1900–194F)
Tai Le (1950–197F)
Tai Lue (1980–19DF)
Khmer Symbols (19E0–19FF)
Buginese (1A00–1A1F)
Tai Tham (1A20–1AAF)

Balinese (1B00–1B7F)
Sundanese (1B80–1BBF)
Batak (1BC0–1BFF)
Lepcha (1C00–1C4F)
Ol Chiki (1C50–1C7F)
Vedic Extensions (1CD0–1CFF)
Phonetic Extensions (1D00–1D7F)
Phonetic Extensions Supplement (1D80–1DBF)
Combining Diacritical Marks Supplement (1DC0–1DFF)
Latin Extended Additional (1E00–1EFF)
Greek Extended (1F00–1FFF)
Symbols:

General Punctuation (2000–206F)
Superscripts and Subscripts (2070–209F)
Currency Symbols (20A0–20CF)
Combining Diacritical Marks for Symbols (20D0–20FF)
Letterlike Symbols (2100–214F)
Number Forms (2150–218F)
Arrows (2190–21FF)
Mathematical Operators (2200–22FF)
Miscellaneous Technical (2300–23FF)
Control Pictures (2400–243F)
Optical Character Recognition (2440–245F)
Enclosed Alphanumerics (2460–24FF)
Box Drawing (2500–257F)
Block Elements (2580–259F)
Geometric Shapes (25A0–25FF)
Miscellaneous Symbols (2600–26FF)
Dingbats (2700–27BF)
Miscellaneous Mathematical Symbols-A (27C0–27EF)
Supplemental Arrows-A (27F0–27FF)
Braille Patterns (2800–28FF)
Supplemental Arrows-B (2900–297F)
Miscellaneous Mathematical Symbols-B (2980–29FF)
Supplemental Mathematical Operators (2A00–2AFF)
Miscellaneous Symbols and Arrows (2B00–2BFF)

Glagolitic (2C00–2C5F)
Latin Extended-C (2C60–2C7F)
Coptic (2C80–2CFF)
Georgian Supplement (2D00–2D2F)
Tifinagh (2D30–2D7F)
Ethiopic Extended (2D80–2DDF)
Cyrillic Extended-A (2DE0–2DFF)
Supplemental Punctuation (2E00–2E7F)
East Asian scripts and symbols:

CJK Radicals Supplement (2E80–2EFF)
Kangxi Radicals (2F00–2FDF)
Ideographic Description Characters (2FF0–2FFF)
CJK Symbols and Punctuation (3000–303F)
Hiragana (3040–309F)
Katakana (30A0–30FF)
Bopomofo (3100–312F)
Hangul Compatibility Jamo (3130–318F)
Kanbun (3190–319F)
Bopomofo Extended (31A0–31BF)
CJK Strokes (31C0–31EF)
Katakana Phonetic Extensions (31F0–31FF)
Enclosed CJK Letters and Months (3200–32FF)
CJK Compatibility (3300–33FF)
CJK Unified Ideographs Extension A (3400–4DBF)
Yijing Hexagram Symbols (4DC0–4DFF)
CJK Unified Ideographs (4E00–9FFF)

Yi Syllables (A000–A48F)
Yi Radicals (A490–A4CF)
Lisu (A4D0–A4FF)
Vai (A500–A63F)
Cyrillic Extended-B (A640–A69F)
Bamum (A6A0–A6FF)
Modifier Tone Letters (A700–A71F)
Latin Extended-D (A720–A7FF)
Syloti Nagri (A800–A82F)
Common Indic Number Forms (A830–A83F)
Phags-pa (A840–A87F)
Saurashtra (A880–A8DF)
Devanagari Extended (A8E0–A8FF)
Kayah Li (A900–A92F)
Rejang (A930–A95F)
Hangul Jamo Extended-A (A960–A97F)
Javanese (A980–A9DF)
Cham (AA00–AA5F)
Myanmar Extended-A (AA60–AA7F)
Tai Viet (AA80–AADF)
Ethiopic Extended-A (AB00–AB2F)
Meetei Mayek (ABC0–ABFF)
Hangul Syllables (AC00–D7AF)
Hangul Jamo Extended-B (D7B0–D7FF)
Surrogates:

High Surrogates (D800–DB7F)
High Private Use Surrogates (DB80–DBFF)
Low Surrogates (DC00–DFFF)

Private Use Area (E000–F8FF)
CJK Compatibility Ideographs (F900–FAFF)
Alphabetic Presentation Forms (FB00–FB4F)
Arabic Presentation Forms-A (FB50–FDFF)
Variation Selectors (FE00–FE0F)
Vertical Forms (FE10–FE1F)
Combining Half Marks (FE20–FE2F)
CJK Compatibility Forms (FE30–FE4F)
Small Form Variants (FE50–FE6F)
Arabic Presentation Forms-B (FE70–FEFF)
Halfwidth and Fullwidth Forms (FF00–FFEF)
Specials (FFF0–FFFF)
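
Because blocks are non-overlapping ranges, a block lookup can be done with a binary search over the start values. The sketch below uses only a handful of entries copied from the list above; the names `BLOCKS`, `STARTS` and `block_of` are illustrative, and a real table would hold every block.

```python
from bisect import bisect_right

# (start, end, name), sorted by start; a small excerpt of the BMP block list.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x3040, 0x309F, "Hiragana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
    (0xE000, 0xF8FF, "Private Use Area"),
]
STARTS = [start for start, _, _ in BLOCKS]


def block_of(code_point: int):
    """Return the block name from BLOCKS, or None if the excerpt has no match."""
    i = bisect_right(STARTS, code_point) - 1   # last block starting at or before cp
    if i >= 0:
        start, end, name = BLOCKS[i]
        if code_point <= end:
            return name
    return None


print(block_of(0x0041))  # Basic Latin
print(block_of(0x3042))  # Hiragana
print(block_of(0x0100))  # None (Latin Extended-A is not in this excerpt)
```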

Supplementary Multilingual Plane

Plane 1, the Supplementary Multilingual Plane (SMP), is mostly used for historic scripts such as Linear B, but is also used for musical and mathematical symbols.

As of Unicode 6.0, the SMP comprises the following blocks:

Linear B Syllabary (10000–1007F)
Linear B Ideograms (10080–100FF)
Aegean Numbers (10100–1013F)
Ancient Greek Numbers (10140–1018F)
Ancient Symbols (10190–101CF)
Phaistos Disc (101D0–101FF)
Lycian (10280–1029F)
Carian (102A0–102DF)
Old Italic (10300–1032F)
Gothic (10330–1034F)
Ugaritic (10380–1039F)
Old Persian (103A0–103DF)
Deseret (10400–1044F)
Shavian (10450–1047F)
Osmanya (10480–104AF)
Cypriot Syllabary (10800–1083F)

Imperial Aramaic (10840–1085F)
Phoenician (10900–1091F)
Lydian (10920–1093F)
Kharoshthi (10A00–10A5F)
Old South Arabian (10A60–10A7F)
Avestan (10B00–10B3F)
Inscriptional Parthian (10B40–10B5F)
Inscriptional Pahlavi (10B60–10B7F)
Old Turkic (10C00–10C4F)
Rumi Numeral Symbols (10E60–10E7F)
Brahmi (11000–1107F)
Kaithi (11080–110CF)
Cuneiform (12000–123FF)
Cuneiform Numbers and Punctuation (12400–1247F)
Egyptian Hieroglyphs (13000–1342F)
Bamum Supplement (16800–16A3F)

Kana Supplement (1B000–1B0FF)
Byzantine Musical Symbols (1D000–1D0FF)
Musical Symbols (1D100–1D1FF)
Ancient Greek Musical Notation (1D200–1D24F)
Tai Xuan Jing Symbols (1D300–1D35F)
Counting Rod Numerals (1D360–1D37F)
Mathematical Alphanumeric Symbols (1D400–1D7FF)
Mahjong Tiles (1F000–1F02F)
Domino Tiles (1F030–1F09F)
Playing Cards (1F0A0–1F0FF)
Enclosed Alphanumeric Supplement (1F100–1F1FF)
Enclosed Ideographic Supplement (1F200–1F2FF)
Miscellaneous Symbols And Pictographs (1F300–1F5FF)
Emoticons (1F600–1F64F)
Transport And Map Symbols (1F680–1F6FF)
Alchemical Symbols (1F700–1F77F)

Supplementary Ideographic Plane

Plane 2, the Supplementary Ideographic Plane (SIP), is used for Unified Han (CJK) Ideographs that were mostly not included in earlier character encoding standards.

As of Unicode 6.0, the SIP comprises the following blocks:

CJK Unified Ideographs Extension B (20000–2A6DF)
CJK Unified Ideographs Extension C (2A700–2B73F)
CJK Unified Ideographs Extension D (2B740–2B81F)
CJK Compatibility Ideographs Supplement (2F800–2FA1F)

Tertiary Ideographic Plane

Plane 3, the Tertiary Ideographic Plane (TIP), is reserved for Oracle Bone script, Bronze Script, Small Seal Script, additional CJK unified ideographs, and other historic ideographic scripts.[3]

As of Unicode 6.0, the TIP does not include any blocks.

Unassigned planes

Unicode has not yet assigned any characters to Planes 4 through 13. It is not anticipated that all these planes will be needed, given the total sizes of the known writing systems left to be encoded. However, the number of possible symbol characters that could arise outside of the context of writing systems is potentially limitless.

Supplementary Special-purpose Plane

Plane 14 (E in hexadecimal), the Supplementary Special-purpose Plane (SSP), currently contains non-graphical characters. The first block is for language tag characters for use when language cannot be indicated through other protocols (such as the xml:lang attribute in XML). The other block contains glyph variation selectors to indicate an alternate glyph for a character that cannot be determined by context.

As of Unicode 6.0, the SSP comprises the following blocks:

Tags (E0000–E007F)
Variation Selectors Supplement (E0100–E01EF)

Private Use Area planes

Two planes (planes 15 and 16) have been set aside for character assignment by parties outside ISO and the Unicode Consortium. Such characters have limited interoperability: software and fonts that support Unicode will not necessarily support character assignments made by other parties. The risk is greatest when the characters have unusual properties, such as right-to-left directionality, because other implementations may treat them inappropriately.
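
Software can at least recognize that a code point is private-use, even though it cannot know what the character means. As a minimal sketch, Python's standard `unicodedata` module reports the general category `Co` for private-use code points; the wrapper name `is_private_use` is hypothetical.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """True if the character has general category Co (private use)."""
    return unicodedata.category(ch) == "Co"


print(is_private_use("\uE000"))       # True  (BMP Private Use Area)
print(is_private_use("\U000F0000"))   # True  (plane 15)
print(is_private_use("\U00100000"))   # True  (plane 16)
print(is_private_use("A"))            # False
```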

References
