- Plane (Unicode)
-
Main article: Mapping of Unicode characters
In the Unicode system, planes are groups of numerical values that point to specific characters. Unicode code points are logically divided into 17 planes, each with 65,536 (= 216) code points. Planes are identified by the numbers 0 to 16decimal, which corresponds with the possible values 00-10hexadecimal of the first two positions in six position format (hhhhhh). Six of these planes also have names.
Currently, about ten percent of the potential space is used. Furthermore, ranges of characters have been tentatively mapped out for every current and ancient writing system (script) the Unicode consortium has been able to identify.[1] While Unicode may eventually need to use another of the spare 11 planes for ideographic characters, other planes remain. Even if previously unknown scripts with tens of thousands of characters are discovered, the limit of 1,114,112 code points is unlikely to be reached in the near future. The Unicode consortium has stated that limit will never be changed[citation needed].
The odd-looking limit (it is not a power of 2) is due to the design of UTF-16. In UTF-16 a "surrogate pair" of two words is used to encode 220 code points (16 planes), plus single words are used to encode plane 0. It is not due to UTF-8, which was designed with a limit of 231 code points (32768 planes), and can encode 221 code points (32 planes) even if limited to 4 bytes.
Sometimes, the terms “astral plane” and “astral characters” are used informally to refer to the planes above the Basic Multilingual Plane (planes 1–16) and their characters.[2]
Contents
Overview
Basic Supplementary 0000–FFFF 10000–1FFFF 20000–2FFFF 30000–DFFFF E0000–EFFFF F0000–10FFFF Plane 0:
Basic Multilingual PlanePlane 1:
Supplementary Multilingual PlanePlane 2:
Supplementary Ideographic PlanePlanes 3–13:
UnassignedPlane 14:
Supplementary Special-purpose PlanePlanes 15–16:
Private Use AreaBMP SMP SIP — SSP S PUA A/B 0000–0FFF
1000–1FFF
2000–2FFF
3000–3FFF
4000–4FFF
5000–5FFF
6000–6FFF
7000–7FFF8000–8FFF
9000–9FFF
A000–AFFF
B000–BFFF
C000–CFFF
D000–DFFF
E000–EFFF
F000–FFFF10000–10FFF
11000–11FFF
12000–12FFF
13000–13FFF
16000–16FFF
1B000–1BFFF
1D000–1DFFF
1F000–1FFFF20000–20FFF
21000–21FFF
22000–22FFF
23000–23FFF
24000–24FFF
25000–25FFF
26000–26FFF
27000–27FFF28000–28FFF
29000–29FFF
2A000–2AFFF
2B000–2BFFF
2F000–2FFFFE0000–E0FFF 15: PUA-A
F0000–FFFFF
16: PUA-B
100000–10FFFFBasic Multilingual Plane
The first plane (plane 0), the Basic Multilingual Plane (BMP), is where most characters have been assigned so far. The BMP contains characters for almost all modern languages, and a large number of special characters. A primary objective for the BMP is to support the unification of prior character sets as well as characters for writing. Most of the allocated code points in the BMP are used to encode Chinese, Japanese, and Korean (CJK) characters.
The High Surrogates (U+D800..U+DBFF) and Low Surrogate (U+DC00..U+DFFF) codes are reserved for encoding non-BMP characters in UTF-16 by using a pair of 16-bit codes: one High Surrogate and one Low Surrogate. A single surrogate code point will never be assigned a character.
As of Unicode 6.0[update], the BMP comprises the following blocks:- C0 Controls and Basic Latin (Basic Latin) (0000–007F)
- C1 Controls and Latin-1 Supplement (0080–00FF)
- Latin Extended-A (0100–017F)
- Latin Extended-B (0180–024F)
- IPA Extensions (0250–02AF)
- Spacing Modifier Letters (02B0–02FF)
- Combining Diacritical Marks (0300–036F)
- Greek and Coptic (0370–03FF)
- Cyrillic (0400–04FF)
- Cyrillic Supplement (0500–052F)
- Armenian (0530–058F)
- Hebrew (0590–05FF)
- Arabic (0600–06FF)
- Syriac (0700–074F)
- Arabic Supplement (0750–077F)
- Thaana (0780–07BF)
- NKo (07C0–07FF)
- Samaritan (0800–083F)
- Mandaic (0840–085F)
- Indic scripts:
- Balinese (1B00–1B7F)
- Sundanese (1B80–1BBF)
- Batak (1BC0–1BFF)
- Lepcha (1C00–1C4F)
- Ol Chiki (1C50–1C7F)
- Vedic Extensions (1CD0–1CFF)
- Phonetic Extensions (1D00–1D7F)
- Phonetic Extensions Supplement (1D80–1DBF)
- Combining Diacritical Marks Supplement (1DC0–1DFF)
- Latin extended additional (1E00–1EFF)
- Greek Extended (1F00–1FFF)
- Symbols:
-
- General Punctuation (2000–206F)
- Superscripts and Subscripts (2070–209F)
- Currency Symbols (20A0–20CF)
- Combining Diacritical Marks for Symbols (20D0–20FF)
- Letterlike Symbols (2100–214F)
- Number Forms (2150–218F)
- Arrows (2190–21FF)
- Mathematical Operators (2200–22FF)
- Miscellaneous Technical (2300–23FF)
- Control Pictures (2400–243F)
- Optical Character Recognition (2440–245F)
- Enclosed Alphanumerics (2460–24FF)
- Box Drawing (2500–257F)
- Block Elements (2580–259F)
- Geometric Shapes (25A0–25FF)
- Miscellaneous Symbols (2600–26FF)
- Dingbats (2700–27BF)
- Miscellaneous Mathematical Symbols-A (27C0–27EF)
- Supplemental Arrows-A (27F0–27FF)
- Braille Patterns (2800–28FF)
- Supplemental Arrows-B (2900–297F)
- Miscellaneous Mathematical Symbols-B (2980–29FF)
- Supplemental Mathematical Operators (2A00–2AFF)
- Miscellaneous Symbols and Arrows (2B00–2BFF)
- Glagolitic (2C00–2C5F)
- Latin Extended-C (2C60–2C7F)
- Coptic (2C80–2CFF)
- Georgian Supplement (2D00–2D2F)
- Tifinagh (2D30–2D7F)
- Ethiopic Extended (2D80–2DDF)
- Cyrillic Extended-A (2DE0–2DFF)
- Supplemental Punctuation (2E00–2E7F)
- East Asian scripts and symbols:
-
- Katakana (30A0–30FF)
- Bopomofo (3100–312F)
- Hangul Compatibility Jamo (3130–318F)
- Kanbun (3190–319F)
- Bopomofo Extended (31A0–31BF)
- CJK Strokes (31C0–31EF)
- Katakana Phonetic Extensions (31F0–31FF)
- Enclosed CJK Letters and Months (3200–32FF)
- CJK Compatibility (3300–33FF)
- CJK Unified Ideographs Extension A (3400–4DBF)
- Yijing Hexagram Symbols (4DC0–4DFF)
- CJK Unified Ideographs (4E00–9FFF)
- Yi Syllables (A000–A48F)
- Yi Radicals (A490–A4CF)
- Lisu (A4D0–A4FF)
- Vai (A500–A63F)
- Cyrillic Extended-B (A640–A69F)
- Bamum (A6A0–A6FF)
- Modifier Tone Letters (A700–A71F)
- Latin Extended-D (A720–A7FF)
- Syloti Nagri (A800–A82F)
- Common Indic Number Forms (A830–A83F)
- Phags-pa (A840–A87F)
- Saurashtra (A880–A8DF)
- Devanagari Extended (A8E0–A8FF)
- Kayah Li (A900–A92F)
- Rejang (A930–A95F)
- Hangul Jamo Extended-A (A960–A97F)
- Javanese (A980–A9DF)
- Cham (AA00–AA5F)
- Myanmar Extended-A (AA60–AA7F)
- Tai Viet (AA80–AADF)
- Ethiopic Extended-A (AB00–AB2F)
- Meetei Mayek (ABC0–ABFF)
- Hangul Syllables (AC00–D7AF)
- Hangul Jamo Extended-B (D7B0–D7FF)
- Surrogates:
-
- High Surrogates (D800–DB7F)
- High Private Use Surrogates (DB80–DBFF)
- Low Surrogates (DC00–DFFF)
- Private Use Area (E000–F8FF)
- CJK Compatibility Ideographs (F900–FAFF)
- Alphabetic Presentation Forms (FB00–FB4F)
- Arabic Presentation Forms-A (FB50–FDFF)
- Variation Selectors (FE00–FE0F)
- Vertical Forms (FE10–FE1F)
- Combining Half Marks (FE20–FE2F)
- CJK Compatibility Forms (FE30–FE4F)
- Small Form Variants (FE50–FE6F)
- Arabic Presentation Forms-B (FE70–FEFF)
- Halfwidth and Fullwidth Forms (FF00–FFEF)
- Specials (FFF0–FFFF)
Supplementary Multilingual Plane
Plane 1, the Supplementary Multilingual Plane (SMP), is mostly used for historic scripts such as Linear B, but is also used for musical and mathematical symbols.
As of Unicode 6.0[update], the SMP comprises the following blocks:
- Linear B Syllabary (10000–1007F)
- Linear B Ideograms (10080–100FF)
- Aegean Numbers (10100–1013F)
- Ancient Greek Numbers (10140–1018F)
- Ancient Symbols (10190–101CF)
- Phaistos Disc (101D0–101FF)
- Lycian (10280–1029F)
- Carian (102A0–102DF)
- Old Italic (10300–1032F)
- Gothic (10330–1034F)
- Ugaritic (10380–1039F)
- Old Persian (103A0–103DF)
- Deseret (10400–1044F)
- Shavian (10450–1047F)
- Osmanya (10480–104AF)
- Cypriot Syllabary (10800–1083F)
- Imperial Aramaic (10840–1085F)
- Phoenician (10900–1091F)
- Lydian (10920–1093F)
- Kharoshthi (10A00–10A5F)
- Old South Arabian (10A60–10A7F)
- Avestan (10B00–10B3F)
- Inscriptional Parthian (10B40–10B5F)
- Inscriptional Pahlavi (10B60–10B7F)
- Old Turkic (10C00–10C4F)
- Rumi Numeral Symbols (10E60–10E7F)
- Brahmi (11000–1107F)
- Kaithi (11080–110CF)
- Cuneiform (12000–123FF)
- Cuneiform Numbers and Punctuation (12400–1247F)
- Egyptian Hieroglyphs (13000–1342F)
- Bamum Supplement (16800–16A3F)
- Kana Supplement (1B000–1B0FF)
- Byzantine Musical Symbols (1D000–1D0FF)
- Musical Symbols (1D100–1D1FF)
- Ancient Greek Musical Notation (1D200–1D24F)
- Tai Xuan Jing Symbols (1D300–1D35F)
- Counting Rod Numerals (1D360–1D37F)
- Mathematical Alphanumeric Symbols (1D400–1D7FF)
- Mahjong Tiles (1F000–1F02F)
- Domino Tiles (1F030–1F09F)
- Playing Cards (1F0A0–1F0FF)
- Enclosed Alphanumeric Supplement (1F100–1F1FF)
- Enclosed Ideographic Supplement (1F200–1F2FF)
- Miscellaneous Symbols And Pictographs (1F300–1F5FF)
- Emoticons (1F600–1F64F)
- Transport And Map Symbols (1F680–1F6FF)
- Alchemical Symbols (1F700–1F77F)
Supplementary Ideographic Plane
Plane 2, the Supplementary Ideographic Plane (SIP), is used for Unified Han (CJK) Ideographs that were mostly not included in earlier character encoding standards.
As of Unicode 6.0[update], the SIP comprises the following blocks:
- CJK Unified Ideographs Extension B (20000–2A6DF)
- CJK Unified Ideographs Extension C (2A700–2B73F)
- CJK Unified Ideographs Extension D (2B740–2B81F)
- CJK Compatibility Ideographs Supplement (2F800–2FA1F)
Tertiary Ideographic Plane
Plane 3, the Tertiary Ideographic Plane (TIP), is reserved for Oracle Bone script, Bronze Script, Small Seal Script, additional CJK unified ideographs, and other historic ideographic scripts.[3]
As of Unicode 6.0[update], the TIP does not include any blocks.
Unassigned planes
Unicode has not yet assigned any characters to Planes 4 through 13. It is not anticipated that all these planes will be needed, given the total sizes of the known writing systems left to be encoded. However, the number of possible symbol characters that could arise outside of the context of writing systems is potentially limitless.
Supplementary Special-purpose Plane
Plane 14 (E in hexadecimal), the Supplementary Special-purpose Plane (SSP), currently contains non-graphical characters. The first block is for language tag characters for use when language cannot be indicated through other protocols (such as the
xml:lang
attribute in XML). The other block contains glyph variation selectors to indicate an alternate glyph for a character that cannot be determined by context.As of Unicode 6.0[update], the SSP comprises the following blocks:
- Tags (E0000–E007F)
- Variation Selectors Supplement (E0100–E01EF)
Private Use Area planes
Two planes (planes 15 and 16) have been set aside for character assignment by parties outside the ISO and the Unicode Consortium. Use of such characters will have limited interoperability. Software and fonts that support Unicode will not necessarily support character assignments by other parties. Especially if the characters have unusual properties such as right-to-left characters, other implementations may treat those characters inappropriately.
References
Categories:
Wikimedia Foundation. 2010.