Mapping of Unicode character planes

Unicode characters can be categorized in many different ways. Unicode code points are logically divided into 17 "planes", each with 65,536 (= 2^16) code points, although currently only a few planes are used:
*Plane 0 (0000–FFFF): Basic Multilingual Plane (BMP). This is the plane containing most of the character assignments so far. A primary objective for the BMP is to support the unification of prior character sets as well as characters for writing systems in current use.
*Plane 1 (10000–1FFFF): Supplementary Multilingual Plane (SMP)
*Plane 2 (20000–2FFFF): Supplementary Ideographic Plane (SIP)
*Planes 3 to 13 (30000–DFFFF): unassigned
*Plane 14 (E0000–EFFFF): Supplementary Special-purpose Plane (SSP)
*Plane 15 (F0000–FFFFF): reserved for the Private Use Area (PUA)
*Plane 16 (100000–10FFFF): reserved for the Private Use Area (PUA)
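
Because each plane holds exactly 2^16 code points, the plane number is simply the code point with its low 16 bits dropped. The following Python sketch illustrates this; the names `PLANE_NAMES`, `plane_of`, and `plane_name` are hypothetical helpers chosen for this example, and the table only reflects the plane assignments listed above.

```python
# Plane names as given in the list above; planes 3 to 13 are unassigned.
# PLANE_NAMES, plane_of and plane_name are illustrative names, not a standard API.
PLANE_NAMES = {
    0: "Basic Multilingual Plane (BMP)",
    1: "Supplementary Multilingual Plane (SMP)",
    2: "Supplementary Ideographic Plane (SIP)",
    14: "Supplementary Special-purpose Plane (SSP)",
    15: "Private Use Area (PUA)",
    16: "Private Use Area (PUA)",
}


def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) that contains code_point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Each plane spans 0x10000 code points, so the plane is the value
    # that remains after shifting away the low 16 bits.
    return code_point >> 16


def plane_name(code_point: int) -> str:
    """Return the name of the plane, or 'unassigned' for planes 3 to 13."""
    return PLANE_NAMES.get(plane_of(code_point), "unassigned")
```

For example, `plane_of(ord("A"))` gives 0 (the BMP), while `plane_of(0x20000)` gives 2 (the SIP).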

Currently, about ten percent of the potential space is used. Furthermore, ranges of characters have been tentatively blocked out for every current and ancient writing system (script) the Unicode Consortium has been able to identify. While Unicode may eventually need to use another of the 11 spare planes for ideographic characters, other planes would remain even if previously unknown scripts with tens of thousands of characters were discovered. The limit of 1,114,112 code points (17 planes of 65,536 each, addressable in 21 bits) is therefore unlikely to be reached in the near future.
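
The size of the code space follows directly from the plane layout. As a quick check of the arithmetic, the sketch below (with constant names chosen only for this example) derives the total number of code points, the highest code point, and the number of bits needed to represent it. Sixteen planes alone would fit in exactly 20 bits; the seventeenth plane pushes the highest value, 10FFFF, to 21 bits.

```python
# Constant names are illustrative; the figures come from the plane layout.
PLANES = 17
CODE_POINTS_PER_PLANE = 2 ** 16            # 65,536 code points per plane

TOTAL_CODE_POINTS = PLANES * CODE_POINTS_PER_PLANE
HIGHEST_CODE_POINT = TOTAL_CODE_POINTS - 1  # code points start at 0
BITS_NEEDED = HIGHEST_CODE_POINT.bit_length()

print(f"total code points : {TOTAL_CODE_POINTS:,}")
print(f"highest code point: {HIGHEST_CODE_POINT:X}")
print(f"bits needed       : {BITS_NEEDED}")
```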

Basic Multilingual Plane

The first plane (plane 0), the "Basic Multilingual Plane" (BMP), is where most characters have been assigned so far. The BMP contains characters for almost all modern languages, and a large number of special characters. Most of the allocated code points in the BMP are used to encode Chinese, Japanese, and Korean (CJK) characters.

As of Unicode 5.1, the BMP includes the following scripts:

Wikimedia Foundation. 2010.
