Mapping of Unicode graphic characters

Main article: Mapping of Unicode characters

By far the most common Unicode characters are graphic characters. Graphic characters all have some visual representation, or glyph, associated with them. While Unicode does not specify the concrete glyphs for these characters, it does specify recommended or prototypical glyphs. The actual glyph used by text display software depends on the font files used and on whether those fonts support contextual and non-contextual glyph variations.

1 Unihan characters
2 Phonetic characters
3 Numerals
4 Punctuation and diacritics
5 Symbols
6 Music notation

Unihan characters

Main article: Unihan

Han unification is the process used by the authors of Unicode and the Universal Character Set to map multiple character sets of the CJK languages into a single set of unified characters. The Chinese characters are common to Chinese (where they are called hanzi), Japanese (where they are called kanji), and Korean (where they are called hanja). Modern Korean, Chinese and Japanese typefaces may represent a given Han character as somewhat different glyphs. However, in the formulation of Unicode, these different glyphs were treated as the same character. This unification is referred to as "Han unification", with the resulting character repertoire sometimes referred to as Unihan.

Besides the Unihan ideographs, Han unification also provides Han unified punctuation, symbols, numerals, ideograph stroke characters and ideographic description characters.

Phonetic characters

Main article: Unicode Phonetic Symbols

Unicode includes letters and marks from the International Phonetic Alphabet (IPA), as well as characters supporting other phonetic writing systems.

Numerals

Main article: Unicode numerals

Numerals (often called numbers in Unicode) are characters that denote a number. The same Hindu-Arabic decimal digits are used in many writing systems throughout the world, and all share the same semantics for denoting numbers. However, the glyphs representing these numerals differ widely from one writing system to another. To support these glyph differences, Unicode includes duplicate encodings of these numerals within many of the script blocks. These digits are repeated in 22 separate blocks, twice in Arabic. Five additional sets of the ten decimal digits repeat again as rich text forms in the mathematical alphanumerics block within the Supplementary Multilingual Plane (so each of these characters requires 4 bytes in UTF-8, UTF-16 or UTF-32).
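As a rough illustration of these duplicate encodings, the following Python sketch uses the standard unicodedata module to show that a hand-picked sample of "three" digits from different blocks all carry the same decimal value, and that the mathematical form needs 4 bytes in UTF-8. The names THREES and describe_digit are hypothetical helpers introduced only for this example.

```python
import unicodedata

# The digit three as encoded in several different blocks (a small,
# hand-picked sample, not the full list of duplicated digit sets).
THREES = [
    "\u0033",      # DIGIT THREE (Basic Latin)
    "\u0663",      # ARABIC-INDIC DIGIT THREE
    "\u06F3",      # EXTENDED ARABIC-INDIC DIGIT THREE
    "\u0969",      # DEVANAGARI DIGIT THREE
    "\U0001D7D1",  # MATHEMATICAL BOLD DIGIT THREE (supplementary plane)
]


def describe_digit(ch):
    """Return (character name, decimal value, UTF-8 length in bytes)."""
    return (
        unicodedata.name(ch),
        unicodedata.decimal(ch),
        len(ch.encode("utf-8")),
    )


if __name__ == "__main__":
    for ch in THREES:
        name, value, size = describe_digit(ch)
        print(f"U+{ord(ch):04X} {name}: value={value}, utf8_bytes={size}")
```

Every entry reports the decimal value 3; only the code point, the name, and the encoded length differ.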

Unicode also includes several less common numerals: Roman numerals, counting rod numerals, cuneiform numerals and ancient Greek numerals.

Numerals invariably involve composition of glyphs, as a limited number of characters are composed to make other numerals. For example, the sequence 9 - 9 - 0 in Hindu-Arabic numerals composes the numeral for nine hundred and ninety (990). In Roman numerals, the same number is expressed by the composed numeral Ⅹↀ or ⅩⅯ. Each of these is a distinct numeral for representing the same abstract number. The semantics of the numerals differ in particular in their composition. The Hindu-Arabic decimal digits are positional-value compositions, while the Roman numerals are sign-value compositions that are additive or subtractive depending on the order of their signs.
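The difference between the two kinds of composition can be sketched in Python. This is a minimal illustration, not a full numeral parser: positional_value and sign_value are hypothetical helper names, the per-character values come from the standard unicodedata module, and the subtractive rule used here (a sign placed before a larger sign is subtracted) is a simplifying assumption.

```python
import unicodedata


def positional_value(digits):
    """Evaluate a string of decimal digits by place value (base 10)."""
    total = 0
    for ch in digits:
        # Each step shifts the running total one decimal place to the left.
        total = total * 10 + unicodedata.decimal(ch)
    return total


def sign_value(numeral):
    """Evaluate a sign-value numeral: each sign has a fixed value.

    A sign followed by a larger sign is subtracted; otherwise it is added.
    """
    values = [int(unicodedata.numeric(ch)) for ch in numeral]
    total = 0
    for i, value in enumerate(values):
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


if __name__ == "__main__":
    print(positional_value("990"))             # ASCII digits
    print(positional_value("\u0669\u0669\u0660"))  # Arabic-Indic digits
    print(sign_value("\u2169\u216F"))          # ROMAN NUMERAL TEN + ONE THOUSAND
    print(sign_value("\u2169\u2180"))          # ROMAN NUMERAL TEN + ONE THOUSAND C D
```

All four calls evaluate to 990, even though the positional digits depend on their place in the sequence while each Roman sign keeps a fixed value.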

Punctuation and diacritics

Unicode includes several blocks for unified diacritics and other combining marks, and also blocks for unified punctuation. However, when a mark or punctuation character is intended primarily for use within a particular script, the character is assigned to that script’s blocks. Authors will therefore find these types of characters throughout the Unicode character database. Unicode categorizes them as:

Punctuation

connector (Pc)
dash (Pd)
open (Ps)
close (Pe)
initial quote (Pi)
final quote (Pf)
other (Po)

Mark

non-spacing (Mn)
spacing-combining (Mc)
enclosing (Me)
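These two-letter codes are the General_Category values of the Unicode character database, and they can be inspected directly. The sketch below uses Python's standard unicodedata.category function; the SAMPLES table and the is_punctuation_or_mark helper are hypothetical names chosen for this example, and the sample characters are only one representative per category.

```python
import unicodedata

# One sample character per punctuation and mark category.
SAMPLES = {
    "_": "Pc",       # LOW LINE (connector)
    "-": "Pd",       # HYPHEN-MINUS (dash)
    "(": "Ps",       # LEFT PARENTHESIS (open)
    ")": "Pe",       # RIGHT PARENTHESIS (close)
    "\u00AB": "Pi",  # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK (initial quote)
    "\u00BB": "Pf",  # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK (final quote)
    "!": "Po",       # EXCLAMATION MARK (other)
    "\u0301": "Mn",  # COMBINING ACUTE ACCENT (non-spacing)
    "\u0903": "Mc",  # DEVANAGARI SIGN VISARGA (spacing-combining)
    "\u20DD": "Me",  # COMBINING ENCLOSING CIRCLE (enclosing)
}


def is_punctuation_or_mark(ch):
    """True if the character's general category starts with P or M."""
    return unicodedata.category(ch)[0] in ("P", "M")


if __name__ == "__main__":
    for ch, expected in SAMPLES.items():
        actual = unicodedata.category(ch)
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {actual} (expected {expected})")
```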

Symbols

Unicode has dozens of blocks dedicated to symbols that are useful regardless of one’s writing system. Other script-specific symbols are often included within a particular script’s blocks. Symbols are categorized as:

Symbol

math (Sm)
currency (Sc)
modifier (Sk)
other (So)
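As a small illustration of how these symbol categories might be used, the following Python sketch pulls the symbol characters out of a piece of text by checking whether the general category starts with "S". The function name symbols_in and the sample sentence are assumptions made for this example only.

```python
import unicodedata


def symbols_in(text):
    """Return (character, category) pairs for every symbol in the text."""
    found = []
    for ch in text:
        category = unicodedata.category(ch)
        if category.startswith("S"):  # Sm, Sc, Sk or So
            found.append((ch, category))
    return found


if __name__ == "__main__":
    sample = "Cost: 5 + 3 = $8 \u00A9 2020 ^"
    for ch, category in symbols_in(sample):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {category}")
```

Note that the colon in the sample is not reported, because it is punctuation (Po) rather than a symbol.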

Music notation

Unicode devotes a block of 256 characters to musical symbols. Since Unicode focuses on characters laid out in two dimensions, these characters do not encode pitch or other parts of Western music expressed in the vertical dimension. The music symbols are therefore better suited to discussing the symbols themselves, or to discussing rhythm within the prose of a document. To encode more complex musical information, some other data format is necessary, such as MusicXML or MIDI.
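A short Python sketch can make the scope of this block concrete. It assumes the Musical Symbols block spans U+1D100 to U+1D1FF (256 code points); MUSICAL_SYMBOLS and is_musical_symbol are hypothetical names for this example. The sample string spells out a simple rhythm, which carries note durations but no pitch.

```python
import unicodedata

# Assumed range of the Musical Symbols block: U+1D100..U+1D1FF.
MUSICAL_SYMBOLS = range(0x1D100, 0x1D200)


def is_musical_symbol(ch):
    """True if the character's code point lies in the Musical Symbols block."""
    return ord(ch) in MUSICAL_SYMBOLS


if __name__ == "__main__":
    # A quarter note followed by two eighth notes: rhythm only, no pitch.
    rhythm = "\U0001D15F\U0001D160\U0001D160"
    for ch in rhythm:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)} in block: {is_musical_symbol(ch)}")
```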

