Unicode numerals

Numerals (often called numbers in Unicode) are characters that denote a number. The same Arabic-Indic numerals are used in many writing systems throughout the world, and all share the same semantics for denoting numbers. However, the glyphs representing these numerals differ widely from one writing system to another. To support these glyph differences, Unicode includes duplicate encodings of these numerals within many of the script blocks. The decimal digits are repeated in 23 separate blocks, including twice in Arabic. Six additional blocks contain the digits again as rich text or legacy software compatibility characters. In addition to many forms of the Arabic-Indic numerals, Unicode also includes several less common numeral systems, such as Aegean numerals, Roman numerals, counting rod numerals, cuneiform numerals and ancient Greek numerals.

Numerals invariably involve composition of glyphs, as a limited number of characters are composed to make other numerals. For example, the sequence 9 - 9 - 0 in Arabic-Indic numerals composes the numeral for nine hundred and ninety (990). In Roman numerals, the same number is expressed by the composed numeral Ⅹↀ or ⅩⅯ. Each of these is a distinct numeral representing the same abstract number. The numerals differ in particular in the semantics of their composition. The Arabic-Indic decimal digits are positional-value compositions, while the Roman numerals are sign-value compositions that are additive or subtractive depending on the order of the signs.
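The difference between the two kinds of composition can be illustrated with a minimal Python sketch. The function names and the value table are illustrative assumptions, not part of Unicode, and the Roman evaluator uses plain Latin letters and a simplified subtractive rule (a sign is subtracted whenever the next sign is larger).

```python
# Illustrative sketch: positional-value versus sign-value composition.
# ROMAN_VALUES, positional_value and roman_value are hypothetical names.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def positional_value(digits, base=10):
    """Each digit's contribution depends on its position in the sequence."""
    value = 0
    for digit in digits:
        value = value * base + digit
    return value


def roman_value(numeral):
    """Each sign has a fixed value; a sign followed by a larger sign is subtracted."""
    total = 0
    for i, sign in enumerate(numeral):
        value = ROMAN_VALUES[sign]
        following = ROMAN_VALUES[numeral[i + 1]] if i + 1 < len(numeral) else 0
        total += -value if value < following else value
    return total


print(positional_value([9, 9, 0]))  # the sequence 9 - 9 - 0
print(roman_value("XM"))            # ten subtracted from one thousand
```

Both calls evaluate to the same abstract number, although the rules used to reach it are different.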

Arabic-Indic numerals

The Arabic-Indic numerals involve ten digits (for base ten: 0-9) and a decimal separator, which can be combined into composite numerals representing any rational number. Unicode includes these ten digits in the Basic Latin (or ASCII-derived) block. Unicode has no decimal separator for common unified use. The Arabic script includes an Arabic-specific decimal separator (U+066B). Other writing systems use whatever punctuation produces the appropriate glyph for the locale: for example, Full Stop (U+002E, period) in United States usage and Comma (U+002C) in many other locales.

The Arabic-Indic digits are repeated in several other scripts: Arabic, Balinese, Bengali, Devanagari, Ethiopic, Gujarati, Gurmukhi, Telugu, Khmer, Lao, Limbu, Malayalam, Mongolian, Myanmar, New Tai Lue, Nko, Oriya, Thai, Tibetan and Osmanya. Unicode includes a numeric value property for each digit to assist in collation and other text processing operations. However, there is no direct mapping between the various related Arabic-Indic digits.
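Because each digit carries a numeric value property, software can relate digits from different scripts through that value even though no character-to-character mapping exists. The following Python sketch uses the standard `unicodedata` module; the helper name `to_ascii_digits` and the sample strings are illustrative assumptions.

```python
import unicodedata

# The same number, 990, written with digits from four different blocks.
samples = {
    "Basic Latin": "990",
    "Arabic": "\u0669\u0669\u0660",
    "Devanagari": "\u096F\u096F\u0966",
    "Thai": "\u0E59\u0E59\u0E50",
}


def to_ascii_digits(text):
    """Replace every decimal digit with its ASCII counterpart via its numeric value."""
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None when ch is not a decimal digit
        out.append(ch if value is None else str(value))
    return "".join(out)


for script, text in samples.items():
    print(script, to_ascii_digits(text), int(text))
```

Python's built-in `int()` also accepts decimal digits from any script, which is why the last column can be computed directly from the original strings.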

Hexadecimal numerals

Unicode adds a ‘Hex_Digit’ property to the characters commonly used for hexadecimal digits: decimal digits 0-9, Latin capital letters A-F and Latin small letters a-f. This property is also indicated on the compatibility characters for CJK Fullwidth Forms.
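Python's standard `unicodedata` module does not expose the Hex_Digit property, so the sketch below hard-codes the ranges described above (ASCII plus the fullwidth compatibility forms). The range table and the function name `is_hex_digit` are assumptions made for illustration; a production implementation would read the ranges from the Unicode Character Database.

```python
# Code point ranges assumed to carry Hex_Digit: ASCII and fullwidth forms.
HEX_DIGIT_RANGES = [
    (0x0030, 0x0039),  # 0-9
    (0x0041, 0x0046),  # A-F
    (0x0061, 0x0066),  # a-f
    (0xFF10, 0xFF19),  # fullwidth 0-9
    (0xFF21, 0xFF26),  # fullwidth A-F
    (0xFF41, 0xFF46),  # fullwidth a-f
]


def is_hex_digit(ch):
    """Return True if the single character ch falls in one of the ranges above."""
    code = ord(ch)
    return any(low <= code <= high for low, high in HEX_DIGIT_RANGES)


print(is_hex_digit("f"), is_hex_digit("\uFF26"), is_hex_digit("g"))
```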

Fractions

The fraction slash character (U+2044) allows authors using Unicode to compose any arbitrary fraction along with the decimal digits. Unicode also includes a handful of vulgar fractions as compatibility characters, but discourages their use.
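As a brief illustration, the Python sketch below composes a fraction with the fraction slash and shows that compatibility normalization turns a precomposed vulgar fraction into the same kind of sequence. The helper names `compose_fraction` and `parse_fraction` are hypothetical.

```python
import unicodedata
from fractions import Fraction

FRACTION_SLASH = "\u2044"


def compose_fraction(numerator, denominator):
    """Compose an arbitrary fraction from decimal digits and the fraction slash."""
    return f"{numerator}{FRACTION_SLASH}{denominator}"


def parse_fraction(text):
    """Read a digits + fraction slash + digits sequence back as a rational number."""
    numerator, denominator = text.split(FRACTION_SLASH)
    return Fraction(int(numerator), int(denominator))


# The vulgar fraction three quarters (U+00BE) is a compatibility character;
# NFKC normalization replaces it with the composed sequence.
decomposed = unicodedata.normalize("NFKC", "\u00BE")

print(compose_fraction(22, 7))
print(decomposed == compose_fraction(3, 4))
print(parse_fraction(decomposed))
```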

Characters for irrational numbers, sets and other constants

As stated above, the ten decimal digits, decimal separator and fraction slash are limited to representing rational numbers. Irrational numbers would require composition of infinite character sequences, so irrational numbers and other related constructs must be represented with other characters. As a rule, Unicode does not encode characters solely to denote these numbers. For example, although Unicode 1.1 includes a character for the "natural exponent" (U+212F), its UCS canonical name derives from its glyph: "Script Small E". As exceptions to this general rule, Unicode does include three characters canonically named for the number they represent: Planck Constant (U+210E), Planck Constant Over Two Pi (U+210F; also known as Dirac's constant), and Euler Constant (U+2107). They are not necessarily irrational numbers, though in practical terms they would be exceedingly difficult to represent through composition of decimal digits. Other irrational numbers and mathematical constants are represented by borrowing characters from other writing systems: for example, using π (U+03C0) from the Greek script to signify the irrational number that is the ratio of the circumference to the diameter of a perfect circle.
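The naming distinction can be checked with Python's standard `unicodedata` module, as in the short sketch below. The dictionary name `constants` is an illustrative assumption; the printed names come from the character database shipped with the interpreter.

```python
import unicodedata

# Characters used for constants, keyed by code point.
constants = {
    "\u212F": "natural exponent, named for its glyph",
    "\u210E": "named for the number it represents",
    "\u210F": "named for the number it represents",
    "\u2107": "named for the number it represents",
    "\u03C0": "borrowed from the Greek script",
}

for ch, note in constants.items():
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  ({note})")
```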

Rich text and other compatibility numerals

The Arabic-Indic numerals also appear among the compatibility characters as rich text variant forms, including bold, double-struck, monospace, sans-serif and sans-serif bold, and as fullwidth variants for legacy vertical text support.

Rich text parenthesized, circled and other variants are also included in the blocks Enclosed CJK Letters and Months; Enclosed Alphanumerics; Superscripts and Subscripts; Number Forms; and Dingbats.
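Since these variants are compatibility characters, compatibility normalization (NFKC) folds them back to the plain digit. The Python sketch below shows this for several forms of the digit one; the selection of code points and the dictionary name `variants` are illustrative choices.

```python
import unicodedata

# Several compatibility variants of the digit one.
variants = {
    "mathematical bold": "\U0001D7CF",
    "double-struck": "\U0001D7D9",
    "sans-serif": "\U0001D7E3",
    "sans-serif bold": "\U0001D7ED",
    "monospace": "\U0001D7F7",
    "fullwidth": "\uFF11",
    "circled": "\u2460",
    "superscript": "\u00B9",
}

# Map each variant name to its NFKC-normalized form.
normalized = {
    name: unicodedata.normalize("NFKC", ch) for name, ch in variants.items()
}

for name, plain in normalized.items():
    print(f"{name}: {variants[name]} -> {plain}")
```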

CJK Suzhou (huāmǎ) numerals

The "huāmǎ" system is a variation of the rod numeral system. Rod numerals are closely related to the counting rods and the abacus, which is why the numeric symbols for 1, 2, 3, 6, 7 and 8 in the "huāmǎ" system are represented in a similar way as on the abacus. Nowadays, the "huāmǎ" system is only used for displaying prices in Chinese markets or on traditional handwritten invoices.

Suzhou (huāmǎ) numerals in Unicode

In the Unicode Standard, version 3.0, these characters are called Hangzhou style numerals. This indicates that the system is not used only by Cantonese speakers in Hong Kong. In the Unicode Standard 4.0, an erratum was added noting that the use of "Hangzhou" in the character names is a misnomer.

The digits of the Suzhou numerals are encoded in the CJK Symbols and Punctuation block at U+3021 through U+3029; U+3007, U+5341, U+5344 and U+5345 are also used.
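The character names and numeric values of the nine digits can be listed with Python's standard `unicodedata` module, as in the sketch below. The variable name `suzhou_digits` is an assumption for illustration; note that the names still carry the "Hangzhou" label.

```python
import unicodedata

# The nine digits at U+3021 through U+3029.
suzhou_digits = [chr(code) for code in range(0x3021, 0x302A)]

for ch in suzhou_digits:
    print(f"U+{ord(ch):04X}  {ch}  {unicodedata.name(ch)}  {unicodedata.numeric(ch)}")
```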

Ancient Greek numerals

Unicode provides support for several variants of Greek numerals, assigned to the Supplementary Multilingual Plane from U+10140 through U+1018F.

Attic numerals were used by ancient Greeks, possibly from the 7th century BC. They were also known as Herodianic numerals because they were first described in a 2nd century manuscript by Herodian. They are also known as acrophonic numerals because all of the symbols used derive from the first letters of the words that the symbols represent: 'one', 'five', 'ten', 'hundred', 'thousand' and 'ten thousand'. See Greek numerals and acrophony.

Roman numerals

Roman numerals are a numeral system originating in ancient Rome, adapted from Etruscan numerals. The system used in classical antiquity was slightly modified in the Middle Ages to produce the system we use today. It is based on certain letters which are given values as numerals.

Roman numerals are commonly used today in numbered lists (in outline format), clock faces, pages preceding the main body of a book, chord triads in music analysis, the numbering of movie and video game sequels, book publication dates, successive political leaders or children with identical names, and the numbering of some sporting events, such as the Olympic Games or the Super Bowl.

Roman numerals in Unicode

Unicode includes Roman numerals in both lowercase and uppercase variants in the Number Forms block (U+2150 to U+218F). It also includes precomposed numerals for the values 1 through 12. One reason for their existence is to facilitate the setting of multiple-letter numbers (such as VIII) in a single "square" in Asian vertical text. Another is 12-hour clock-face use. However, these and any other possible Roman numeral can be composed from the individual primitive numerals: Ⅰ (U+2160), Ⅴ (U+2164), Ⅹ (U+2169), Ⅼ (U+216C), Ⅽ (U+216D), Ⅾ (U+216E) and Ⅿ (U+216F). The block also encodes ↁ (U+2181, five thousand) and ↂ (U+2182, ten thousand), and Roman numeral one thousand is encoded with a second variant form: ↀ (U+2180).

Currently, the numerals up to one thousand, both the precomposed and the individual characters, are all considered compatibility characters that decompose to the visually similar Latin letters. For example, the Roman numeral Ⅰ (U+2160) has a compatibility mapping to the Latin Capital Letter I. Similarly, precomposed Roman numerals such as Ⅻ (U+216B) decompose directly to a Latin Capital Letter X and two Latin Capital Letter I characters. Presumably this is meant to deal with the visually similar presentation of typical Roman numeral glyphs.
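The compatibility mappings described above can be observed with Python's standard `unicodedata` module. In the sketch below, the variable names are illustrative; NFKC normalization is used to apply the compatibility decomposition.

```python
import unicodedata

precomposed_twelve = "\u216B"             # precomposed Roman numeral twelve
composed_twelve = "\u2169\u2160\u2160"    # ten, one, one as individual numerals
small_twelve = "\u217B"                   # lowercase precomposed twelve
thousand_variant = "\u2180"               # variant form of one thousand

# Precomposed and composed forms both normalize to the same Latin letters.
print(unicodedata.normalize("NFKC", precomposed_twelve))
print(unicodedata.normalize("NFKC", composed_twelve))
print(unicodedata.normalize("NFKC", small_twelve))

# The numeric value property is still available on the numeral characters.
print(unicodedata.numeric(precomposed_twelve))
print(unicodedata.numeric(thousand_variant))

# The variant one thousand has no decomposition to a Latin letter.
print(repr(unicodedata.decomposition(thousand_variant)))
```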

Counting rod numerals

The counting rods (traditional Chinese: 籌; simplified Chinese: 筹; pinyin: chóu) were used by the ancient Chinese before the invention of the abacus. The way that a number is represented by counting rods is called the rod numeral system. The rod numeral system is a decimal place-value system, in which the digits 0 to 9 are represented in two ways depending on their positions.

The vertical rods are usually used for even powers of ten (1, 100, 10000...) and the horizontal rods for odd powers (10, 1000...). For example, 126 is represented by a vertical 1, a horizontal 2 and a vertical 6.

Counting rod numerals in Unicode

Counting rod numerals are included in their own block in the Supplementary Multilingual Plane (SMP) from U+1D360 to U+1D37F. Eighteen characters for vertical and horizontal digits of 1-9 are included as of Unicode 5.0, though vertical and horizontal are opposite from the description above. Fourteen code points are reserved for future use. Zero should be represented by U+3007 (〇, ideographic number zero) and the negative sign should be represented by U+20E5 (combining reverse solidus overlay) (The Unicode Standard, Version 5.0, electronic edition, Unicode, Inc., 2006, pp. 499-500, http://unicode.org/versions/Unicode5.0.0/ch15.pdf). As these were recently added to the character set and since they are included in the SMP, font support may still be limited.
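A minimal Python sketch of composing a rod numeral from these characters follows. It assumes that the "unit digit" characters starting at U+1D360 are used for even powers of ten and the "tens digit" characters starting at U+1D369 for odd powers, with U+3007 for zero; the function name `to_counting_rods` and this alternation rule are illustrative assumptions rather than anything the standard prescribes.

```python
import unicodedata

UNIT_DIGIT_ONE = 0x1D360  # first of the nine "unit digit" characters
TENS_DIGIT_ONE = 0x1D369  # first of the nine "tens digit" characters
ZERO = "\u3007"           # ideographic number zero


def to_counting_rods(number):
    """Encode a non-negative integer, alternating the two digit sets by position."""
    if number < 0:
        raise ValueError("negative numbers are not handled in this sketch")
    rods = []
    # Walk the decimal digits from the least significant position upward.
    for position, digit in enumerate(int(d) for d in reversed(str(number))):
        if digit == 0:
            rods.append(ZERO)
        else:
            first = UNIT_DIGIT_ONE if position % 2 == 0 else TENS_DIGIT_ONE
            rods.append(chr(first + digit - 1))
    return "".join(reversed(rods))


for ch in to_counting_rods(126):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

As the passage notes, rendering the result depends on a font that covers this SMP block.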

See also

*Number Forms


Wikimedia Foundation. 2010.
