HTML decimal character rendering

Not all web browsers and email clients used by recipients of HTML documents, nor all text editors used by their authors, can render every HTML character. Most modern web browsers can display many more characters than the latest versions of Microsoft Internet Explorer. The difference comes from differing "font linking" capabilities, which let a program draw glyphs from whichever fonts on the system support the characters needed.

Most of the codes from 0 to 127, the original 7-bit ASCII set, can be used without a character reference. Codes from 160 to 255 can all be written using character entity names. Only a few higher-numbered codes have entity names, but all of them can be written as decimal numeric character references.
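
The three cases above can be illustrated with a minimal Python sketch. It assumes the entity table in the standard `html.entities` module as the source of entity names; the function names `to_reference` and `encode_text` are hypothetical and chosen only for this example. It does not escape markup-significant ASCII characters such as `&` or `<`, and it does not check for forbidden code points.

```python
from html.entities import codepoint2name


def to_reference(ch: str, prefer_names: bool = False) -> str:
    """Return ch unchanged if it is 7-bit ASCII, otherwise a character reference."""
    code = ord(ch)
    if code < 128:
        # Codes 0-127: written directly, no reference needed in this sketch.
        return ch
    if prefer_names and code in codepoint2name:
        # A named entity exists (always the case for codes 160-255).
        return f"&{codepoint2name[code]};"
    # Any code point can be written as a decimal numeric reference.
    return f"&#{code};"


def encode_text(text: str, prefer_names: bool = False) -> str:
    """Apply to_reference to every character of text."""
    return "".join(to_reference(ch, prefer_names) for ch in text)


# "caf\u00e9 \u2122" is "café ™" written with escapes.
print(encode_text("caf\u00e9 \u2122"))                     # caf&#233; &#8482;
print(encode_text("caf\u00e9 \u2122", prefer_names=True))  # caf&eacute; &trade;
```

Decimal references are the more portable choice here, since they do not depend on the reader's software knowing a particular entity name.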

Illegal characters

HTML forbids [http://www.w3.org/TR/REC-html40/sgml/sgmldecl.html] the use of characters with the following Universal Character Set/Unicode code points (given in decimal):

* 0000 to 0008
* 0011 to 0012
* 0014 to 0031
* 0127
* 0128 to 0159
* 55296 to 57343

These characters are "not even allowed by reference": they may not be written as numeric character references either. However, lenient web browsers commonly interpret references to characters 0128–0159 as if they were references to the characters assigned to bytes 128–159 (decimal) in the Windows-1252 character encoding. This violates the HTML and SGML standards, and those characters are already assigned to higher code points, so HTML document authors should always use the higher code points. For example, for the trademark sign (™), use &#8482;, not &#153;.
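
As a rough illustration of the rules above, the following Python sketch checks whether a decimal code point falls in one of the forbidden ranges and maps a stray reference in the 128–159 range to the code point that Windows-1252 assigns to the same byte. The names `FORBIDDEN_RANGES`, `is_forbidden`, and `repair_reference` are hypothetical; the mapping relies on Python's built-in `cp1252` codec, which leaves a few bytes in that range undefined.

```python
# Forbidden decimal ranges, copied from the list above (inclusive bounds).
FORBIDDEN_RANGES = [
    (0, 8),
    (11, 12),
    (14, 31),
    (127, 127),
    (128, 159),
    (55296, 57343),
]


def is_forbidden(code: int) -> bool:
    """Return True if the code point may not appear in HTML, even by reference."""
    return any(low <= code <= high for low, high in FORBIDDEN_RANGES)


def repair_reference(code: int) -> int:
    """Map a code in 128-159 to the code point of that byte in Windows-1252.

    Codes outside 128-159, and bytes that Windows-1252 leaves unassigned,
    are returned unchanged.
    """
    if 128 <= code <= 159:
        try:
            return ord(bytes([code]).decode("cp1252"))
        except UnicodeDecodeError:
            return code  # byte has no assignment in Windows-1252
    return code


print(is_forbidden(153))      # True
print(repair_reference(153))  # 8482, the trademark sign
```

A tool that cleans up legacy documents might apply `repair_reference` to every numeric reference it finds and then reject anything for which `is_forbidden` still returns true.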

The characters 0009 (tab), 0010 (linefeed), and 0013 (carriage return) are allowed in HTML documents but, along with 0032 (space), are all considered "white space" [http://www.w3.org/TR/REC-html40/struct/text.html#h-9.1]. The "form feed" control character, which would be at 0012, is not allowed in HTML documents, yet it is also mentioned as one of the "white space" characters, perhaps an oversight in the specifications. In HTML, most consecutive occurrences of white space characters, except in a <pre> block, are interpreted as a single "word separator" for rendering purposes. A word separator is typically rendered as a single en-width space in European languages, but not in others.
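
The collapsing behaviour described above can be approximated with a short Python sketch. It treats only the four allowed white space characters (tab, linefeed, carriage return, and space) as collapsible and replaces each run with a single space; leaving out form feed is an assumption of this example, and the name `collapse_whitespace` is hypothetical. Real browsers apply further rules, for instance inside <pre> blocks and at the edges of elements, which this sketch ignores.

```python
import re

# One or more of: tab (9), linefeed (10), carriage return (13), space (32).
HTML_WHITESPACE_RUN = re.compile(r"[\t\n\r ]+")


def collapse_whitespace(text: str) -> str:
    """Replace each run of white space characters with a single space."""
    return HTML_WHITESPACE_RUN.sub(" ", text)


print(collapse_whitespace("one \t\n two\r\nthree"))  # one two three
```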


Wikimedia Foundation. 2010.
