Whitespace (computer science)

In computer science, whitespace is any single character or series of characters that represents horizontal or vertical space in typography. When rendered, a whitespace character does not correspond to a visual mark, but typically does occupy an area on a page. For example, the common whitespace symbol " " (Unicode code point U+0020, decimal 32) represents a blank space, as used between words and sentences in Western scripts.

The term whitespace assumes that rendered text has a white background, so it can be confusing when the background is another color.

As is common in technical literature, the two words "white space" have found widespread usage as the single term "whitespace", especially when used as an adjective, as in "whitespace character". Some specifications refer to "white space" while others refer to "whitespace"; there is no difference between the terms, although exactly which characters are being referred to does vary from context to context. For example, in HTML, "whitespace" includes the form feed character, while in XML, "white space" does not.
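The difference between the two definitions can be made concrete by writing each as a set. The following is a minimal sketch, assuming the "ASCII whitespace" definition of the HTML Living Standard and the `S` production of XML 1.0; the names `HTML_WHITESPACE` and `XML_WHITESPACE` are illustrative and do not come from either specification.

```python
# Whitespace sets as assumed for this sketch:
# HTML "ASCII whitespace" = tab, line feed, form feed, carriage return, space
# XML 1.0 "S" production  = tab, line feed, carriage return, space
HTML_WHITESPACE = frozenset("\t\n\f\r ")
XML_WHITESPACE = frozenset("\t\n\r ")

# Characters that count as whitespace in HTML but not in XML.
only_in_html = HTML_WHITESPACE - XML_WHITESPACE

print(sorted(hex(ord(c)) for c in only_in_html))  # prints ['0xc'] (form feed)
```

Other versions of these specifications may define the sets differently, which is the context dependence described above.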

The most common whitespace characters may be typed via the space bar or the Tab key. Depending on context, a line-break generated by the Return key (Enter key) may be considered whitespace as well.

Runs of whitespace in the source code of most programming languages are generally ignored; such languages are called "free-form". In some languages, however, such as Haskell and Python, whitespace and indentation are syntactically significant. In many programming languages, excessive whitespace, especially "trailing whitespace" at the end of lines, is considered a nuisance. In interpreted languages, parsing unnecessary whitespace may affect execution speed. In markup languages such as HTML, unnecessary whitespace increases file size and may therefore slow transfer over a network. On the other hand, unnecessary whitespace can also mark code inconspicuously, in a way that is similar to, but less obvious than, comments. Such marks can help prove that a license or copyright was infringed by copying and pasting.
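Trailing whitespace is usually removed mechanically. The following is a minimal sketch of such a cleanup, assuming lines are separated by a line feed and that only spaces and tabs count as trailing whitespace; the function name `strip_trailing_whitespace` is hypothetical.

```python
def strip_trailing_whitespace(text: str) -> str:
    """Remove spaces and tabs at the end of every line, keeping line breaks.

    Leading whitespace (indentation) is left untouched, because it can be
    significant in languages such as Python and Haskell.
    """
    lines = text.split("\n")
    return "\n".join(line.rstrip(" \t") for line in lines)


source = "def f():  \n\treturn 1\t \n"
print(repr(strip_trailing_whitespace(source)))  # prints 'def f():\n\treturn 1\n'
```

Only the ends of lines are touched, so the result is equivalent for both free-form and indentation-sensitive languages.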

The C language defines whitespace to be "... space, horizontal tab, new-line, vertical tab, and form-feed". The HTTP network protocol has very strict requirements about what type of whitespace can occur in the control structures (such as the header fields) and where it must and must not occur.
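Both definitions can be sketched in a few lines. The example below assumes the five characters named in the quoted C definition and, for HTTP, the rules of RFC 7230: no whitespace between a header field name and the colon, and optional spaces or tabs around the field value. The names `C_WHITESPACE`, `is_c_whitespace`, and `parse_header_line` are illustrative only, and the parser is a simplification, not a complete HTTP implementation.

```python
# The five characters named in the quoted C definition.
C_WHITESPACE = frozenset(" \t\n\v\f")


def is_c_whitespace(ch: str) -> bool:
    """Return True if ch is one of the characters in the quoted C definition."""
    return ch in C_WHITESPACE


def parse_header_line(line: str) -> tuple[str, str]:
    """Split 'Name: value' into (name, value), rejecting misplaced whitespace.

    Assumed rules (RFC 7230): the field name must not contain or be followed
    by whitespace; spaces and tabs around the value are optional and dropped.
    """
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError("missing colon")
    if not name or any(c in " \t" for c in name):
        raise ValueError("whitespace is not allowed in or after the field name")
    return name, value.strip(" \t")


print(is_c_whitespace("\v"))                       # prints True
print(parse_header_line("Host:  example.org\t"))   # prints ('Host', 'example.org')
```

A line such as `Host : example.org` raises `ValueError` in this sketch, which illustrates a place where whitespace must not occur.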

On some occasions, such as a textbook on the Modula-2 computer language published ca. 1985 by Springer-Verlag, it is necessary to explicitly show a symbol to indicate a space code. That book, at least, used the symbol ␣ (Unicode U+2423, decimal 9251, OPEN BOX) to show an explicit space code. (In case it doesn't render well on a monitor screen, it's like a ] (closing square bracket) rotated a quarter-turn clockwise, although not as wide, and placed below the writing line. Some fonts render it too narrowly.)
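The same convention is easy to reproduce when printing text for inspection. The following is a minimal sketch that substitutes the OPEN BOX symbol for every space; the function name `show_spaces` is hypothetical.

```python
OPEN_BOX = "\u2423"  # the visible-space symbol described above


def show_spaces(text: str) -> str:
    """Replace each space (U+0020) with a visible OPEN BOX symbol."""
    return text.replace(" ", OPEN_BOX)


print(show_spaces("IF x THEN"))  # prints IF␣x␣THEN
```

Only U+0020 is replaced here; tabs and line breaks would need their own visible symbols.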

Such usage is similar to multiword file names written for operating systems and applications that are confused by embedded space codes; such file names instead use a low line (_) as a word separator, as_in_this_phrase.

Another such symbol was ␢ (Unicode U+2422, decimal 9250, BLANK SYMBOL). This was used in the early years of computer programming (perhaps especially at IBM) when writing on coding forms. Keypunch operators immediately recognized the symbol as an "explicit space".

Unicode

In the Unicode Character Database, the following code points are defined as whitespace:
*U+0009 to U+000D (control characters, including TAB, LF and CR)
*U+0020 SPACE
*U+0085 NEXT LINE (NEL)
*U+00A0 NO-BREAK SPACE (NBSP)
*U+1680 OGHAM SPACE MARK
*U+180E MONGOLIAN VOWEL SEPARATOR (listed as of 2010; it lost the whitespace property in Unicode 6.3)
*U+2000 to U+200A (spaces of various widths)
*U+2028 LINE SEPARATOR
*U+2029 PARAGRAPH SEPARATOR
*U+202F NARROW NO-BREAK SPACE
*U+205F MEDIUM MATHEMATICAL SPACE
*U+3000 IDEOGRAPHIC SPACE
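The list above can be turned directly into a membership test. The following is a minimal sketch that hard-codes the listed ranges rather than querying a Unicode database, so it reflects this list and not necessarily the current Unicode version; the names `WHITESPACE_RANGES` and `is_listed_whitespace` are hypothetical.

```python
# Inclusive code point ranges copied from the list above.
WHITESPACE_RANGES = [
    (0x0009, 0x000D),  # TAB, LF, VT, FF, CR
    (0x0020, 0x0020),  # SPACE
    (0x0085, 0x0085),  # NEL
    (0x00A0, 0x00A0),  # NBSP
    (0x1680, 0x1680),  # OGHAM SPACE MARK
    (0x180E, 0x180E),  # MONGOLIAN VOWEL SEPARATOR
    (0x2000, 0x200A),  # spaces of various widths
    (0x2028, 0x2029),  # line and paragraph separators
    (0x202F, 0x202F),  # NARROW NBSP
    (0x205F, 0x205F),  # MEDIUM MATHEMATICAL SPACE
    (0x3000, 0x3000),  # IDEOGRAPHIC SPACE
]


def is_listed_whitespace(ch: str) -> bool:
    """Return True if the single character ch is in one of the listed ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in WHITESPACE_RANGES)


# Total number of code points covered by the list.
count = sum(hi - lo + 1 for lo, hi in WHITESPACE_RANGES)
print(count)                           # prints 26
print(is_listed_whitespace("\u00a0"))  # prints True
print(is_listed_whitespace("\u200b"))  # prints False (ZERO WIDTH SPACE is not listed)
```

Python's built-in `str.isspace()` uses its own definition of whitespace and may disagree with this list for some code points, which is why the ranges are spelled out explicitly here.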

See also

* Programming style
* Indent style
* Space (punctuation)
* Trim (programming)

External links

* [http://unicode.org/Public/UNIDATA/PropList.txt Property list of the Unicode Character Database]


Wikimedia Foundation. 2010.
