Escape character

In computing and telecommunication, an escape character is a character which invokes an alternative interpretation on subsequent characters in a character sequence. An escape character is a particular case of metacharacters. Generally, the judgement of whether something is an escape character or not depends on context.

Definition

Escape characters are part of the syntax of many programming languages, data formats and communication protocols. For a given alphabet, an escape character's purpose is to start character sequences (so-called escape sequences) which have to be interpreted differently from the same characters occurring alone. An escape character may not have its own meaning, so all escape sequences consist of two or more characters.

There are usually two functions of escape sequences. The first is to encode a syntactic entity, such as device commands or special data which cannot be directly represented by the alphabet. The second use, referred to as character quoting, is to represent characters which cannot be typed in current context, or would have an undesired interpretation. In the latter case an escape sequence is a digraph consisting of an escape character itself and a "quoted" character.

Escape character vs control character

Generally, an escape character is not a particular case of (device) control characters, nor vice versa. If we define control characters as non-graphic, or as having a special meaning for an output device (e.g. printer or text terminal) then any escape character for this device is a control one. But escape characters used in programming (see below) are graphic, hence are not control characters. Conversely most (but not all) of the ASCII "control characters" have some control function in isolation, therefore are not escape characters.

Examples

ASCII escape character

The ASCII "escape" character (octal: \033, or ^[, or, in decimal, 27) is used in many output devices to start a series of characters called a control sequence or escape sequence. Typically, the escape character is sent first in such a sequence to alert the device that the following characters are to be interpreted as a control sequence rather than as plain characters; one or more characters then follow to specify some detailed action, after which the device goes back to interpreting characters normally. For example, the sequence of ^[, followed by the printable characters [2;10H, would cause a DEC VT102 terminal to move its cursor to the 10th cell of the 2nd line of the screen. This mechanism was later developed into the ANSI escape codes covered by the ANSI X3.64 standard. The escape character also starts each command sequence in the Hewlett-Packard Printer Command Language.
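As an illustration of the cursor-positioning sequence described above, the following minimal Python sketch builds the same string. The name cursor_to is hypothetical and chosen only for this example; whether a given terminal honours the sequence depends on the terminal.

```python
ESC = "\x1b"  # the ASCII escape character: decimal 27, octal 033

def cursor_to(row: int, col: int) -> str:
    """Build the sequence ESC [ row ; col H (hypothetical helper name)."""
    return f"{ESC}[{row};{col}H"

# The example from the text: line 2, cell 10.
sequence = cursor_to(2, 10)
print(repr(sequence))
```

Writing this string to a terminal that understands such sequences would move the cursor; printing its repr, as here, only shows the characters involved.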

An early reference to the term "escape character" is found in Bob Bemer's IBM technical publications. He apparently invented this mechanism during his work on the ASCII character set.

The Escape key is usually found on standard PC keyboards. However, it is commonly absent from keyboards for PDAs and other devices not designed primarily for ASCII communications, and it is not generally used as part of the common user interface for applications on the Windows operating system. Linux systems, and applications such as Firefox, often use the key as the functional equivalent of clicking on a Cancel button with a mouse. The DEC VT220 series was one of the few popular terminals whose keyboard did not have a dedicated Esc key, instead using one of the keys above the main keypad. In user interfaces of the 1970s and 1980s it was not uncommon to use this key as an escape character, but on modern desktop computers this use has been dropped. Sometimes the key was labelled AltMode (for alternative mode). Even with no dedicated key, the escape character code could be generated by typing '[' while holding down the Control key ('Ctrl').

Programming and data formats

Many modern programming languages specify the double-quote character (") as a delimiter for a string literal. The backslash (\) escape character provides two ways to include double quotes inside a string literal: either by modifying the meaning of the double-quote character embedded in the string (\" becomes "), or by modifying the meaning of the three characters that give the hexadecimal value of a double-quote character (\x22 becomes ").

In Perl or Python 2:

print "Nancy said "Hello World!" to the crowd.";

produces a syntax error, whereas:

print "Nancy said \"Hello World!\" to the crowd.";  ### example of \"

produces the intended output. Another alternative:

print "Nancy said \x22Hello World!\x22 to them.";  ### example of \x22

uses the numeric escape sequence "\x22", the hexadecimal code of the quotation mark. This would not produce the required text if run on a non-ASCII machine.
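The two escape styles above can be compared directly. The following is a small sketch in Python 3 syntax (where print is a function), assuming an ASCII-compatible character set; the variable names are chosen only for this example.

```python
# Style 1: backslash changes the meaning of the embedded double quote.
quoted = "Nancy said \"Hello World!\" to the crowd."

# Style 2: backslash introduces the hexadecimal code 22 of the double quote.
by_code = "Nancy said \x22Hello World!\x22 to the crowd."

# Both literals denote the same string.
print(quoted)
print(quoted == by_code)
```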

C and C++ allow exactly the same two backslash escape styles. Java supports \" but has no \x escape; it uses octal (\42) or Unicode (\u0022) escapes instead. The PostScript language and Microsoft Rich Text Format also use backslash escapes. The quoted-printable encoding uses the equals sign as an escape character.

URLs and URIs use %-escapes to quote characters with a special meaning, as well as non-ASCII characters. The ampersand (&) character may be considered an escape character in SGML and derived formats such as HTML and XML.
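A brief sketch of both mechanisms, using Python's standard library functions urllib.parse.quote and html.escape. The sample strings are invented for illustration.

```python
from html import escape, unescape
from urllib.parse import quote, unquote

# %-escapes: '%' starts an escape sequence, so a literal '%' must itself be escaped.
url_part = quote("50% off & more")
print(url_part)

# Non-ASCII characters are encoded as UTF-8 and each byte is %-escaped.
accented = quote("é")
print(accented)

# In HTML, '&' starts a character reference, so '<', '>' and '&' are written as references.
html_text = escape("a < b & c")
print(html_text)

# Both escapings are reversible.
print(unquote(url_part), unescape(html_text))
```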

Another similar (and partially overlapping) syntactic trick is stropping.

Some programming languages also provide other ways to represent special characters in literals, without requiring an escape character (see e.g. delimiter collision).

Communication protocols

The Point-to-Point Protocol uses the 0x7D octet (\175, or ASCII: } ) as an escape character. The octet immediately following it must be XORed with 0x20 before being passed to a higher-level protocol. This is applied to both 0x7D itself and the flag octet 0x7E (which is used in PPP to mark the beginning and end of a frame) when those octets need to be transmitted by a higher-level protocol encapsulated by PPP, as well as to other octets negotiated when the link is established. That is, when a higher-level protocol wishes to transmit 0x7D, it is transmitted as the sequence 0x7D 0x5D, and 0x7E is transmitted as 0x7D 0x5E.
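The octet escaping described above can be sketched as follows. This is a minimal illustration, not a PPP implementation: the function names and the extra parameter (standing in for the additional octets negotiated at link setup) are hypothetical, and framing and checksums are omitted.

```python
FLAG = 0x7E      # marks the beginning and end of a frame
ESCAPE = 0x7D    # the escape octet
XOR_MASK = 0x20  # applied to the octet that follows the escape

def ppp_escape(payload: bytes, extra: frozenset = frozenset()) -> bytes:
    """Escape FLAG, ESCAPE and any octets in the (hypothetical) extra set."""
    out = bytearray()
    for octet in payload:
        if octet in (FLAG, ESCAPE) or octet in extra:
            out.append(ESCAPE)
            out.append(octet ^ XOR_MASK)
        else:
            out.append(octet)
    return bytes(out)

def ppp_unescape(stuffed: bytes) -> bytes:
    """Reverse ppp_escape: drop each ESCAPE and XOR the octet after it."""
    out = bytearray()
    octets = iter(stuffed)
    for octet in octets:
        if octet == ESCAPE:
            following = next(octets, None)
            if following is None:
                raise ValueError("escape octet at end of data")
            octet = following ^ XOR_MASK
        out.append(octet)
    return bytes(out)

print(ppp_escape(bytes([0x7D, 0x7E])).hex())
```

Because XOR with 0x20 is its own inverse, the same mask is used in both directions.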

Bourne shell

In the Bourne shell (sh), the asterisk (*) and question mark (?) characters are wildcard characters expanded via globbing. Without a preceding escape character, an * expands to the names of all files in the working directory that do not start with a period, if and only if there are such files; otherwise the * remains unexpanded. So to refer to a file literally called "*", the shell must be told not to interpret it in this way, by preceding it with a backslash (\). This modifies the interpretation of the asterisk (*). Compare:

 
rm *    # delete all files in the current directory
 
rm \*   # delete the file named *
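A program that generates shell commands can apply the same backslash rule itself. The sketch below is only an illustration: the function name backslash_escape is hypothetical, and the set of characters treated as special (the two wildcards from the text, plus the opening bracket and the backslash itself) is an assumption that does not cover every character the shell interprets. For general use, Python's standard shlex.quote function is a more complete choice, although it quotes rather than backslash-escapes.

```python
# Assumed set of characters to escape; not a complete list of shell specials.
GLOB_SPECIAL = set("*?[\\")

def backslash_escape(name: str) -> str:
    """Precede each special character with a backslash (hypothetical helper)."""
    return "".join("\\" + ch if ch in GLOB_SPECIAL else ch for ch in name)

# The file literally named "*" from the example above.
print("rm " + backslash_escape("*"))
```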

Windows Command Prompt

The Windows command-line interpreter uses a caret character (^) to escape reserved characters that have special meanings (in particular: & | ( ) < > ^).[1] The DOS command-line interpreter, although its syntax is otherwise similar, does not support this escape.

For example, on the Windows Command Prompt, this results in a syntax error:

echo <wiki>

whereas this will output the string: <wiki>

echo ^<wiki^>
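The caret rule can be sketched in a few lines. The function name caret_escape is hypothetical, and the sketch covers only the reserved characters listed above; it does not account for other Command Prompt quoting rules.

```python
# Reserved characters as listed in the text.
RESERVED = set("&|()<>^")

def caret_escape(text: str) -> str:
    """Precede each reserved character with a caret (hypothetical helper)."""
    return "".join("^" + ch if ch in RESERVED else ch for ch in text)

print("echo " + caret_escape("<wiki>"))
```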

References

  1. Tim Hill (1998). "The Windows NT Command Shell". Macmillan Technical Publishing. http://technet.microsoft.com/en-us/library/cc723564.aspx. Retrieved 2010-01-13.

External links

 This article incorporates public domain material from the General Services Administration document "Federal Standard 1037C".


Wikimedia Foundation. 2010.
