String literal

A string literal is the representation of a string value within the source code of a computer program. There are numerous alternate notations for specifying string literals, and the exact notation depends on the programming language in question. Nevertheless, there are some general guidelines that most modern programming languages follow.

Specifically, most string literals can be specified using:

* declarative notation;
* whitespace delimiters (indentation);
* bracketed delimiters (quoting);
* escape characters; or
* a combination of some or all of the above

Declarative notation

In the original FORTRAN programming language, string literals were written in so-called "Hollerith" notation, where a decimal count of the number of characters was followed by the letter H, and then the characters of the string:

27HAn example Hollerith string

This declarative notation is contrasted with bracketed delimiter quoting, because it does not require balanced "bracketed" characters on either side of the string.

Advantages:

* avoids the problem of delimiter collision
* enables the inclusion of metacharacters that might otherwise be mistaken as commands

Drawbacks:

* this type of notation is error-prone for manual entry by programmers, because the character count must match the text exactly

Because of the drawbacks, most programming languages do not use this style of declarative notation.
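As a rough illustration of why the count matters, the following Python sketch encodes and decodes Hollerith-style constants. The names to_hollerith and from_hollerith are hypothetical helpers invented for this example, and the decoder assumes that nothing follows the counted characters.

```python
def to_hollerith(text: str) -> str:
    """Encode text as <count>H<characters>."""
    return f"{len(text)}H{text}"


def from_hollerith(literal: str) -> str:
    """Decode <count>H<characters>, rejecting a wrong count."""
    count, sep, rest = literal.partition("H")
    if not sep or not (count.isascii() and count.isdigit()):
        raise ValueError("expected <count>H<characters>")
    declared = int(count)
    if len(rest) != declared:
        raise ValueError(f"declared {declared} characters, found {len(rest)}")
    return rest


encoded = to_hollerith("An example Hollerith string")
print(encoded)
print(from_hollerith(encoded))
```

Because the length is declared up front, the text may freely contain quotation marks, but a miscounted literal is rejected rather than silently truncated.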

Whitespace delimiters

In YAML, string literals may be specified by the relative positioning of whitespace and indentation.

- title: An example multi-line string in YAML
  body : |
    This is a multi-line string.
    "special" metacharacters may
    appear here. The content of this string is
    indicated by indentation.

Bracketed delimiters

Most modern programming languages use bracket delimiters (also called balanced delimiters, or quoting) to specify string literals. Double quotes are the most common quoting delimiters:

"Hi There!"

Some languages also allow the use of single quotes as an alternative to double quotes (though the string must begin and end with the same kind of quotation mark):

'Hi There!'

Note that these quotation marks are "unpaired" (the same character is used as an opener and a closer), a holdover from typewriter technology, which was the precursor of the earliest computer input and output devices. The Unicode character set includes paired (separate opening and closing) versions of both single and double quotes:

“Hi There!”
‘Hi There!’

The paired double quotes can be used in Visual Basic .NET.

The PostScript programming language uses parentheses, with embedded newlines allowed, and also embedded unescaped parentheses provided they are properly paired:

(The quick (brown fox))

Similarly, the Tcl programming language uses braces (embedded newlines allowed, embedded unescaped braces allowed provided properly paired):

{The quick {brown fox}}

This practice derives on one hand from the single quotes in Unix shells (these are "raw" strings) and on the other from the use of braces in C for compound statements, since blocks of code are syntactically the same thing as string literals in Tcl. That the delimiters are paired is essential for making this feasible.
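The pairing rule can be sketched in Python as a simple depth counter. The helper read_paren_string is hypothetical and handles nesting only; real PostScript strings also recognise backslash escapes, which this sketch deliberately ignores.

```python
def read_paren_string(source: str, start: int = 0) -> tuple[str, int]:
    """Read a parenthesised literal beginning at source[start].

    Returns the content without the outer parentheses and the index
    just past the closing parenthesis.
    """
    if source[start:start + 1] != "(":
        raise ValueError("literal must start with '('")
    depth = 0
    for i in range(start, len(source)):
        ch = source[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # Matching close of the outermost opener.
                return source[start + 1:i], i + 1
    raise ValueError("unbalanced parentheses")


content, end = read_paren_string("(The quick (brown fox)) show")
print(content)
```

The inner parentheses survive as ordinary characters because the scanner only stops when the depth returns to zero, which is exactly what unpaired quotation marks cannot offer.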

Delimiter collision

Delimiter collision is a common problem for string literal notations that use balanced delimiters and quoting. The problem occurs when a programmer attempts to use a quoting character as part of the string literal itself. Because this is a very common problem, a number of methods for avoiding delimiter collision have been invented.

Dual quoting style

Some languages (e.g. Modula-2, JavaScript) attempt to avoid the delimiter collision problem by allowing a dual quoting style. Typically, this consists of allowing the programmer to use either single quotes or double quotes interchangeably.

"This is John's apple."
'I said, "Can you hear me?"'

One problem with dual quoting is that it doesn't allow for the inclusion of both styles of quotes at once within the same literal (unless escaped, see below).

Some programming languages allow subtle variations on dual quoting, treating single quotes and double quotes slightly differently (e.g. sh, Perl).
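Python is another language in which the two quote styles are interchangeable. The following minimal sketch (variable names are illustrative only) shows each style protecting the other, and the point at which dual quoting alone stops being enough:

```python
# Double quotes outside, so the apostrophe needs no special treatment.
apple = "This is John's apple."

# Single quotes outside, so the inner double quotes are plain characters.
question = 'I said, "Can you hear me?"'

# Both kinds inside one literal: one of them has to be escaped.
both = 'It\'s the "same" string'

print(apple)
print(question)
print(both)
```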

Escape character

One method for avoiding delimiter collision is to use escape characters:

"I said, \"Can you hear me?\""

The most commonly used escape character for this purpose is the backslash "\", the tradition for which originated on Unix. From a language design standpoint, this approach is adequate, but there are drawbacks:

* text can be rendered unreadable when littered with numerous escape characters
* escape characters are required to be escaped, when not intended as escape characters
* although easy to type, they can be cryptic to someone unfamiliar with the language

"I said, \"The Windows path is C:\\Foo\\Bar\\Baz\""

The confusing presence of too many escape and slash characters in a string is commonly disparaged as leaning toothpick syndrome.
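A short Python sketch makes the cost visible by comparing what is typed in the source with what the string actually contains. The variable names are illustrative only.

```python
# Escaped quotes: each \" in the source is a single " in the value.
quoted = "I said, \"Can you hear me?\""

# Escaped escapes: each \\ in the source is a single \ in the value.
path = "I said, \"The Windows path is C:\\Foo\\Bar\\Baz\""

print(quoted)
print(path)

# The value contains three backslashes although the source has eight.
print(path.count("\\"))
```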

Escape sequence

An extended concept of the escape character, an escape sequence is also a means of avoiding delimiter collision. An escape sequence consists of two or more consecutive characters that can have special meaning when used in the context of a string literal.

"I said, \x22Can you hear me?\x22"

Escape sequences can also be used for purposes other than avoiding delimiter collision, and can also include metacharacters (see Metacharacters below).
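Python accepts similar sequences, so the idea can be sketched there. The example below is only an illustration; the exact set of sequences differs from language to language.

```python
# \x22 is the hexadecimal code of the double quote character.
hexed = "I said, \x22Can you hear me?\x22"

# \042 is the same character written with three octal digits.
octal = "\042"

# \t and \n stand for the nonprinting tab and newline characters.
tabbed = "name\tvalue\n"

print(hexed)
print(octal == "\x22")
print(len(tabbed))
```

Each escape sequence occupies several characters in the source but contributes exactly one character to the resulting string.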

Double-up escape sequence

Some languages (such as Pascal, BASIC and DCL) avoid delimiter collision by "doubling up" on the quotation marks that are intended to be part of the string literal itself:

'This Pascal string''contains two apostrophes'''
"I said, ""Can you hear me?"""
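The doubling rule is mechanical enough to sketch in a few lines of Python. The helpers pascal_quote and pascal_unquote are hypothetical names made up for this example; they model only the apostrophe-doubling convention, not any other part of Pascal's syntax.

```python
def pascal_quote(text: str) -> str:
    """Wrap text in apostrophes, doubling any apostrophe inside it."""
    return "'" + text.replace("'", "''") + "'"


def pascal_unquote(literal: str) -> str:
    """Reverse pascal_quote, rejecting malformed literals."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("literal must be wrapped in apostrophes")
    body = literal[1:-1]
    # After removing every doubled pair, no lone apostrophe may remain.
    if "'" in body.replace("''", ""):
        raise ValueError("unescaped apostrophe inside literal")
    return body.replace("''", "'")


literal = pascal_quote("This is John's apple.")
print(literal)
print(pascal_unquote(literal))
```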

Extended quoting styles

Some languages extend the previously-mentioned quoting conventions even further. These extended approaches provide an even more flexible style of notation for avoiding delimiter collision.

Triple quoting: One such extension, the use of "triple quoting", is used in Python:

'''This is John's apple.'''

"""John is Nancy's so-called "boyfriend"."""

Triple-quoted string literals may be delimited by """ or '''. Triple quoting in Python also has the added benefit of allowing string literals to span more than one physical line of source code.
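The following runnable Python sketch shows both delimiters and the multi-line behaviour; the variable names are illustrative only.

```python
apple = '''This is John's apple.'''
boyfriend = """John is Nancy's so-called "boyfriend"."""

# The line break in the source becomes a newline character in the value.
two_lines = """first line
second line"""

print(apple)
print(boyfriend)
print(two_lines.splitlines())
```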

Multiple quoting: Another such extension is the use of "multiple quoting", which allows the author to choose which characters should specify the bounds of a string literal.

For example in Perl:

qq^I said, "Can you hear me?"^

qq@I said, "Can you hear me?"@

qq§I said, "Can you hear me?"§

all produce the desired result. Although this notation is more flexible, few languages support it. Perl and Ruby are two that do.

Here documents

A here document is an alternate quoting notation that allows the programmer to specify an arbitrary unique identifier as a content boundary for a string literal. This avoids delimiter collision, and also preserves newlines in the source code as newlines in the string literal itself.


Metacharacters

Many languages support the use of metacharacters inside string literals. Metacharacters have varying interpretations depending on the context and language, but are generally a kind of 'processing command' for representing printing or nonprinting characters.

For instance, in a C string literal, if the backslash is followed by a letter such as "b", "n" or "t", then this represents a nonprinting "backspace", "newline" or "tab" character respectively. Or if the backslash is followed by up to three octal digits, then this sequence is interpreted as representing the arbitrary character with the specified ASCII code. This was later extended to allow more modern hexadecimal character code notation:

"I said, \x22Can you hear me?\x22"

Raw strings

A few languages provide a method of specifying that a literal is to be processed without any language specific interpretation.

For example, in Python 'raw strings' are preceded by an r. In such strings backslashes are not interpreted as escape sequences, making it simpler to write DOS/Windows paths and regular expressions:

r"The Windows path is C:\Foo\Bar\Baz"
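A small Python sketch, with illustrative variable names, shows that a raw literal and its fully escaped counterpart denote the same value, and why raw literals are convenient for regular expressions:

```python
import re

raw = r"The Windows path is C:\Foo\Bar\Baz"
cooked = "The Windows path is C:\\Foo\\Bar\\Baz"

# Both spellings produce the same string value.
print(raw == cooked)

# In a raw literal the regex escapes \d and \. reach the re module untouched.
number = re.compile(r"\d+\.\d+")
print(bool(number.fullmatch("3.14")))
```

One caveat worth knowing: a Python raw string literal cannot end in an odd number of backslashes, because the final backslash would still shield the closing quote.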

C#'s notation is called @-quoting:

@"C:\Foo\Bar\Baz"

It also allows double-up quotes:

@"I said, ""Hello there."""

In XML documents, CDATA sections allow the use of characters such as & and < without an XML parser attempting to interpret them as part of the structure of the document itself. This can be useful when including literal text and scripting code, to keep the document well formed.

<![CDATA[ if (path!=null && depth<2) { add(path); } ]]>

Variable interpolation

Languages differ on whether and how to interpret string literals as either 'raw' or 'variable interpolated'. Variable interpolation is the process of evaluating an expression containing one or more variables, and returning output where the variables are replaced with their corresponding values in memory. In sh-compatible Unix shells, quote-delimited (") strings are interpolated, while apostrophe-delimited (') strings are not.

For example, the following Perl code:

$sName = "Nancy";
$sGreet = "Hello World";
print "$sName said $sGreet to the crowd of people.";

produces the output:

Nancy said Hello World to the crowd of people.

The sigil character ($) is interpreted to indicate variable interpolation.

Similarly, the printf function produces the same output using notation such as:

printf "%s said %s to the crowd of people.", ($sName,$sGreet);

The metacharacters (%s) indicate variable interpolation.

This is contrasted with "raw" strings:

print '$sName said $sGreet to the crowd of people.';

which produce output like:

$sName said $sGreet to the crowd of people.

Here the $ characters are not sigils, and are not interpreted to have any meaning other than plain text.
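For comparison, the same three cases can be sketched in Python. This is only an analogue: Python has no sigils, and interpolation is requested with an f prefix on the literal rather than by the choice of quote character. The variable names mirror the Perl example but are otherwise arbitrary.

```python
name = "Nancy"
greet = "Hello World"

# Interpolated literal: the f prefix makes {name} and {greet} expressions.
interpolated = f"{name} said {greet} to the crowd of people."

# printf-style formatting: %s placeholders are filled from the tuple.
formatted = "%s said %s to the crowd of people." % (name, greet)

# Ordinary literal: the braces are plain text and nothing is substituted.
plain = "{name} said {greet} to the crowd of people."

print(interpolated)
print(formatted)
print(plain)
```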

Binary and hexadecimal strings

REXX uses suffix characters to specify characters or strings using their hexadecimal or binary code. For example,

'20'x
"0010 0000"b
"00100000"b

all yield the space character, avoiding the function call X2C(20).
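For readers unfamiliar with REXX, the equivalent conversions can be sketched in Python, where they require explicit function calls rather than a literal suffix. The variable names are illustrative only.

```python
# Hexadecimal 20 decoded as a one-byte ASCII string.
from_hex = bytes.fromhex("20").decode("ascii")

# Binary digits, with the optional space between nibbles removed first.
from_bits_spaced = chr(int("0010 0000".replace(" ", ""), 2))
from_bits = chr(int("00100000", 2))

print(from_hex == from_bits_spaced == from_bits == " ")
```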

Embedding source code in string literals

Languages that lack flexibility in specifying string literals make it particularly cumbersome to write programming code that generates other programming code. This is particularly true when the generation language is the same as or similar to the output language.

For example:
* writing code to produce quines;
* generating an output language from within a web template;
* using XSLT to generate XSLT, or SQL to generate more SQL;
* generating a PostScript representation of a document for printing purposes, from within a document-processing application written in C or some other language.

Nevertheless, some languages are particularly well adapted to producing this sort of self-similar output, especially those that support multiple options for avoiding delimiter collision.

Apart from the mechanics of specifying string literals, however, one must consider the security implications of code that generates other code, especially if the output is based at least partially on untrusted user input. This is potentially a serious security weakness. It is particularly acute in the case of Web-based applications, where malicious users can take advantage of such weaknesses to subvert the operation of the application, for example by mounting an SQL injection attack.
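The risk can be demonstrated with a minimal sketch using Python's standard sqlite3 module. The table, its contents and the hostile input are all invented for this illustration; the point is only the contrast between pasting input into the SQL text and passing it as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

# Hypothetical untrusted input containing the SQL string delimiter.
hostile = "nobody' OR '1'='1"

# Unsafe: the apostrophes in the input terminate the SQL string literal
# early, so the rest of the input is parsed as SQL.
unsafe_sql = f"SELECT name FROM users WHERE name = '{hostile}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the input is bound as a value and never parsed as SQL text.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()

print(len(unsafe_rows))
print(len(safe_rows))
```

The unsafe query is a delimiter collision turned into an attack: the input closes the literal the programmer opened. Bound parameters sidestep the problem because no string literal is built from the input at all.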

Wikimedia Foundation. 2010.
