Terminal and nonterminal symbols

Terminal and nonterminal symbols

In computer science, terminal and nonterminal symbols are the lexical elements used in specifying the production rules that constitute a formal grammar. The terminals and nonterminals of a particular grammar are two disjoint sets.

Contents

Terminal symbols

Terminal symbols are literal characters that can appear in the inputs to or outputs from the production rules of a formal grammar and that cannot be broken down into "smaller" units. To be precise, terminal symbols cannot be changed using the rules of the grammar. For example, a grammar that is defined by two rules:

  1. x can become xa
  2. x can become ax

has a as a terminal symbol because no rule exists that would change it to something else. (On the other hand, x has two rules that can change it, so it is nonterminal.) A formal language defined (or generated) by a particular grammar is the set of strings that can be produced by the grammar and that consist only of terminal symbols; nonterminals that do not consist entirely of terminals may not appear in the lexemes that are said to belong to the language.

In the context of syntax analysis, as opposed to the theory of programming languages and compilers, the terms "terminal symbol" and "token" are often treated as synonymous. Quoting the so-called Dragon Book (a standard reference on the latter subject):

In a compiler, the lexical analyzer reads the characters of the source program, groups them into lexically meaningful units called lexemes, and produces as output tokens representing these lexemes. A token consists of two components, a token name and an attribute value. The token names are abstract symbols that are used by the parser for syntax analysis. Often, we shall call these token names terminals, since they appear as terminal symbols in the grammar for a programming language. The attribute value, if present, is a pointer to the symbol table that contains additional information about the token. This additional information is not part of the grammar, so in our discussion of syntax analysis, often we refer to tokens and terminals synonymously. [1]

Terminal symbols, or just terminals, are the elementary symbols of the language defined by a formal grammar.

Nonterminal symbols

Nonterminal symbols, or just nonterminals, are the symbols which can be replaced; thus there are strings composed of some combination of terminal and nonterminal symbols. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules. In fact, the language defined by a grammar is precisely the set of terminal strings that can be so derived.

Context-free grammars are those grammars in which the left-hand side of each production rule consists of only a single nonterminal symbol. This restriction is non-trivial; not all languages can be generated by context-free grammars. Those that can are called context-free languages. These are exactly the languages that can be recognized by a non-deterministic pushdown automaton. Context-free languages are the theoretical basis for the syntax of most programming languages.

Production rules

A grammar is defined by production rules that specify which lexemes may replace which other lexemes; these rules may be used to generate strings, or to parse them. Each such rule has a head, or left-hand side, which consists of the string that may be replaced, and a body, or right-hand side, which consists of a string that may replace it. Rules are often written in the form <head \rarr body>; e.g., the rule z0 → z1 specifies that z0 can be replaced by z1.

In the classic formalization of generative grammars first proposed by Noam Chomsky in the 1950s,[2][3] a grammar G consists of the following components:

  • A finite set N of nonterminal symbols.
  • A finite set Σ of terminal symbols that is disjoint from N.
  • A finite set P of production rules, each rule of the form
(\Sigma \cup N)^{*} N (\Sigma \cup N)^{*} \rightarrow (\Sigma \cup N)^{*}
where * is the Kleene star operator and \cup denotes set union, so (\Sigma \cup N)^{*} represents zero or more symbols, and N means one nonterminal symbol. That is, each production rule maps from one string of symbols to another, where the first string contains at least one nonterminal symbol. In the case that the body consists solely of the empty string—i.e., that it contains no symbols at all—it may be denoted with a special notation (often Λ, e or \epsilon) in order to avoid confusion.
  • A distinguished symbol S \in N that is the start symbol.

A grammar is formally defined as the ordered quadruple < N,Σ,P,S > . Such a formal grammar is often called a rewriting system or a phrase structure grammar in the literature.[4][5]

Example

For instance, the following represents an integer (which may be signed) expressed in a variant of Backus–Naur form:

<integer> ::= ['-'] <digit> {<digit>}
<digit> ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'

In this example, the symbols (-,0,1,2,3,4,5,6,7,8,9) are terminal symbols and <digit> and <integer> are nonterminal symbols.

Notes

  1. ^ Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools, second edition; Pearson/Addison-Wesley, 2006. Box, p. 43.
  2. ^ Chomsky, Noam (1956). "Three Models for the Description of Language". IRE Transactions on Information Theory 2 (2): 113–123. doi:10.1109/TIT.1956.1056813. 
  3. ^ Chomsky, Noam (1957). Syntactic Structures. The Hague: Mouton. 
  4. ^ Ginsburg, Seymour (1975). Algebraic and automata theoretic properties of formal languages. North-Holland. pp. 8–9. ISBN 0720425069. 
  5. ^ Harrison, Michael A. (1978). Introduction to Formal Language Theory. Reading, Mass.: Addison-Wesley Publishing Company. pp. 13. ISBN 0201029553. 

References


Wikimedia Foundation. 2010.

Игры ⚽ Поможем сделать НИР

Look at other dictionaries:

  • formal grammar — noun A precise mathematical description of a formal language, consisting of terminal symbols, nonterminal symbols, a nonterminal symbol serving as start symbol, and a set of production rules that control the expansion of nonterminal symbols into… …   Wiktionary

  • Context-sensitive grammar — A context sensitive grammar (CSG) is a formal grammar in which the left hand sides and right hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context sensitive grammars are more general than… …   Wikipedia

  • Conjunctive grammar — Conjunctive grammars are a class of formal grammars studied in formal language theory. They extend the basic type of grammars, the context free grammars, with a conjunction operation. Besides explicit conjunction, conjunctive grammars allow… …   Wikipedia

  • Algoritmo CYK — El algoritmo de Cocke Younger Kasami (CYK) determina si una cadena puede ser generada por una gramática libre de contexto y, si es posible, cómo puede ser generada. Este proceso es conocido como análisis sintáctico de la cadena. El algoritmo es… …   Wikipedia Español

  • Context-free grammar — In formal language theory, a context free grammar (CFG) is a formal grammar in which every production rule is of the form V → w where V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals (w can be empty). The… …   Wikipedia

  • Parsing expression grammar — A parsing expression grammar, or PEG, is a type of analytic formal grammar that describes a formal language in terms of a set of rules for recognizing strings in the language. A parsing expression grammar essentially represents a recursive… …   Wikipedia

  • Metasyntax — A metasyntax describes the allowable structure and composition of phrases and sentences of a metalanguage, which is used to describe either a natural language or a computer programming language. Some of the widely used formal metalanguages for… …   Wikipedia

  • Formal grammar — In formal semantics, computer science and linguistics, a formal grammar (also called formation rules) is a precise description of a formal language ndash; that is, of a set of strings over some alphabet. In other words, a grammar describes which… …   Wikipedia

  • Chomsky hierarchy — Within the field of computer science, specifically in the area of formal languages, the Chomsky hierarchy (occasionally referred to as Chomsky–Schützenberger hierarchy) is a containment hierarchy of classes of formal grammars. This hierarchy of… …   Wikipedia

  • Top-down parsing language — (TDPL) is a type of analytic formal grammar developed by Alexander Birman in the early 1970s in order to study formally the behavior of a common class of practical top down parsers that support a limited form of backtracking. Birman originally… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”