LR parser

LR parser

In computer science, an LR parser is a parser for context-free grammars that reads input from Left to right and produces a Rightmost derivation. The term LR("k") parser is also used; here the "k" refers to the number of unconsumed "look ahead" input symbols that are used in making parsing decisions. Usually "k" is 1 and is often omitted. A context-free grammar is called LR("k") if there exists an LR("k") parser for it.

An LR parser is said to perform bottom-up parsing because it attempts to deduce the top level grammar productions by building up from the leaves.

Many programming languages are described by a grammar that is LR(1), or close to being so, and for this reason LR parsers are often used by compilers to perform syntax analysis of source code.

In typical usage an "LR parser" means a particular parser capable of recognizing a particular language specified by a context free grammar. It is common, however, to use the term to mean a driver program that can be supplied with a suitable table to produce a wide number of different particular LR parsers. However, these programs are more accurately called parser generators.

LR parsing has many benefits:
* Many programming languages can be parsed using some variation of an LR parser. One notable exception is C++.
* LR parsers can be implemented very efficiently.
* Of all parsers that scan their input left to right, LR parsers detect syntactic errors (that is, when the input does not conform to the grammar) as soon as possible.

LR parsers are difficult to produce by hand; they are usually constructed by a parser generator or a compiler-compiler. Depending on how the parsing table is generated, these parsers can be called simple LR parsers (SLR), look-ahead LR parsers (LALR), or canonical LR parsers. These types of parsers can deal with increasingly large sets of grammars; LALR parsers can deal with more grammars than SLR. Canonical LR parsers work on more grammars than LALR parsers. The popular Yacc produces LALR parsers.

Architecture of LR parsers

Conceptually, an LR Parser is a recursive program that can be proven correct by direct computation, and can be implemented efficiently as a recursive ascent parser, a set of mutually-recursive functions for every grammar, much like a recursive descent parser. Conventionally, however, LR parsers are presented and implemented as table-based stack machines in which the call stack of the underlying recursive program is explicitly manipulated.

A table-driven bottom-up parser can be schematically presented as in Figure 1. The following describes a rightmost derivation by this parser.

General case

The parser is a state machine. It consists of:
* an input buffer
* a stack on which is stored a list of states it has been in
* a goto table that prescribes to which new state it should move
* an action table that gives a grammar rule to apply given the current state and the current terminal in the input stream

Parsing algorithm

Since the LR parser reads input from left to right but needs to produce a rightmost derivation, it uses reductions, instead of derivations to process input. That is, the algorithm works by creating a "leftmost reduction" of the input. The end result, when reversed, will be a rightmost derivation.

The LR parsing algorithm works as follows:
# The stack is initialized with [0] . The current state will always be the state that is at the top of the stack.
# Given the current state and the current terminal on the input stream an action is looked up in the action table. There are four cases:
#* a shift s"n":
#** the current terminal is removed from the input stream
#** the state "n" is pushed onto the stack and becomes the current state
#* a reduce r"m":
#** the number "m" is written to the output stream
#** for every symbol in the right-hand side of rule "m" a state is removed from the stack
#** given the state that is then on top of the stack and the left-hand side of rule "m" a new state is looked up in the goto table and made the new current state by pushing it onto the stack
#* an accept: the string is accepted
#* no action: a syntax error is reported
# Step 2 is repeated until either the string is accepted or a syntax error is reported.

Concrete example

To explain its workings we will use the following small grammar whose start symbol is E:

: (1) E → E * B: (2) E → E + B: (3) E → B: (4) B → 0: (5) B → 1

and parse the following input:: 1 + 1

Action and goto tables

The two LR(0) parsing tables for this grammar look as follows:

Constructing the action and goto tables

From this table and the found item sets we construct the action and goto table as follows:
# the columns for nonterminals are copied to the goto table
# the columns for the terminals are copied to the action table as shift actions
# an extra column for '$' (end of input) is added to the action table that contains "acc" for every item set that contains S → E •.
# if an item set "i" contains an item of the form "A" → "w" • and "A" → "w" is rule "m" with "m" > 0 then the row for state "i" in the action table is completely filled with the reduce action r"m".The reader may verify that this results indeed in the action and goto table that were presented earlier on.

A note about LR(0) versus SLR and LALR parsing

Note that only step 4 of the above procedure produces reduce actions, and so all reduce actions must occupy an entire table row, causing the reduction to occur regardless of the next symbol in the input stream. This is why these are LR(0) parse tables: they don't do any lookahead (that is, they look ahead zero symbols) before deciding which reduction to perform. A grammar that needs lookahead to disambiguate reductions would require a parse table row containing different reduce actions in different columns, and the above procedure is not capable of creating such rows.

Refinements to the LR(0) table construction procedure (such as SLR and LALR) are capable of constructing reduce actions that do not occupy entire rows. Therefore, they are capable of parsing more grammars than LR(0) parsers.

Conflicts in the constructed tables

The automaton is constructed in such a way that it is guaranteed to be deterministic. However, when reduce actions are added to the action table it can happen that the same cell is filled with a reduce action and a shift action (a "shift-reduce conflict") or with two different reduce actions (a "reduce-reduce conflict"). However, it can be shown that when this happens the grammar is not an LR(0) grammar.

A small example of a non-LR(0) grammar with a shift-reduce conflict is:

: (1) E → 1 E: (2) E → 1

One of the item sets we then find is:

: Item set 1: E → 1 • E: E → 1 • : + E → • 1 E: + E → • 1

There is a shift-reduce conflict in this item set because in the cell in the action table for this item set and the terminal '1' there will be both a shift action to state 1 and a reduce action with rule 2.

A small example of a non-LR(0) grammar with a reduce-reduce conflict is:

: (1) E → A 1: (2) E → B 2: (3) A → 1: (4) B → 1

In this case we obtain the following item set:

: Item set 1: A → 1 •: B → 1 •

There is a reduce-reduce conflict in this item set because in the cells in the action table for this item set there will be both a reduce action for rule 3 and one for rule 4.

Both examples above can be solved by letting the parser use the follow set (see LL parser) of a nonterminal "A" to decide if it is going to use one of "A"s rules for a reduction; it will only use the rule "A" → "w" for a reduction if the next symbol on the input stream is in the follow set of "A". This solution results in so-called Simple LR parsers.

References

* "", Aho, Sethi, Ullman, Addison-Wesley, 1986. ISBN 0-201-10088-6. An extensive discussion of LR parsing and the principal source for some sections in this article.
* [http://www.cs.vu.nl/~dick/PTAPG.html Parsing Techniques - A Practical Guide] web page of book includes downloadable pdf.
* The Functional Treatment of Parsing (Boston: Kluwer Academic, 1993), R. Leermakers, ISBN 0-7923-9376-7

External links

* [http://cs.uic.edu/~spopuri/cparser.html Internals of an LALR(1) parser generated by GNU Bison] - Implementation issues

ee also

*Deterministic context-free grammar


Wikimedia Foundation. 2010.

Игры ⚽ Поможем написать курсовую

Look at other dictionaries:

  • Parser (magazine) — Parser: New Poetry Poetics is an annual journal of anarchist poetry and poetics from Vancouver, BC, edited by Roger Farr and Reg Johanson. The first issue was published in May 2007, featuring writing by Alice Becker Ho, Alfredo M. Bonanno, P.… …   Wikipedia

  • parser — (izg. pàrser) m DEFINICIJA inform. program koji obavlja sintaktičku analizu nekog jezika; sintaktički analizator ETIMOLOGIJA engl …   Hrvatski jezični portal

  • parser — pars er, n. One who parses. [1913 Webster] …   The Collaborative International Dictionary of English

  • Parser — Ein Parser [ˈpɑːʁzɐ] (engl. to parse, „analysieren“, bzw. lateinisch pars, „Teil“; im Deutschen gelegentlich auch Zerteiler) ist ein Computerprogramm, das in der Computertechnik für die Zerlegung und Umwandlung einer beliebigen Eingabe in ein für …   Deutsch Wikipedia

  • Parser combinator — In functional programming, a parser combinator is a higher order function which accepts several parsers as input and returns a new parser as its output. In this context, a parser is a function accepting strings as input and returning some… …   Wikipedia

  • Parser Combinator — In mathematics and functional programming, Higher Order functions (HOF) are defined as the functions that can take functions as their input and can also produce functions as their output. The use of a HOF as an infix operator in a function… …   Wikipedia

  • Parser — Это статья о языке программирования, об алгоритме синтаксического анализа см. Синтаксический анализ. Parser Семантика: мультипарадигменный Тип исполнения: Интерпретатор компилирующего типа Появился в …   Википедия

  • Parser-Generator — Ein Parsergenerator ist ein Computerprogramm, das unter Eingabe einer Spezifikation einen Parser erzeugt. Inhaltsverzeichnis 1 Grundlagen 2 Algorithmen 3 Siehe auch 4 Weblinks // …   Deutsch Wikipedia

  • Parser — Pạr|ser 〈m. 3; EDV〉 Bestandteil eines Compilers, Programm, das die syntaktische Analyse eines Quellprogramms durchführt, um es in eine Maschinensprache zu übertragen [zu engl. parse „(grammatisch) analysieren“] * * * Pạr|ser , der; s, [engl.… …   Universal-Lexikon

  • Parser (CGI language) — Infobox programming language name = Parser paradigm = multiparadigm macro, object oriented year = since 1997 designer = Konstantin Morshnev (Art. Lebedev Studio) developer = Alexander Petrosyan (Art. Lebedev Studio) latest release version = 3.2.2 …   Wikipedia

  • Parser Grammar Engine — The Parser Grammar Engine (originally Parrot Grammar Engine) or PGE is a compiler and runtime for a Perl 6 rules for the Parrot virtual machine. [cite web | url=http://search.cpan.org/ ltoetsch/parrot 0.2.2/compilers/pge/README | title=Parrot… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”