- LR parser
In
computer science , an LR parser is aparser forcontext-free grammar s that reads input from Left to right and produces a Rightmost derivation. The term LR("k") parser is also used; here the "k" refers to the number of unconsumed "look ahead" input symbols that are used in making parsing decisions. Usually "k" is 1 and is often omitted. A context-free grammar is called LR("k") if there exists an LR("k") parser for it.An LR parser is said to perform
bottom-up parsing because it attempts to deduce the top level grammar productions by building up from the leaves.Many
programming language s are described by a grammar that is LR(1), or close to being so, and for this reason LR parsers are often used bycompiler s to perform syntax analysis ofsource code .In typical usage an "LR parser" means a particular parser capable of recognizing a particular language specified by a context free grammar. It is common, however, to use the term to mean a driver program that can be supplied with a suitable table to produce a wide number of different particular LR parsers. However, these programs are more accurately called parser generators.
LR parsing has many benefits:
* Many programming languages can be parsed using some variation of an LR parser. One notable exception isC++ .
* LR parsers can be implemented very efficiently.
* Of all parsers that scan their input left to right, LR parsers detect syntactic errors (that is, when the input does not conform to the grammar) as soon as possible.LR parsers are difficult to produce by hand; they are usually constructed by a
parser generator or acompiler-compiler . Depending on how the parsing table is generated, these parsers can be calledsimple LR parser s (SLR), look-ahead LR parsers (LALR), orcanonical LR parser s. These types of parsers can deal with increasingly large sets of grammars; LALR parsers can deal with more grammars than SLR. Canonical LR parsers work on more grammars than LALR parsers. The popularYacc produces LALR parsers.Architecture of LR parsers
Conceptually, an LR Parser is a recursive program that can be proven correct by direct computation, and can be implemented efficiently as a
recursive ascent parser , a set of mutually-recursive functions for every grammar, much like arecursive descent parser . Conventionally, however, LR parsers are presented and implemented as table-based stack machines in which thecall stack of the underlying recursive program is explicitly manipulated.A table-driven bottom-up parser can be schematically presented as in Figure 1. The following describes a rightmost derivation by this parser.
General case
The parser is a state machine. It consists of:
* an input buffer
* a stack on which is stored a list of states it has been in
* a goto table that prescribes to which new state it should move
* an action table that gives a grammar rule to apply given the current state and the current terminal in the input streamParsing algorithm
Since the LR parser reads input from left to right but needs to produce a rightmost derivation, it uses reductions, instead of derivations to process input. That is, the algorithm works by creating a "leftmost reduction" of the input. The end result, when reversed, will be a rightmost derivation.
The LR parsing algorithm works as follows:
# The stack is initialized with [0] . The current state will always be the state that is at the top of the stack.
# Given the current state and the current terminal on the input stream an action is looked up in the action table. There are four cases:
#* a shift s"n":
#** the current terminal is removed from the input stream
#** the state "n" is pushed onto the stack and becomes the current state
#* a reduce r"m":
#** the number "m" is written to the output stream
#** for every symbol in the right-hand side of rule "m" a state is removed from the stack
#** given the state that is then on top of the stack and the left-hand side of rule "m" a new state is looked up in the goto table and made the new current state by pushing it onto the stack
#* an accept: the string is accepted
#* no action: a syntax error is reported
# Step 2 is repeated until either the string is accepted or a syntax error is reported.Concrete example
To explain its workings we will use the following small grammar whose start symbol is E:
: (1) E → E * B: (2) E → E + B: (3) E → B: (4) B → 0: (5) B → 1
and parse the following input:: 1 + 1
Action and goto tables
The two LR(0) parsing tables for this grammar look as follows:
Constructing the action and goto tables
From this table and the found item sets we construct the action and goto table as follows:
# the columns for nonterminals are copied to the goto table
# the columns for the terminals are copied to the action table as shift actions
# an extra column for '$' (end of input) is added to the action table that contains "acc" for every item set that contains S → E •.
# if an item set "i" contains an item of the form "A" → "w" • and "A" → "w" is rule "m" with "m" > 0 then the row for state "i" in the action table is completely filled with the reduce action r"m".The reader may verify that this results indeed in the action and goto table that were presented earlier on.A note about LR(0) versus SLR and LALR parsing
Note that only step 4 of the above procedure produces reduce actions, and so all reduce actions must occupy an entire table row, causing the reduction to occur regardless of the next symbol in the input stream. This is why these are LR(0) parse tables: they don't do any lookahead (that is, they look ahead zero symbols) before deciding which reduction to perform. A grammar that needs lookahead to disambiguate reductions would require a parse table row containing different reduce actions in different columns, and the above procedure is not capable of creating such rows.
Refinements to the LR(0) table construction procedure (such as SLR and LALR) are capable of constructing reduce actions that do not occupy entire rows. Therefore, they are capable of parsing more grammars than LR(0) parsers.
Conflicts in the constructed tables
The automaton is constructed in such a way that it is guaranteed to be deterministic. However, when reduce actions are added to the action table it can happen that the same cell is filled with a reduce action and a shift action (a "shift-reduce conflict") or with two different reduce actions (a "reduce-reduce conflict"). However, it can be shown that when this happens the grammar is not an LR(0) grammar.
A small example of a non-LR(0) grammar with a shift-reduce conflict is:
: (1) E → 1 E: (2) E → 1
One of the item sets we then find is:
: Item set 1: E → 1 • E: E → 1 • : + E → • 1 E: + E → • 1
There is a shift-reduce conflict in this item set because in the cell in the action table for this item set and the terminal '1' there will be both a shift action to state 1 and a reduce action with rule 2.
A small example of a non-LR(0) grammar with a reduce-reduce conflict is:
: (1) E → A 1: (2) E → B 2: (3) A → 1: (4) B → 1
In this case we obtain the following item set:
: Item set 1: A → 1 •: B → 1 •
There is a reduce-reduce conflict in this item set because in the cells in the action table for this item set there will be both a reduce action for rule 3 and one for rule 4.
Both examples above can be solved by letting the parser use the follow set (see
LL parser ) of a nonterminal "A" to decide if it is going to use one of "A"s rules for a reduction; it will only use the rule "A" → "w" for a reduction if the next symbol on the input stream is in the follow set of "A". This solution results in so-calledSimple LR parser s.References
* "", Aho, Sethi, Ullman,
Addison-Wesley , 1986. ISBN 0-201-10088-6. An extensive discussion of LR parsing and the principal source for some sections in this article.
* [http://www.cs.vu.nl/~dick/PTAPG.html Parsing Techniques - A Practical Guide] web page of book includes downloadable pdf.
* The Functional Treatment of Parsing (Boston: Kluwer Academic, 1993), R. Leermakers, ISBN 0-7923-9376-7External links
* [http://cs.uic.edu/~spopuri/cparser.html Internals of an LALR(1) parser generated by GNU Bison] - Implementation issues
ee also
*
Deterministic context-free grammar
Wikimedia Foundation. 2010.