- Scannerless parsing
Scannerless parsing (also called "lexerless parsing") refers to the use of a single formalism to express both the lexical and context-free syntax used to parse a language.
This parsing strategy is suitable when a clear lexer-parser distinction is not required. examples of when this is appropriate include
TeX , most wiki grammars, makefiles, and simple per application control languages.Advantages
* Only a single metasyntax is required
* Non-regular lexical structure is handled easily
* "Token classification" is not required which eliminates the need for design accommodations such as "the lexer hack " and language keywords (such as "while" inC )
* Grammars are compositional (can be merged without human intervention) ref|compositionalDisadvantages
* since the lexical scanning and syntactic parsing processing is combined, the resulting parser tends to be harder to understand and debug for more complex languages
Required Extensions
Unfortunately, when parsed at the character level, most popular programming languages are no longer strictly context-free. Visser identified five key extensions to classical context-free syntax which handle almost all common non-context-free constructs arising in practice:
* Follow restrictions, a limited form of "longest match"
* Reject productions, a limited form of negative matching (as found inboolean grammar s)
* Preference attributes to handle thedangling else construct in C-like languages
* "Per-production transitions" rather than per-nonterminal transitions in order to facilitate:
** Associativity attributes, which prevent a self-reference in a particular production of a nonterminal from producing that same production
** Precedence/priority rules, which prevent self-references in higher-precedence productions from producing lower-precedence productionsImplementations
* [http://dparser.sourceforge.net/ dparser] generates ANSI C code for scannerless
GLR parser s
* the [http://www.cwi.nl/htbin/sen1/twiki/bin/view/SEN1/SGLR SGLR] component of ASF+SDF generates and interprets parse tables for scannerless grammars
* SGLR is also a component of theStratego/XT program transformation system
* Spirit allows for both scannerless and scanner-based parsing
* SBP is a scannerless parser for boolean grammars (a superset of context-free grammars), written in Java.Related Material
[http://citeseer.ist.psu.edu/visser97scannerles.html Visser, E. (1997b). Scannerless generalized-LR parsing. Technical Report P9707, Programming Research Group, University of Amsterdam]
# since lexerless parsers handle the entire class of
context-free languages , which is closed under composition.
Wikimedia Foundation. 2010.