Maximal munch

Maximal munch

In computer programming and computer science, "maximal munch" or "longest match" is the principle that when creating some construct, as much of the available input as possible should be consumed. It is similar, though not entirely identical, to the concept of a greedy algorithm.

Contents

Application

For instance, the lexical syntax of many programming languages requires that tokens be built from the maximum possible number of characters from the input stream. This is done to resolve the problem of inherent ambiguity in commonly used regular expressions such as [a-z]+ (one or more lower-case letters).[1]

The term has also been used in the field of compilers to describe a process of "tiling" — determining how a structured tree representing a program in an intermediate language should be converted into linear machine code. An entire subtree might be converted into just one machine instruction, and the problem is how to split the tree into non-overlapping "tiles," each representing one machine instruction. An effective strategy is simply to make a tile of the largest subtree possible at any given point, which is called "maximal munch."[2]

Drawbacks

In some situations, "maximal munch" leads to undesirable or unintuitive outcomes. For example, C++ uses the "angle bracket" characters < and > in the syntax for template specialization, but two consecutive > characters are interpreted as the right-shift operator >>.[3] Prior to C++11, the following code would produce a parse error, because the right-shift operator token is encountered instead of two right-angle-bracket tokens:

    // This is not valid syntax in standard C++.
    std::vector<std::vector<int>> incorrect;
    // Adding whitespace between the brackets yields the correct parse.
    std::vector<std::vector<int> > correct;

The C++11 standard adopted in August 2011 amended the grammar so that a right-shift token is accepted as synonymous with a pair of right-angle brackets, (as in Java,) which complicates the grammar but allows the continued use of the maximal munch principle.

Alternatives

Programming languages researchers have also responded by replacing or supplementing the principle of maximal munch with other lexical disambiguation tactics. One approach is to utilize "follow restrictions," which instead of directly taking the longest match will put some restrictions on what characters can follow a valid match. For example, stipulating that strings matching [a-z]+ cannot be followed by an alphabetic character achieves the same effect as maximal munch with that regular expression.[4] Another approach is to keep the principle of maximal munch but make it subordinate to some other principle, such as context (e.g., the right-shift token in Java would not be matched in the context of a template expression, where it is syntactically invalid).[5]

References

  1. ^ Aho et al., 168.
  2. ^ Page, 470.
  3. ^ Vandevoorde.
  4. ^ Van den Brand et al., 26.
  5. ^ Van Wyk et al., 63.

Bibliography


Wikimedia Foundation. 2010.

Игры ⚽ Поможем решить контрольную работу

Look at other dictionaries:

  • maximal munch — noun A method of parsing or lexing where the longest possible section of the input is matched on each iteration …   Wiktionary

  • Munch — may refer to: Edvard Munch (1863–1944), a Norwegian expressionist painter and printmaker best known for his work The Scream Edvard Munch (film), 1973 biographical film written and directed by Peter Watkins Munch Museum in Oslo, Norway, dedicated… …   Wikipedia

  • Lexical analysis — In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. Programs performing lexical analysis are called lexical analyzers or lexers. A lexer is often organized as separate scanner and …   Wikipedia

  • C syntax — The syntax of the C programming language is a set of rules that specifies whether the sequence of characters in a file is conforming C source code. The rules specify how the character sequences are to be chunked into tokens (the lexical grammar) …   Wikipedia

  • Deutscher Angriff auf Stalingrad — Teil von: Schlacht von Stalingrad (Zweiter Weltkrieg) …   Deutsch Wikipedia

  • test — 1. To prove; to try a substance; to determine the chemical nature of a substance by means of reagents. 2. A method of examination, as to determine the presence or absence of a definite disease or of some substance in any of the fluids, tissues,… …   Medical dictionary

  • Motorrad-WM-Saison 1970 — Inhaltsverzeichnis 1 Punkteverteilung 2 Wissenswertes 2.1 Regeländerungen 2.2 Todesfälle …   Deutsch Wikipedia

  • Grimlock — is the name of several fictional characters in the Transformers universes.Transformers: Generation 1Transformers character name = Grimlock caption = affiliation = Autobot subgroup =Action Masters, Deluxe Vehicles, Dinobots, Pretenders rank = 9 5… …   Wikipedia

  • Diamant-Kaltnadel — Kaltnadelradierung Die Kaltnadelradierung ist ein grafisches Tiefdruckverfahren, eine mögliche Form der Radierung. Inhaltsverzeichnis 1 Grundlagen 2 …   Deutsch Wikipedia

  • Exzellenzcluster — Die Exzellenzinitiative des Bundes und der Länder zur Förderung von Wissenschaft und Forschung an deutschen Hochschulen ist das Ergebnis langwieriger Verhandlungen zwischen dem Bund und den Ländern in Deutschland. Sie zielt darauf ab,… …   Deutsch Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”