Perl 6 rules

Perl 6 rules are Perl 6's regular expression, pattern matching and general-purpose parsing facility, and are a core part of the language. Since Perl's pattern-matching constructs have exceeded the capabilities of formal regular expressions for some time, Perl 6 documentation will exclusively refer to them as "regexes", distancing the term from the formal definition.

Perl 6 provides a superset of Perl 5 features with respect to regexes, folding them into a larger framework called "rules", which provide the capabilities of a parsing expression grammar and act as closures with respect to their lexical scope. [Wall, Larry (June 24, 2002). "Synopsis 5: Regexes and Rules". http://dev.perl.org/perl6/doc/design/syn/S05.html] Rules are introduced with the rule keyword, whose usage is quite similar to subroutine definition. Anonymous rules can also be introduced with the regex (or rx) keyword, or they can simply be used inline, as regexes were in Perl 5, via the m (matching) or s (search and replace) operators.

History

In "Apocalypse 5", Larry Wall enumerated 20 problems with "current regex culture". Among these were that Perl's regexes were "too compact and 'cute'", had "too much reliance on too few metacharacters", "little support for named captures", "little support for grammars", and "poor integration with [the] 'real' language". [Wall, Larry (June 4, 2002). "Apocalypse 5: Pattern Matching". http://dev.perl.org/perl6/doc/design/apo/A05.html]

Between late 2004 and mid-2005, a compiler for Perl 6 style rules was developed for the Parrot virtual machine. It was called the Parrot Grammar Engine (PGE) and was later renamed to the more generic Parser Grammar Engine. PGE is a combination of runtime and compiler for Perl 6 style grammars that allows any Parrot-based compiler to use these tools for parsing and to provide rules to its runtime.

Among other Perl 6 features, support for named captures was added to Perl 5.10 in 2007. ["Perl 5.10 now available". Perl Buzz. http://perlbuzz.com/2007/12/perl-510-now-available.html]

Changes from Perl 5

There are only six unchanged features from Perl 5's regexes:
* Literals: word characters (letters, numbers and underscore) are matched literally.
* Capturing: (...)
* Alternatives: |
* Backslash escape: \
* Repetition quantifiers: *, +, and ?, but not {m,n}
* Minimal matching suffix: *?, +?, ??

A few of the most powerful additions include:
* The ability to reference rules using <rulename> to build up entire grammars.
* A handful of commit operators that allow the programmer to control backtracking during matching.

The following changes greatly improve the readability of regexes:
* Simplified non-capturing groups: [...], which are the same as Perl 5's (?:...)
* Simplified code assertions: <?{...}>
* Extended regex formatting (Perl 5's /x) is now the default.
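As a rough illustration of the points listed above, the following sketch shows insignificant whitespace, a non-capturing group, a numbered capture, a reference to a named regex, and a code assertion. It assumes a current Rakudo-based implementation. The names digits, $m, $small and $big are invented for this example and do not come from the Perl 6 design documents.

```perl6
# A named regex, declared lexically with `my` so it can be called as <digits>.
my regex digits { \d+ }

# Whitespace inside the pattern is ignored, as with Perl 5's /x.
# [ ... ] groups without capturing; ( ... ) captures, numbered from $0.
my $m = 'width=42' ~~ / [ width | height ] '=' ( <digits> ) /;

say $m.Str;       # text of the whole match
say $m[0].Str;    # text of the first positional capture

# <?{ ... }> is a code assertion: matching continues only if the block is true.
my $small = '7'   ~~ / ^ (\d+) <?{ $0 < 10 }> $ /;
my $big   = '700' ~~ / ^ (\d+) <?{ $0 < 10 }> $ /;

say $small.so;    # did the one-digit input satisfy the assertion?
say $big.so;      # did the three-digit input satisfy the assertion?
```

Note that characters such as = must be quoted or escaped, since only word characters stand for themselves.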

Implicit changes

Some of the features of Perl 5 regular expressions become more powerful in Perl 6 because of their ability to encapsulate the expanded features of Perl 6 rules. For example, in Perl 5, there were positive and negative lookahead operators (?=...) and (?!...). In Perl 6 these same features exist, but are called <before ...> and <!before ...>.

However, because before can encapsulate arbitrary rules, it can be used to express lookahead as a syntactic predicate for a grammar. For example, the following parsing expression grammar describes the classic non-context-free language { a^n b^n c^n : n ≥ 1 }:

    S ← &(A !b) a+ B
    A ← a A? b
    B ← b B? c

In Perl 6 rules that would be:

    rule S { <before <A> <!before b>> a+ <B> }
    rule A { a <A>? b }
    rule B { b <B>? c }

Of course, given the ability to mix rules and regular code, that can be simplified even further:

    rule S { (a+) (b+) (c+) <?{ $0.chars == $1.chars == $2.chars }> }

However, this version relies on a code assertion. Within Perl 6 rules the distinction is subtle, but in parsing theory it is substantial: the check is a semantic predicate rather than a syntactic one. The most important difference in practice is performance. The rule engine has no way of knowing what conditions the assertion will accept, so it cannot optimize this part of the match.
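The two approaches can be compared side by side. The sketch below is one possible runnable rendering, assuming a current Rakudo-based implementation. The grammar name ABC, the variable $counted and the two helper subs are hypothetical. Tokens are used in place of rules so that no whitespace is expected between the letters, and the lookahead is written in the non-capturing <?before ...> form.

```perl6
# Syntactic predicate: the parsing expression grammar, one token per nonterminal.
grammar ABC {
    token TOP { <?before <A> <!before b> > a+ <B> }
    token A   { a <A>? b }
    token B   { b <B>? c }
}

# Semantic predicate: capture the three runs, then compare their lengths in code.
my $counted = / ^ (a+) (b+) (c+) $ <?{ $0.chars == $1.chars == $2.chars }> /;

sub syntactic(Str $s --> Bool) { ABC.parse($s).so }
sub semantic(Str $s --> Bool)  { ($s ~~ $counted).so }

# Both checks should agree on every input.
for <abc aabbcc aaabbbccc aabbc aabcc abbc ab> -> $s {
    say $s, ' ', syntactic($s), ' ', semantic($s);
}
```

The first form lets the engine see the whole structure of the language, while the second hides the length comparison inside an opaque block of code.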

Integration with Perl

In many languages, regular expressions are entered as strings, which are then passed to library routines that parse and compile them into an internal state. In Perl 5, regular expressions shared some of the lexical analysis with Perl's scanner. This simplified many aspects of regular expression usage, though it added a great deal of complexity to the scanner. In Perl 6, rules are part of the grammar of the language. No separate parser exists for rules, as it did in Perl 5. This means that code, embedded in rules, is parsed at the same time as the rule itself and its surrounding code. For example, it is possible to nest rules and code without re-invoking the parser:

    rule ab {
        (a.)  # match "a" followed by any character
        # Then check to see if that character was "b"
        # If so, print a message.
        { $0 ~~ /b {say "found the b"}/ }
    }

The above is a single block of Perl 6 code which contains an outer rule definition, an inner block of assertion code, and inside of that a regex that contains one more level of assertion.

Implementation

Keywords

There are several keywords used in conjunction with Perl 6 rules:

* regex: A named or anonymous regex which will ignore whitespace within the regex by default.
* rule: A named or anonymous regex which implies the :ratchet and :sigspace modifiers.
* token: A named or anonymous regex which implies the :ratchet modifier.
* rx: An anonymous regex which can take arbitrary delimiters such as //, where regex can only take braces.
* m: An operator form of anonymous regex which can be used to perform matches with arbitrary delimiters.
* ms: Shorthand for m with the :sigspace modifier.
* s: An operator form of anonymous regex which can be used to perform substitution with arbitrary delimiters.
* ss: Shorthand for s with the :sigspace modifier.
* /.../: Simply placing a regex between slashes is shorthand for m/.../.

Here is an example of typical use:

    token word { \w+ }
    rule phrase { <word> [ \, <word> ]* \. }
    if $string ~~ / <phrase> \n / { ... }
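A self-contained variant of this pattern might look like the sketch below. It assumes a current Rakudo-based implementation. The input string and the variables @words, $year, $title and $found are made up for illustration, and the regexes are declared with my so that they can be called by name from an ordinary match.

```perl6
# token implies :ratchet; rule implies :ratchet and :sigspace.
my token word   { \w+ }
my rule  phrase { <word> [ ',' <word> ]* '.' }

my $string = 'red, green, blue.';
my @words;
if $string ~~ / ^ <phrase> $ / {
    # Each <word> call is recorded under its name inside the <phrase> capture.
    @words = $<phrase><word>».Str;
}
say @words.join('|');

# rx builds an anonymous regex object; s substitutes in place.
my $year  = rx/ \d ** 4 /;
my $title = 'Synopsis 5, 2002';
my $found = ($title ~~ $year).Str;
$title ~~ s/Synopsis/Apocalypse/;
say $found;
say $title;
```

Because phrase is a rule, the spaces written between its atoms match optional whitespace in the input, which is what allows "red, green" to match without an explicit \s*.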

Modifiers

Modifiers may be placed after any of the regex keywords, and before the delimiter. If a regex is named, the modifier comes after the name. Modifiers control the way regexes are parsed and how they behave. They are always introduced with a leading : character.

Some of the more important modifiers include:

* :i or :ignorecase – Perform matching without respect to case.
* :g or :global – Perform the match more than once on a given target string.
* :s or :sigspace – Replace whitespace in the regex with a whitespace-matching rule, rather than simply ignoring it.
* :Perl5 – Treat the regex as a Perl 5 regular expression.
* :ratchet – Never perform backtracking in the rule.

For example:

    rule addition :ratchet :sigspace { <term> \+ <expr> }
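The effect of several of these modifiers can be observed on small inputs. The following sketch assumes a current Rakudo-based implementation and attaches the modifiers to the m operator. All variable names and test strings are invented for the example.

```perl6
# :i ignores case.
my $ci = ('PERL' ~~ m:i/ perl /).so;

# :g returns every non-overlapping match instead of only the first.
my @numbers = ('a1b22c333' ~~ m:g/ \d+ /).map(*.Str);

# :s makes whitespace in the pattern significant.
my $spaced   = ('foo   bar' ~~ m:s/ foo bar /).so;
my $unspaced = ('foobar'    ~~ m:s/ foo bar /).so;

# :ratchet forbids backtracking: a* keeps every "a" it consumed,
# so the following "ab" can no longer match.
my $backtracking = ('aaab' ~~ m/ ^ a* ab /).so;
my $ratcheted    = ('aaab' ~~ m:ratchet/ ^ a* ab /).so;

say $ci, ' ', @numbers.join(','), ' ', $spaced, ' ', $unspaced;
say $backtracking, ' ', $ratcheted;
```

The last pair shows why :ratchet suits tokenizing: the match either succeeds on the first attempt or fails quickly, at the cost of rejecting inputs that only match after backtracking.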

Grammars

A grammar may be defined using the grammar keyword. A grammar is essentially just a namespace for rules:

    grammar Str::SprintfFormat {
        regex format_token { \%: <index>? <precision>? <modifier>? <directive> }
        token index { \d+ \$ }
        token precision { <flags>? <vector>? <precision_count> }
        token flags { <[\ +0\#\-]>+ }
        token precision_count { [ <[1-9]>\d* | \* ]? [ \. [ \d* | \* ] ]? }
        token vector { \*? v }
        token modifier { ll | <[lhmVqL]> }
        token directive { <[\%csduoxefgXEGbpniDUOF]> }
    }

This grammar describes Perl's sprintf string formatting notation.

Outside of this namespace, you could use these rules like so:

    if / <Str::SprintfFormat::format_token> / { ... }

A rule used in this way is actually identical to the invocation of a subroutine with the extra semantics and side-effects of pattern matching (e.g. rule invocations can be backtracked).
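For a runnable impression of how a grammar is used as a unit, the sketch below defines a much smaller grammar than the one above and parses a few strings with it. It assumes a current Rakudo-based implementation, where grammars offer the parse and subparse methods and parsing starts at a token named TOP. The grammar Sketch::Format and its tokens are hypothetical and cover only a fraction of the sprintf notation.

```perl6
grammar Sketch::Format {
    token TOP       { '%' <flags>? <width>? [ '.' <precision> ]? <directive> }
    token flags     { <[ \+ \- \# 0 ]>+ }
    token width     { <[1..9]> \d* }
    token precision { \d+ }
    token directive { <[csduoxefg]> }
}

# Each token called as <name> becomes a named capture in the result.
my $m = Sketch::Format.parse('%-08.3f');
say $m<flags>.Str, ' ', $m<width>.Str, ' ', $m<precision>.Str, ' ', $m<directive>.Str;

# parse must consume the whole string; subparse only anchors at the start.
my $whole  = Sketch::Format.parse('%5d items');
my $prefix = Sketch::Format.subparse('%5d items');
say $whole.defined;
say $prefix.Str;
```

Grouping the tokens in a grammar keeps short names such as flags and width out of the surrounding scope.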

Examples

Here are some example rules in Perl 6:

    rx { a [ b | c ] ( d | e ) f : g }
    rx { ( ab* ) <?{ $0.chars % 2 == 0 }> }

The last of these is equivalent to:

    rx { ( ab [bb]* ) }
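The claimed equivalence can be checked on a handful of inputs. In the sketch below, which assumes a current Rakudo-based implementation, both regexes are anchored at each end so that they are compared on whole strings. The capture is referred to as $0 and its length is taken with .chars, which is an assumption about what that implementation expects. The variable names are illustrative only.

```perl6
# "a" followed by any number of "b", accepted only if the total length is even.
my $with-assertion = / ^ ( a b* ) <?{ $0.chars %% 2 }> $ /;

# The same language written without code: "a", one "b", then pairs of "b".
my $syntax-only    = / ^ ( a b [bb]* ) $ /;

for <a ab abb abbb abbbb ba> -> $s {
    my $x = ($s ~~ $with-assertion).so;
    my $y = ($s ~~ $syntax-only).so;
    say $s, ' ', $x, ' ', $y;
}
```

The second form gives the engine a purely syntactic description to work with, whereas the first must run arbitrary code at every candidate match.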

External links

* Perl 6 Regex FAQ (http://www.programmersheaven.com/2/Perl6-FAQ-Regex): answers a range of questions about Perl 6 regexes.


Wikimedia Foundation. 2010.
