Comparison of regular expression engines
Libraries
Languages
List of languages and frameworks coming with regular expression support

| Language | Official website | Software license | Remarks |
|---|---|---|---|
| .NET | MSDN | Proprietary | |
| C++ | | | Since ISO/IEC 14882:2011(E) |
| D | D | Boost Software License [Note 1] | |
| Go | Golang.org | BSD-style license | |
| Haskell | Haskell.org | BSD3 | Not included in the language report, nor in GHC's Hierarchical Libraries |
| Java | Java | GNU General Public License | REs are written as strings in source code (all backslashes must be doubled, hurting readability). |
| JavaScript/ECMAScript | | ? | Limited, but REs are first-class citizens of the language with a specific /.../mod syntax. |
| Lua | Lua.org | MIT License | Uses a simplified, limited dialect. Can be bound to a more powerful library like PCRE, or to an alternative parser like LPeg. |
| Object Pascal (Free Pascal) | www.freepascal.org | LGPL with static linking exception | Free Pascal 2.6+ ships with TRegExpr from Sorokin as well as with 2 other regular expression libraries. See http://wiki.lazarus.freepascal.org/Regexpr |
| Objective-C (Cocoa on iOS only) | Apple | Proprietary | Currently only available on iOS 4+ |
| OCaml | Caml | LGPL | |
| Perl | Perl.com | Artistic License or the GNU General Public License | Full, central part of the language. |
| PHP | PHP.net | PHP License | Has two implementations, with PCRE being the more efficient (speed, functionalities). |
| Python | python.org | Python Software Foundation License | |
| Ruby | ruby-doc.org | GNU Library General Public License | Ruby 1.8 and 1.9 use different engines; Ruby 1.9 integrates Oniguruma. |
| SAP ABAP | SAP.com | ? | |
| Tcl 8.4 | tcl.tk | Tcl/Tk License (permissive, similar to BSD) | |
| ActionScript 3 | ? | ? | |

Language features
NOTE: An application using a library for regular expression support does not necessarily offer the full set of features of the library. For example, GNU Grep, which uses PCRE, does not offer lookahead support, though PCRE does.
Part 1
Language feature comparison (part 1)

| | "+" quantifier | Negated character classes | Non-greedy quantifiers [Note 1] | Shy groups [Note 2] | Recursion | Lookahead | Lookbehind | Backreferences [Note 3] | >9 indexable captures |
|---|---|---|---|---|---|---|---|---|---|
| Boost.Regex | Yes | Yes | Yes | Yes | Yes [Note 4] | Yes | Yes | Yes | Yes |
| Boost.Xpressive | Yes | Yes | Yes | Yes | Yes [Note 5] | Yes | Yes | Yes | Yes |
| CL-PPCRE | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| EmEditor | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No |
| FREJ | No [Note 6] | No | Some [Note 6] | Yes | No | No | No | Yes | Yes |
| GLib/GRegex | Yes | ? | Yes | ? | No | ? | ? | ? | ? |
| GNU Grep | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | ? |
| Haskell | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| Java | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| ICU Regex | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| JGsoft | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| .NET | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| OCaml | Yes | Yes | No | No | No | No | No | Yes | No |
| OmniOutliner 3.6.2 | Yes | Yes | Yes | No | No | No | No | ? | ? |
| PCRE | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Perl | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| PHP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Python | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| Qt/QRegExp | Yes | Yes | Yes | Yes | No | Yes | No | Yes | Yes |
| re2 | Yes | Yes | Yes | Yes | No | No | No | No | Yes |
| Ruby | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| TRE | Yes | Yes | Yes | Yes | No | No | No | Yes | No |
| Vim 7.3a (2010-05-24) | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No |
| RGX | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| TRegExpr | Yes | ? | Yes | ? | ? | ? | ? | ? | ? |

- Note 1: Non-greedy quantifiers match as few characters as possible, instead of the default of as many as possible. Note that many older, pre-POSIX engines were non-greedy and didn't have greedy quantifiers at all.
- Note 2: Shy groups, also called non-capturing groups, cannot be referred to with backreferences; they are used to speed up matching where the group's content need not be accessed later.
- Note 3: Backreferences enable referring to previously matched groups in later parts of the regex and/or replacement string (where applicable). For instance, ([ab]+)\1 matches the whole of "abab" but not the whole of "abaab".
- Note 4: See http://www.boost.org/doc/libs/1_47_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.recursive_expressions
- Note 5: See http://www.boost.org/doc/libs/1_47_0/doc/html/xpressive/user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.embedding_a_regex_by_reference
- Note 6: FREJ has no repetitive quantifiers, but has an "optional" element which behaves similarly to a simple "?" quantifier.
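The notes on non-greedy quantifiers, shy groups and backreferences can be illustrated with a minimal sketch. It uses Python's re module, which the table lists as supporting all three; the sample strings and variable names are made up for illustration, and other engines may behave differently.

```python
import re

# Greedy vs. non-greedy: ".+" takes as much as it can, ".+?" as little.
text = "<a><b>"
greedy = re.match(r"<.+>", text).group(0)    # spans both tags
lazy = re.match(r"<.+?>", text).group(0)     # stops at the first ">"

# Shy (non-capturing) group: (?:...) groups without creating a capture,
# so only the parenthesised "c" is available afterwards.
shy_groups = re.match(r"(?:ab)+(c)", "ababc").groups()

# Backreference: \1 must repeat exactly what group 1 matched.
backref = re.compile(r"([ab]+)\1")
whole_abab = backref.fullmatch("abab") is not None
whole_abaab = backref.fullmatch("abaab") is not None

# A substring search still finds a doubled piece inside "abaab".
partial = backref.search("abaab").group(0)

print(greedy, lazy, shy_groups, whole_abab, whole_abaab, partial)
```

The "abab" versus "abaab" example from the backreference note holds for whole-string matching; an unanchored search can still find the doubled "aa" inside "abaab".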
Part 2
Language feature comparison (part 2)

| | Directives [Note 1] | Conditionals | Atomic groups [Note 2] | Named capture [Note 3] | Comments | Embedded code | Partial matching | Fuzzy matching | Unicode property support [3] |
|---|---|---|---|---|---|---|---|---|---|
| Boost.Regex | Yes | Yes | Yes | Yes | Yes | No | Yes | No | Some [Note 4] [Note 5] |
| Boost.Xpressive | Yes | No | Yes | Yes | Yes | No | Yes | No | No |
| CL-PPCRE | Yes | Yes | Yes | Yes | Yes | Yes | ? | No | No |
| EmEditor | Yes | Yes | ? | ? | Yes | No | Yes | No | ? |
| FREJ | No | No | Yes | Yes | Yes | No | No | Yes | ? |
| GLib/GRegex | Yes | Yes | Yes | Yes | Yes | No | Yes | No | Some [Note 4] [Note 5] |
| GNU Grep | Yes | Yes | ? | Yes | Yes | No | ? | No | No |
| Haskell | ? | ? | ? | ? | ? | No | ? | No | No |
| Java | Yes | No | Yes | Yes [Note 6] | No | No | ? | No | Some [Note 5] |
| ICU Regex | Yes | No | Yes | No | Yes | No | No | No | Yes [Note 7] |
| JGsoft | Yes | Yes | Yes | Yes | Yes | No | Yes | ? | Some [Note 5] |
| .NET | Yes | Yes | Yes | Yes | Yes | No | ? | No | Some [Note 5] |
| OCaml | No | No | No | No | No | No | ? | No | No |
| OmniOutliner 3.6.2 | ? | ? | ? | ? | No | No | ? | No | ? |
| PCRE | Yes | Yes | Yes | Yes [Note 8] | Yes | Yes | Yes | No | Some [Note 4] [Note 5] |
| Perl | Yes | Yes | Yes | Yes [Note 9] | Yes | Yes | No | No | Yes [Note 7] |
| PHP | Yes | Yes | Yes | Yes | Yes | No | No | No | No |
| Python | Yes | Yes | No | Yes | Yes | No | Yes | No | No |
| Qt/QRegExp | No | No | No | No | No | No | Yes | No | No |
| re2 | Yes | No | ? | Yes | No | No | Yes | No | Some [Note 5] |
| Ruby | Yes | No | Yes | Yes | Yes | Yes | No | No | Some [Note 5] |
| TRE | Yes | No | No | No | Yes | No | No | Yes | ? |
| Vim 7.3a (2010-05-24) | Yes | No | Yes | No | No | No | Yes | No | No |
| RGX | Yes | Yes | Yes | Yes | Yes | No | No | No | Yes |

- Note 1: Also known as flags, modifiers or option letters. Example pattern: "(?i:test)"
- Note 2: Also called independent sub-expressions.
- Note 3: Similar to backreferences but with names instead of indices.
- Note 4: Requires optional Unicode support enabled.
- Note 5: Supports only a subset of Unicode properties, not all of them.
- Note 6: Available as of JDK7.
- Note 7: Supports all Unicode properties, including non-binary properties.
- Note 8: Available as of PCRE 7.0 (as of PCRE 4.0 with Python-like syntax (?P<name>...)).
- Note 9: Available as of perl 5.9.5.
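As a rough illustration of directives, named capture and comments from the table above, the following sketch uses Python's re module. It assumes Python 3.6 or later for the scoped "(?i:...)" form; the sample strings and variable names are invented for the example.

```python
import re

# Directive (inline flag) scoped to a group: only "test" ignores case.
scoped = re.compile(r"(?i:test)ing")
mixed_ok = scoped.fullmatch("TESTing") is not None
upper_ok = scoped.fullmatch("TESTING") is not None   # "ING" is still case-sensitive

# Named capture with the Python-style (?P<name>...) syntax.
m = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", "2010-05")
year, month = m.group("year"), m.group("month")

# Named backreference: (?P=w) must repeat what the group "w" matched.
doubled = re.fullmatch(r"(?P<w>\w+) (?P=w)", "hey hey") is not None

# Comment inside the pattern: (?#...) is skipped by the engine.
commented = re.fullmatch(r"a(?#this part is ignored)b", "ab") is not None

print(mixed_ok, upper_ok, year, month, doubled, commented)
```

Other engines spell named groups differently, so the "(?P<name>...)" form shown here should not be assumed to work outside engines that document it.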
API features
API feature comparison

| | Native UTF-16 support | Native UTF-8 support | Non-linear input support | Dot-matches-newline option | Anchor-matches-newline option |
|---|---|---|---|---|---|
| Boost.Regex | No | No | Yes | Yes | Yes |
| GLib/GRegex | No | Yes [Note 1] | No | Yes | Yes |
| ICU Regex | Yes | No | Yes | Yes | Yes |
| Java | Yes | No | Yes | Yes | Yes |
| .NET | No [Note 2] | No | Yes | Yes | Yes |
| PCRE | No | Yes [Note 1] | No | Yes | Yes |
| Qt/QRegExp | Yes | No | No | No | No |
| TRE | No | ? | Yes | Yes | Yes |
| RGX | No | No | ? | Yes | Yes |

- Note 1: No text was provided in the source for this note (reference name: unicode_optional_1).
- Note 2: Implementation is incorrect: it treats UTF-16 code units as characters, so characters outside the BMP don't work properly. See, for example, this bug report [1].
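The dot-matches-newline and anchor-matches-newline options, and the code-unit problem described in Note 2, can be sketched in Python. Python's re module is not one of the libraries in this table; re.DOTALL and re.MULTILINE are simply Python's names for the two options, and the sample text is invented for the example.

```python
import re

text = "first\nsecond"

# Dot-matches-newline option: by default "." does not match "\n".
dot_default = re.search(r"first.second", text) is not None
dot_all = re.search(r"first.second", text, re.DOTALL) is not None

# Anchor-matches-newline option: by default "^" and "$" anchor to the
# string as a whole; with the option they also anchor at line boundaries.
anchors_default = re.findall(r"^\w+$", text)
anchors_multi = re.findall(r"^\w+$", text, re.MULTILINE)

# UTF-16 code units vs. characters: U+1D11E lies outside the BMP, so it is
# one character but two UTF-16 code units (a surrogate pair).
clef = "\U0001D11E"
code_points = len(clef)
utf16_units = len(clef.encode("utf-16-le")) // 2

print(dot_default, dot_all, anchors_default, anchors_multi, code_points, utf16_units)
```

An engine that counts code units instead of characters would see two items where "." should consume one, which is the kind of failure Note 2 describes.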
External links
- Regular Expression Flavor Comparison — Detailed comparison of the most popular regular expression flavors
- Regexp Syntax Summary
Wikimedia Foundation. 2010.