# STANDARD USAGE use Regexp::Common; while (<>) { /$RE{num}{real}/ and print q{a number}; /$RE{quoted}/ and print q{a ['"`] quoted string}; m[$RE{delimited}{-delim=>'/'}] and print q{a /.../ sequence}; /$RE{balanced}{-parens=>'()'}/ and print q{balanced parentheses}; /$RE{profanity}/ and print q{a #*@%-ing word}; } # SUBROUTINE-BASED INTERFACE use Regexp::Common 'RE_ALL'; while (<>) { $_ =~ RE_num_real() and print q{a number}; $_ =~ RE_quoted() and print q{a ['"`] quoted string}; $_ =~ RE_delimited(-delim=>'/') and print q{a /.../ sequence}; $_ =~ RE_balanced(-parens=>'()'} and print q{balanced parentheses}; $_ =~ RE_profanity() and print q{a #*@%-ing word}; } # IN-LINE MATCHING... if ( $RE{num}{int}->matches($text) ) {...} # ...AND SUBSTITUTION my $cropped = $RE{ws}{crop}->subs($uncropped); # ROLL-YOUR-OWN PATTERNS use Regexp::Common 'pattern'; pattern name => ['name', 'mine'], create => '(?i:J[.]?\s+A[.]?\s+Perl-Hacker)', ; my $name_matcher = $RE{name}{mine}; pattern name => [ 'lineof', '-char=_' ], create => sub { my $flags = shift; my $char = quotemeta $flags->{-char}; return '(?:^$char+$)'; }, match => sub { my ($self, $str) = @_; return $str !~ /[^$self->{flags}{-char}]/; }, subs => sub { my ($self, $str, $replacement) = @_; $_[1] =~ s/^$self->{flags}{-char}+$//g; }, ; my $asterisks = $RE{lineof}{-char=>'*'}; # DECIDING WHICH PATTERNS TO LOAD. use Regexp::Common qw /comment number/; # Comment and number patterns. use Regexp::Common qw /no_defaults/; # Don't load any patterns. use Regexp::Common qw /!delimited/; # All, but delimited patterns.
There is an alternative, subroutine-based syntax described in ``Subroutine-based interface''.
$RE{num}{real}
and to access the pattern that matches integers:
$RE{num}{int}
Deeper layers of the hash are used to specify flags: arguments that modify the resulting pattern in some way. The keys used to access these layers are prefixed with a minus sign and may have a value; if a value is given, it's done by using a multidimensional key. For example, to access the pattern that matches base-2 real numbers with embedded commas separating groups of three digits (e.g. 10,101,110.110101101):
$RE{num}{real}{-base => 2}{-sep => ','}{-group => 3}
Through the magic of Perl, these flag layers may be specified in any order (and even interspersed through the identifier keys!) so you could get the same pattern with:
$RE{num}{real}{-sep => ','}{-group => 3}{-base => 2}
or:
$RE{num}{-base => 2}{real}{-group => 3}{-sep => ','}
or even:
$RE{-base => 2}{-group => 3}{-sep => ','}{num}{real}
etc.
Note, however, that the relative order of amongst the identifier keys is significant. That is:
$RE{list}{set}
would not be the same as:
$RE{set}{list}
if ($str =~ $RE{num}{real}{-keep}) { $number = $1; $whole = $3; $decimals = $5; }
Special care is needed if a ``kept'' pattern is interpolated into a larger regular expression, as the presence of other capturing parentheses is likely to change the ``number variables'' into which significant substrings are saved.
See also ``Adding new regular expressions'', which describes how to create new patterns with ``optional'' capturing brackets that respond to "-keep".
if ($str =~ /$RE{some}{pattern}/ ) {...}
you can write:
if ( $RE{some}{pattern}->matches($str) ) {...}
For matching this would seem to have no great advantage apart from readability (but see below).
For substitutions, it has other significant benefits. Frequently you want to perform a substitution on a string without changing the original. Most people use this:
$changed = $original; $changed =~ s/$RE{some}{pattern}/$replacement/;
The more adept use:
($changed = $original) =~ s/$RE{some}{pattern}/$replacement/;
Regexp::Common allows you do write this:
$changed = $RE{some}{pattern}->subs($original=>$replacement);
Apart from reducing precedence-angst, this approach has the added advantages that the substitution behaviour can be optimized from the regular expression, and the replacement string can be provided by default (see ``Adding new regular expressions'').
For example, in the implementation of this substitution:
$cropped = $RE{ws}{crop}->subs($uncropped);
the default empty string is provided automatically, and the substitution is optimized to use:
$uncropped =~ s/^\s+//; $uncropped =~ s/\s+$//;
rather than:
$uncropped =~ s/^\s+|\s+$//g;
my $num = $RE{num}{int}; my $commad = $num->{-sep=>','}{-group=>3}; my $duodecimal = $num->{-base=>12};
However, the use of tied hashes does make the access to Regexp::Common patterns slower than it might otherwise be. In contexts where impatience overrules laziness, Regexp::Common provides an additional subroutine-based interface.
For each (sub-)entry in the %RE hash ($RE{key1}{key2}{etc}), there is a corresponding exportable subroutine: "RE_key1_key2_etc()". The name of each subroutine is the underscore-separated concatenation of the non-flag keys that locate the same pattern in %RE. Flags are passed to the subroutine in its argument list. Thus:
use Regexp::Common qw( RE_ws_crop RE_num_real RE_profanity ); $str =~ RE_ws_crop() and die "Surrounded by whitespace"; $str =~ RE_num_real(-base=>8, -sep=>" ") or next; $offensive = RE_profanity(-keep); $str =~ s/$offensive/$bad{$1}++; "<expletive deleted>"/ge;
Note that, unlike the hash-based interface (which returns objects), these subroutines return ordinary "qr"'d regular expressions. Hence they do not curry, nor do they provide the OO match and substitution inlining described in the previous section.
It is also possible to export subroutines for all available patterns like so:
use Regexp::Common 'RE_ALL';
Or you can export all subroutines with a common prefix of keys like so:
use Regexp::Common 'RE_num_ALL';
which will export "RE_num_int" and "RE_num_real" (and if you have create more patterns who have first key num, those will be exported as well). In general, RE_key1_..._keyn_ALL will export all subroutines whose pattern names have first keys key1 ... keyn.
pattern name => [qw( line of -char )], # other args here ;
This specifies an entry $RE{line}{of}, which may take a "-char" flag.
Flags may also be specified with a default value, which is then used whenever the flag is specified without an explicit value (but not when the flag is omitted). For example:
pattern name => [qw( line of -char=_ )], # default char is '_' # other args here ;
pattern name => [qw( line of underscores )], create => q/(?:^_+$)/ ;
or a reference to a subroutine that will be called to create the pattern:
pattern name => [qw( line of -char=_ )], create => sub { my ($self, $flags) = @_; my $char = quotemeta $flags->{-char}; return '(?:^$char+$)'; }, ;
If the subroutine version is used, the subroutine will be called with three arguments: a reference to the pattern object itself, a reference to a hash containing the flags and their values, and a reference to an array containing the non-flag keys.
Whatever the subroutine returns is stringified as the pattern.
No matter how the pattern is created, it is immediately postprocessed to include or exclude capturing parentheses (according to the value of the "-keep" flag). To specify such ``optional'' capturing parentheses within the regular expression associated with "create", use the notation "(?k:...)". Any parentheses of this type will be converted to "(...)" when the "-keep" flag is specified, or "(?:...)" when it is not. It is a Regexp::Common convention that the outermost capturing parentheses always capture the entire pattern, but this is not enforced.
The subroutine should expect two arguments: a reference to the pattern object itself, and the string to be matched against.
It should return the same types of values as a "m/.../" does.
pattern name => [qw( line of -char )], create => sub {...}, match => sub { my ($self, $str) = @_; $str !~ /[^$self->{flags}{-char}]/; }, ;
The subroutine should expect three arguments: a reference to the pattern object itself, the string to be changed, and the value to be substituted into it. The third argument may be "undef", indicating the default substitution is required.
The subroutine should return the same types of values as an "s/.../.../" does.
For example:
pattern name => [ 'lineof', '-char=_' ], create => sub {...}, subs => sub { my ($self, $str, $ignore_replacement) = @_; $_[1] =~ s/^$self->{flags}{-char}+$//g; }, ;
Note that such a subroutine will almost always need to modify $_[1] directly.
Examples:
use Regexp::Common qw /comment number/; # Comment and number patterns. use Regexp::Common qw /no_defaults/; # Don't load any patterns. use Regexp::Common qw /!delimited/; # All, but delimited patterns.
It's also possible to load your own set of patterns. If you have a module "Regexp::Common::my_patterns" that makes patterns available, you can have it made available with
use Regexp::Common qw /my_patterns/;
Note that the default patterns will still be made available - only if you use no_defaults, or mention one of the default sets explicitly, the non mentioned defaults aren't made available.
Currently available are:
* email addresses * HTML/XML tags * more numerical matchers, * mail headers (including multiline ones), * more URLS * telephone numbers of various countries * currency (universal 3 letter format, Latin-1, currency names) * dates * binary formats (e.g. UUencoded, MIMEd)
If you have other patterns or pattern generators that you think would be generally useful, please send them to the maintainer --- preferably as source code using the "pattern" subroutine. Submissions that include a set of tests will be especially welcome.
Further thanks go to: Alexandr Ciornii, Blair Zajac, Bob Stockdale, Charles Thomas, Chris Vertonghen, the CPAN Testers, David Hand, Fany, Geoffrey Leach, Hermann-Marcus Behrens, Jerome Quelin, Jim Cromie, Lars Wilke, Linda Julien, Mike Arms, Mike Castle, Mikko, Murat Uenalan, Rafaƫl Garcia-Suarez, Ron Savage, Sam Vilain, Slaven Rezic, Smylers, Tim Maher, and all the others I've forgotten.
For a start, there are many common regexes missing. Send them in to regexp-common@abigail.be.
There are some POD issues when installing this module using a pre-5.6.0 perl; some manual pages may not install, or may not install correctly using a perl that is that old. You might consider upgrading your perl.
my $integer = $RE {num} {int}; $subj =~ /^$integer$/ and print "Matches!\n";
This module is free software, and maybe used under any of the following licenses:
1) The Perl Artistic License. See the file COPYRIGHT.AL. 2) The Perl Artistic License 2.0. See the file COPYRIGHT.AL2. 3) The BSD License. See the file COPYRIGHT.BSD. 4) The MIT License. See the file COPYRIGHT.MIT.