regexp - display the last matching expression

Mirco Wahab · May 9, 2006

Hi Glorious Perlers

How would I find out the last matching
expression from my regular expression?

I remember having heard of a trick for
this, but can't get it to work.

Example:

$r = qr/(alpha|beta|gamma) (?{'greek'})
|(aleph|beth|gimel) (?{'hebrew'})
|(DIbvetlh|nobta'|tlhIngan) (?{'klingon'})
/x;

$_ = qq/nobta' tlhIngan Hol yejHaD. ghIq qo'vaD/;

print "$^R: $^N (\$###)\n" while /$r/g;

Instead of (\$###) I'd expect to print
the alternation that matched, i.e. "DIbvetlh|nobta'|tlhIngan"
in the above example.

Who can point me in the right direction?

Thanks

Mirco

Uri Guttman · May 9, 2006

MW> how would I find out the last matching
MW> expression from my regular expression.

MW> $r = qr/(alpha|beta|gamma) (?{'greek'})
MW> |(aleph|beth|gimel) (?{'hebrew'})
MW> |(DIbvetlh|nobta'|tlhIngan) (?{'klingon'})
MW> /x;

MW> $_ = qq/nobta' tlhIngan Hol yejHaD. ghIq qo'vaD/;

MW> print "$^R: $^N (\$###)\n" while /$r/g;

don't use those features (the ?{}, $^N stuff) for this. in fact breaking
up that code into multiple expressions will be cleaner and likely faster
as well.

a simple set of pairs of qr and token is much easier (untested pseudo
code):

my @langs = (
    [ qr/(alpha|beta|gamma)/ => 'greek' ],
    [ qr/(aleph|beth|gimel)/ => 'hebrew' ],
) ;

that could also be an array of hashes if you want named fields instead
of 0 and 1.

then loop over those and match and return/print/last/whatever. this
assumes it is in a sub and returns.

foreach my $lang_pair ( @langs ) {
    return $lang_pair->[1] if /$lang_pair->[0]/ ;
}
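
A runnable sketch of the list-of-pairs idea above, extended so that the pattern text itself can be printed, which is what the original question asked for. The helper name first_lang_match is hypothetical, and storing the pattern as a plain string instead of a qr object is an assumption made here so the source prints exactly as written; the klingon entry is taken from the original question.

```perl
use strict;
use warnings;

# Each entry: [ label, pattern source ]. Keeping the source as a plain
# string lets us print it later exactly as it was written.
my @langs = (
    [ greek   => q{alpha|beta|gamma}         ],
    [ hebrew  => q{aleph|beth|gimel}         ],
    [ klingon => q{DIbvetlh|nobta'|tlhIngan} ],
);

# Hypothetical helper: returns (label, source, matched text) for the
# first entry that matches, or an empty list if none does.
sub first_lang_match {
    my ($text) = @_;
    for my $pair (@langs) {
        my ($label, $src) = @$pair;
        return ($label, $src, $1) if $text =~ /($src)/;
    }
    return;
}

my @hit = first_lang_match(q{nobta' tlhIngan Hol yejHaD. ghIq qo'vaD});
print "$hit[0]: $hit[2] ($hit[1])\n" if @hit;
```

With the sample sentence from the question this prints the label, the matched word and the alternation it came from.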

uri

Uri Guttman · May 10, 2006

MW> of course, my example was somewhat misleading.

MW> Let's take another (simpler) example:
MW> (split some O'Reilly-address by weird regex)

MW> $r = qr{
MW> (\w+): (?{'prot'})
MW> | //([^/]*) (?{'host'})
MW> | \b([^/]+/) (?{'path'})
MW> | \b([.\w]+$)(?{'file'})
MW> }x;

MW> $_ = qq{http://conferences.oreillynet.com/pub/w/46/speakers.html};
MW> print "$^R = \t$^N\n" while /$r/g;

this is no different than the previous code. it still uses all those
newfangled features which may be experimental (i didn't check this time).

MW> - which shows the expected result:
MW> prot = http
MW> host = conferences.oreillynet.com
MW> path = pub/
MW> path = w/
MW> path = 46/
MW> file = speakers.html

so?

MW> but how could you (or I) manage (for educational
MW> use) to display the actual matched regex part
MW> _instead_ of the ?{code assertion}, something like:

MW> \w+ = http
MW> [^/]* = conferences.oreillynet.com
MW> [^/]+/ = pub/

did you read my example at all? just break it up into a list of regexes
and apply them in a loop. why are you stuck with that horrible fat
regex? it is hard to read and understand since few know those features
well and they are not commonly used. my code will likely be faster as
alternation in regexes is slow. what do you think you gain from your
code?

just rewrite it like i showed or something similar and do a simple loop
over the regexes (or pairs or hashes or whatever structure works). but
get away from $^R and $^N and friends.
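
One way the "simple loop over the regexes" could look for the URL example, sketched under some assumptions: each rule is split into the text before the capture, the captured part, and the text after it (that three-field layout is invented here for illustration), and the scan walks the string with \G and pos(), skipping one character whenever no rule fits. It prints the pattern fragment next to the text it matched, and makes no claim about speed.

```perl
use strict;
use warnings;

my $url = q{http://conferences.oreillynet.com/pub/w/46/speakers.html};

# [ text before the capture, captured part, text after the capture ]
my @rules = (
    [ q{},   q{\w+},     q{:} ],
    [ q{//}, q{[^/]*},   q{}  ],
    [ q{\b}, q{[^/]+/},  q{}  ],
    [ q{\b}, q{[.\w]+$}, q{}  ],
);

my @seen;
pos($url) = 0;
SCAN: while (pos($url) < length $url) {
    for my $rule (@rules) {
        my ($before, $inner, $after) = @$rule;
        # \G anchors at the current position; /c keeps pos() on failure.
        if ($url =~ /\G$before($inner)$after/gc) {
            push @seen, "$inner = $1";
            next SCAN;
        }
    }
    pos($url) = pos($url) + 1;    # no rule fits here, skip one character
}
print "$_\n" for @seen;
```

The rules are tried in the same order as the alternatives in the big regex, so the first rule that fits at a position wins.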

uri

Mumia W. · May 10, 2006

Mirco said:
Hi Uri

...
a simple set of pairs of qr and token is much easier (untested pseudo
code):

of course, my example was somewhat misleading.

Let's take another (simpler) example:
(split some O'Reilly-address by weird regex)

$r = qr{
(\w+): (?{'prot'})
| //([^/]*) (?{'host'})
| \b([^/]+/) (?{'path'})
| \b([.\w]+$)(?{'file'})
}x;

$_ = qq{http://conferences.oreillynet.com/pub/w/46/speakers.html};
print "$^R = \t$^N\n" while /$r/g;

- which shows the expected result:
prot = http
host = conferences.oreillynet.com
path = pub/
path = w/
path = 46/
file = speakers.html

but how could you (or I) manage (for educational
use) to display the actual matched regex part
_instead_ of the ?{code assertion}, something like:

\w+ = http
[^/]* = conferences.oreillynet.com
[^/]+/ = pub/

etc, you know what I mean ...

Any ideas?

Thanks & regards

Mirco

I couldn't find the $^GIMME variable in "man perlvar"

But if you're willing to dispense with the qr{} operator, you can do it
this way:

use strict;
use warnings;

$_ = qq{http://conferences.oreillynet.com/pub/w/46/speakers.html};
my @array = (
    ['proto', '(\w+)'],
    ['host',  '://([^/]*)'],
    ['path',  '(.+?/)([.\w]+)$'],
    ['file',  '([.\w]+$)'],
);

for my $var (@array) {
    my ($name, $rx) = @{$var};
    # strip the matched piece off the front of $_
    if (s/^$rx//) {
        printf "%7s %-20s => %s\n", $name, $rx, $1;
        # the path pattern also swallows the file name; put it back
        $_ = $2 . $_ if defined ($2);
    }
}

Note: It was easier for me to keep the path elements together, but I'm
sure some smart person can figure out how to separate them.

Mumia W. · May 10, 2006

Mumia said:
[...]
Note: It was easier for me to keep the path elements together, but I'm
sure some smart person can figure out how to separate them.

I got 'em separated:

use strict;
use warnings;

$_ = qq{http://conferences.oreillynet.com/pub/w/46/speakers.html};
my @array = (
    ['proto', '(\w+)'],
    ['host',  '://([^/]*)'],
    ['path',  '(.+?/)'],
    ['file',  '([.\w]+)$'],
);

my $var = 0;    # index of the rule currently being tried
while ($var < @array) {
    my ($name, $rx) = @{$array[$var]};
    # keep applying the same rule until it stops matching
    if (s/^$rx//) {
        printf "%7s %-20s => %s\n", $name, $rx, $1;
    } else {
        $var++;
    }
}
exit;
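
For comparison, a sketch that keeps one combined regex but still reports which alternative fired, without (?{...}), $^R or $^N. It assumes each pattern source contains exactly one capture group, so group number n belongs to entry n-1 of the list; the names @parts and @report are made up for this example. The group that took part in the match is found through the @- array, whose entries are undefined for groups that did not match.

```perl
use strict;
use warnings;

# [ label, pattern source ]; each source has exactly one capture group.
my @parts = (
    [ prot => q{(\w+):}      ],
    [ host => q{//([^/]*)}   ],
    [ path => q{\b([^/]+/)}  ],
    [ file => q{\b([.\w]+$)} ],
);
my $joined = join '|', map { $_->[1] } @parts;
my $re     = qr/$joined/;

my $url = q{http://conferences.oreillynet.com/pub/w/46/speakers.html};
my @report;
while ($url =~ /$re/g) {
    # $-[n] is defined only for a group that took part in the match.
    my ($n) = grep { defined $-[$_] } 1 .. $#+;
    my ($label, $src) = @{ $parts[$n - 1] };
    my $text = substr $url, $-[$n], $+[$n] - $-[$n];
    push @report, "$label: $src = $text";
}
print "$_\n" for @report;
```

Because the alternatives are joined in list order, the matching behaviour should be the same as that of the hand-written qr{} in the question.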
