regexp - display the last matching expression


Mirco Wahab

Hi Glorious Perlers

how would I find out the last matching
expression from my regular expression?

I remember hearing about a trick for
this, but I can't get it to work.

Example:

$r = qr/(alpha|beta|gamma) (?{'greek'})
|(aleph|beth|gimel) (?{'hebrew'})
|(DIbvetlh|nobta'|tlhIngan) (?{'klingon'})
/x;

$_ = qq/nobta' tlhIngan Hol yejHaD. ghIq qo'vaD/;

print "$^R: $^N (\$###)\n" while /$r/g;

Instead of (\$###) I'd like to print the source of
the capture group that matched, i.e. "DIbvetlh|nobta'|tlhIngan"
in the above example.

Who can point me in the right direction?

Thanks

Mirco
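
One possible way to get at this while staying with a single regex is sketched below. It does not repair the `$^R`/`$^N` line above; instead it keeps the source of each alternative in a table, wraps each one in a named group, and asks `%+` (available since Perl 5.10) which name took part in the match. The names `@alts`, `%source_of` and `matches_with_source` are made up for this illustration and do not appear in the original post.

```perl
use strict;
use warnings;

# Ordered table: [ group name, pattern source ]. Names are illustrative.
my @alts = (
    [ greek   => 'alpha|beta|gamma' ],
    [ hebrew  => 'aleph|beth|gimel' ],
    [ klingon => q{DIbvetlh|nobta'|tlhIngan} ],
);
my %source_of = map { $_->[0] => $_->[1] } @alts;

# Build one alternation of named groups: (?<greek>...)|(?<hebrew>...)|...
my $joined = join '|', map { sprintf '(?<%s>%s)', @$_ } @alts;
my $r = qr/$joined/;

# Return [ name, matched text, pattern source ] for every match in $text.
sub matches_with_source {
    my ($text) = @_;
    my @out;
    while ($text =~ /$r/g) {
        # %+ only lists named groups that captured, so one key is present.
        my ($name) = keys %+;
        push @out, [ $name, $+{$name}, $source_of{$name} ];
    }
    return @out;
}

printf "%s: %s (%s)\n", @$_
    for matches_with_source(q{nobta' tlhIngan Hol yejHaD. ghIq qo'vaD});
# prints:
# klingon: nobta' (DIbvetlh|nobta'|tlhIngan)
# klingon: tlhIngan (DIbvetlh|nobta'|tlhIngan)
```

Because the regex is generated from the table, the displayed source and the pattern actually used cannot drift apart.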
 

Uri Guttman

MW> how would I find out the last matching
MW> expression from my regular expression.

MW> $r = qr/(alpha|beta|gamma) (?{'greek'})
MW> |(aleph|beth|gimel) (?{'hebrew'})
MW> |(DIbvetlh|nobta'|tlhIngan) (?{'klingon'})
MW> /x;

MW> $_ = qq/nobta' tlhIngan Hol yejHaD. ghIq qo'vaD/;


MW> print "$^R: $^N (\$###)\n" while /$r/g;

don't use those features (the ?{}, $^N stuff) for this. in fact breaking
up that code into multiple expressions will be cleaner and likely faster
as well.

a simple set of pairs of qr and token is much easier (untested pseudo
code):

my @langs = (
    [ qr/(alpha|beta|gamma)/         => 'greek'   ],
    [ qr/(aleph|beth|gimel)/         => 'hebrew'  ],
    [ qr/(DIbvetlh|nobta'|tlhIngan)/ => 'klingon' ],
) ;

that could also be an array of hashes if you want named fields instead
of 0 and 1.

then loop over those and match and return/print/last/whatever. this
assumes it is in a sub and returns.

foreach my $lang_pair ( @langs ) {
    # $lang_pair->[0] is the qr, $lang_pair->[1] is the token
    return $lang_pair->[1] if /$lang_pair->[0]/ ;
}
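
A runnable take on the loop above, as a minimal sketch. It assumes the pattern source is kept as a plain string (rather than a `qr` object) so that it can be printed unchanged, and it wraps the loop in a sub named `first_lang_match`, which is a hypothetical name chosen for this example.

```perl
use strict;
use warnings;

# Each entry: [ label, pattern source as a plain string ].
my @langs = (
    [ greek   => 'alpha|beta|gamma' ],
    [ hebrew  => 'aleph|beth|gimel' ],
    [ klingon => q{DIbvetlh|nobta'|tlhIngan} ],
);

# Return (label, pattern source, matched text) for the first entry that
# matches anywhere in $text, or an empty list if none does.
sub first_lang_match {
    my ($text) = @_;
    for my $pair (@langs) {
        my ($label, $source) = @$pair;
        return ($label, $source, $1) if $text =~ /($source)/;
    }
    return;
}

my ($label, $source, $word) =
    first_lang_match(q{nobta' tlhIngan Hol yejHaD. ghIq qo'vaD});
print "$label: $word ($source)\n" if defined $label;
# prints: klingon: nobta' (DIbvetlh|nobta'|tlhIngan)
```

Unlike the original `while /$r/g` loop, this stops at the first entry that matches; collecting every match would need an inner `/g` loop per entry.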

uri
 

Uri Guttman

MW> of course, my example was somewhat misleading.

MW> Let's take another (simpler) example:
MW> (split some O'Reilly address with a weird regex)

MW> $r = qr{
MW> (\w+): (?{'prot'})
MW> | //([^/]*) (?{'host'})
MW> | \b([^/]+/) (?{'path'})
MW> | \b([.\w]+$)(?{'file'})
MW> }x;

MW> $_ = qq{http://conferences.oreillynet.com/pub/w/46/speakers.html};
MW> print "$^R = \t$^N\n" while /$r/g;

this is no different than the previous code. it still uses all those
newfangled features which may be experimental (i didn't check this time).

MW> - which shows the expected result:
MW> prot = http
MW> host = conferences.oreillynet.com
MW> path = pub/
MW> path = w/
MW> path = 46/
MW> file = speakers.html

so?

MW> but how could you (or I) manage (for educational
MW> use) to display the actual matched regex part
MW> _instead_ of the ?{code assertion}, something like:

MW> \w+ = http
MW> [^/]* = conferences.oreillynet.com
MW> [^/]+/ = pub/

did you read my example at all? just break it up into a list of regexes
and apply them in a loop. why are you stuck with that horrible fat
regex? it is hard to read and understand since few know those features
well and they are not commonly used. my code will likely be faster as
alternation in regexes is slow. what do you think you gain from your
code?
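
The speed claim can be measured rather than assumed. The sketch below uses the core `Benchmark` module to compare one combined alternation against a loop over separate regexes on the sample string from the first post. The sub names `match_combined` and `match_separate` and the iteration count are arbitrary choices for this illustration, and the outcome will depend on the Perl version and on the input, so no result is shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $text = q{nobta' tlhIngan Hol yejHaD. ghIq qo'vaD};

# One regex with all alternatives in a single alternation.
my $combined = qr/alpha|beta|gamma|aleph|beth|gimel|DIbvetlh|nobta'|tlhIngan/;

# The same alternatives split into one regex per language.
my @separate = (
    qr/alpha|beta|gamma/,
    qr/aleph|beth|gimel/,
    qr/DIbvetlh|nobta'|tlhIngan/,
);

# Return 1 if the combined regex matches the sample text, else 0.
sub match_combined { return $text =~ $combined ? 1 : 0 }

# Return 1 as soon as any of the separate regexes matches, else 0.
sub match_separate {
    for my $re (@separate) {
        return 1 if $text =~ $re;
    }
    return 0;
}

# Run each variant a fixed number of times and print a comparison table.
cmpthese(200_000, {
    combined => \&match_combined,
    separate => \&match_separate,
});
```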

just rewrite it like i showed or something similar and do a simple loop
over the regexes (or pairs or hashes or whatever structure works). but
get away from $^R and $^N and friends.
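
As a sketch of what such a loop could look like for the URL example: each rule below pairs the text to display with a compiled regex, and the rules are tried in order at the current position using `\G` and the `/gc` flags. The display strings, the `@rules` table and the sub `tokenize` are assumptions made for this illustration. The host rule also swallows the slash that follows the host name, which is a simplification of the regex quoted above.

```perl
use strict;
use warnings;

# [ pattern text to display, compiled regex with one capture group ]
my @rules = (
    [ '\w+',     qr{(\w+):}       ],   # protocol, up to the colon
    [ '[^/]*',   qr{//([^/]*)/?}  ],   # host, plus the slash after it
    [ '[^/]+/',  qr{([^/]+/)}     ],   # one path element
    [ '[.\w]+$', qr{([.\w]+)$}    ],   # file name at the end
);

# Walk through $url from left to right; at each position use the first
# rule that matches there. Returns a list of [ display text, capture ].
sub tokenize {
    my ($url) = @_;
    my @found;
    pos($url) = 0;
    TOKEN: while (pos($url) < length $url) {
        for my $rule (@rules) {
            my ($shown, $re) = @$rule;
            if ($url =~ /\G$re/gc) {
                push @found, [ $shown, $1 ];
                next TOKEN;
            }
        }
        last;    # no rule matches here; stop rather than loop forever
    }
    return @found;
}

printf "%s = %s\n", @$_
    for tokenize('http://conferences.oreillynet.com/pub/w/46/speakers.html');
# prints:
# \w+ = http
# [^/]* = conferences.oreillynet.com
# [^/]+/ = pub/
# [^/]+/ = w/
# [^/]+/ = 46/
# [.\w]+$ = speakers.html
```

The `/c` flag keeps the match position when a rule fails, so the next rule is tried at the same place instead of restarting from the beginning of the string.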

uri
 

Mumia W.

Mirco said:
Hi Uri
...
a simple set of pairs of qr and token is much easier (untested pseudo
code):

of course, my example was somewhat misleading.

Let's take another (simpler) example:
(split some O'Reilly address with a weird regex)

$r = qr{
(\w+): (?{'prot'})
| //([^/]*) (?{'host'})
| \b([^/]+/) (?{'path'})
| \b([.\w]+$)(?{'file'})
}x;

$_ = qq{http://conferences.oreillynet.com/pub/w/46/speakers.html};
print "$^R = \t$^N\n" while /$r/g;


- which shows the expected result:
prot = http
host = conferences.oreillynet.com
path = pub/
path = w/
path = 46/
file = speakers.html

but how could you (or I) manage (for educational
use) to display the actual matched regex part
_instead_ of the ?{code assertion}, something like:

\w+ = http
[^/]* = conferences.oreillynet.com
[^/]+/ = pub/

etc, you know what I mean ...

Any ideas?

Thanks & regards

Mirco

I couldn't find the $^GIMME variable in "man perlvar" :)
But if you're willing to dispense with the qr{} operator, you can do it
this way:

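use strict;
use warnings;

$_ = qq{http://conferences.oreillynet.com/pub/w/46/speakers.html};
# Each entry: [ name, pattern source ]. A second capture group, if any,
# is put back onto the front of $_ for the next entry.
my @array = (
    ['proto', '(\w+)'],
    ['host',  '://([^/]*)'],
    ['path',  '(.+?/)([.\w]+)$'],
    ['file',  '([.\w]+$)'],
);

for my $var (@array) {
    my ($name, $rx) = @{$var};
    # Strip the matched part from the front of $_ and report it.
    if (s/^$rx//) {
        printf "%7s %-20s => %s\n", $name, $rx, $1;
        $_ = $2 . $_ if defined ($2);
    }
}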

Note: It was easier for me to keep the path elements together, but I'm
sure some smart person can figure out how to separate them.
 

Mumia W.

Mumia said:
[...]
Note: It was easier for me to keep the path elements together, but I'm
sure some smart person can figure out how to separate them.

I got 'em separated:

use strict;
use warnings;

$_ = qq{http://conferences.oreillynet.com/pub/w/46/speakers.html};
my @array = (
    ['proto', '(\w+)'],
    ['host',  '://([^/]*)'],
    ['path',  '(.+?/)'],
    ['file',  '([.\w]+)$'],
);

# Retry the same pattern until it stops matching, then move to the next one.
my $var = 0;
while ($var < @array) {
    my ($name, $rx) = @{$array[$var]};
    if (s/^$rx//) {
        printf "%7s %-20s => %s\n", $name, $rx, $1;
    } else {
        $var++;
    }
}
exit;
 
