Bizarre regex behaviour

Brian Wakem

This is driving me mad. I've been chasing a bug in a program for 3 hours and
have narrowed it down to this complete and working code.

This is the *exact code* I am running, followed by the actual output.

I've tried it on two machines, which run different versions of perl and I
get the same result.

Please, someone tell me what the hell is going on here, why doesn't the
regex in sub1 match the 3rd and 4th time through?


#!/usr/bin/perl

use strict;
use warnings;

sub1();
sub1();
if ('a' =~ m/a/) {
    sub1();
}
sub1();


sub sub1 {
    print "This is sub1\n";
    if ('someword' =~ m//) {
        print "Regex matches\n";
    }
    else {
        print "Regex does not match, what on Earth is going on here?\n";
    }
}




$ perl -v

This is perl, v5.8.6 built for i386-linux
(with 1 registered patch, see perl -V for more detail)

Copyright 1987-2004, Larry Wall

Perl may be copied only under the terms of either the Artistic License or
the GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using `man perl' or `perldoc perl'. If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

$ perl tmp37.pl
This is sub1
Regex matches
This is sub1
Regex matches
This is sub1
Regex does not match, what on Earth is going on here?
This is sub1
Regex does not match, what on Earth is going on here?
$
 
Paul Lalli

Brian said:
Please, someone tell me what the hell is going on here, why doesn't the
regex in sub1 match the 3rd and 4th time through?



I'm having trouble locating the exact place in the perldoc where this
is talked about. Basically, an 'empty' pattern match is special. It
repeats the last pattern match executed. By the time m// is seen the
3rd and fourth times, you had previously tried to match m/a/. This
pattern is therefore used again in the subsequent empty pattern
matches. 'someword' does not contain any 'a' characters, so the
pattern match fails.

If I manage to find the correct reference, I'll post an update here.

Paul Lalli
 
Paul Lalli

Paul said:
I'm having trouble locating the exact place in the perldoc where this
is talked about. Basically, an 'empty' pattern match is special. It
repeats the last pattern match executed. By the time m// is seen the
3rd and fourth times, you had previously tried to match m/a/. This
pattern is therefore used again in the subsequent empty pattern
matches. 'someword' does not contain any 'a' characters, so the
pattern match fails.

If I manage to find the correct reference, I'll post an update here.

Found it. I was searching perlre, perlretut, and perlreref. It's
actually about the m// operator, not regexp's themselves. So...

perldoc perlop
m/PATTERN/cgimosx
...
If the PATTERN evaluates to the empty string, the
last successfully matched regular expression is used
instead. In this case, only the "g" and "c" flags on
the empty pattern is honoured - the other flags are
taken from the original pattern. If no match has
previously succeeded, this will (silently) act
instead as a genuine empty pattern (which will
always match)
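
The quoted rule can be checked with a minimal sketch like the one below. The variable names are made up for illustration, and the last line relies on the assumption that (?:), being an empty group rather than an empty pattern, is never replaced by an earlier pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# No match has succeeded yet, so m// acts as a genuine empty pattern.
my $before = ('someword' =~ m//) ? 1 : 0;

# A successful match: /a/ becomes the last successfully matched pattern.
my $seed = ('a' =~ m/a/) ? 1 : 0;

# m// now silently means m/a/, and 'someword' contains no 'a'.
my $after = ('someword' =~ m//) ? 1 : 0;

# (?:) is an empty group, not an empty pattern, so nothing is substituted.
my $group = ('someword' =~ m/(?:)/) ? 1 : 0;

print "before=$before seed=$seed after=$after group=$group\n";
# prints: before=1 seed=1 after=0 group=1
```

Writing m/(?:)/ instead of m// in sub1 should therefore make all four calls in the original program report a match.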

Paul Lalli
 
Brian Wakem

Paul said:
Found it. I was searching perlre, perlretut, and perlreref. It's
actually about the m// operator, not regexp's themselves. So...

perldoc perlop
m/PATTERN/cgimosx
...
If the PATTERN evaluates to the empty string, the
last successfully matched regular expression is used
instead. In this case, only the "g" and "c" flags on
the empty pattern is honoured - the other flags are
taken from the original pattern. If no match has
previously succeeded, this will (silently) act
instead as a genuine empty pattern (which will
always match)

Paul Lalli


Thanks Paul, that certainly explains why I'm seeing this behaviour.

What I don't understand is why on Earth is that the default behaviour? It
makes no sense to me. If there's a good reason for it I can't see it. A
bug, not a feature in my opinion.
 
xhoster

Brian Wakem said:
This is driving me mad. I've been chasing a bug in a program for 3 hours and
have narrowed it down to this complete and working code.

This is the *exact code* I am running, followed by the actual output.

I've tried it on two machines, which run different versions of perl and I
get the same result.

Please, someone tell me what the hell is going on here, why doesn't the
regex in sub1 match the 3rd and 4th time through?


As documented in perldoc perlretut,

If the regexp evaluates to the empty string, the
regexp in the last successful match is used instead.

This seems like a mal-feature to me, but there you have it.
Note that sub1 uses an empty string as the regex.
#!/usr/bin/perl

use strict;
use warnings;

sub1();

At this point, there is no last successful match. I would consider the
behavior to be undefined. Apparently perl just uses an actual empty regex,
which of course matches anything.

sub1();

At this point, it uses the last successful regex, again the empty one.

if ('a' =~ m/a/) {
sub1();

At this point, sub1 uses the last successful regex, which is /a/.

Xho
 
Eric Amick

xhoster said:
As documented in perldoc perlretut,

If the regexp evaluates to the empty string, the
regexp in the last successful match is used instead.

This seems like a mal-feature to me, but there you have it.

It exists because of historical precedent, I think--a number of Unix
editors treat an empty regex that way to reduce typing when doing a
substitution repeatedly.
 
axel

Eric Amick said:
It exists because of historical precedent, I think--a number of Unix
editors treat an empty regex that way to reduce typing when doing a
substitution repeatedly.

For searching rather than substitution. Vi for example, as in Perl,
a substitution with an empty string is always taken to mean
that there should be a deletion. Which is quite logical.

Axel
 
Eric Amick

axel said:
For searching rather than substitution. Vi for example, as in Perl,
a substitution with an empty string is always taken to mean
that there should be a deletion. Which is quite logical.

I was thinking of the target of the substitution, i.e., the string to be
replaced, when I said that, but you have a point about searching in
general.
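
To make the two sides of a substitution concrete, here is a small sketch (the sample string and variable names are invented for illustration). An empty pattern on the left of s/// reuses the last successful regex, while an empty replacement on the right simply deletes the matched text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = 'foo bar foo';

# Seed the last successful pattern with /foo/.
$text =~ /foo/ or die "seed match failed";

# Empty pattern: s//X/ reuses /foo/ and replaces its first occurrence.
(my $reused = $text) =~ s//X/;

# Empty replacement: the matched text is deleted, as in vi.
(my $deleted = $text) =~ s/foo//;

print "reused='$reused' deleted='$deleted'\n";
# prints: reused='X bar foo' deleted=' bar foo'
```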
 
Josef Moellers

Brian said:
Thanks Paul, that certainly explains why I'm seeing this behaviour.

What I don't understand is why on Earth is that the default behaviour? It
makes no sense to me. If there's a good reason for it I can't see it. A
bug, not a feature in my opinion.

If you've ever used one of the "ed" family of editors (vi, vim, ...),
you'll find that it is the default behaviour there, too.

And ... as it is in the documentation, you can hardly call it a bug, can
you.
 
Anno Siegel

Josef Moellers said:
If you've ever used one of the "ed" family of editors (vi, vim, ...),
you'll find that it is the default behaviour there, too.

...with one small but essential difference: In Perl, it is necessary
for a regex to *match successfully* before it is taken as the default
for m//. The editors accept any (syntactically correct) regex.

Perl's behavior makes the feature practically useless. If you know
the regex at coding time, you can always write it out. You would want
to use the feature when the regex is only given at run time, but then
you'd have to make it match once. That is, given an arbitrary regex,
you'd have to construct a string that this regex matches. That is an
utterly non-trivial task that doesn't always have a solution. You
wouldn't want to do that just to set a default.
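
The run-time case is also where this tends to bite by accident. The sketch below assumes a hypothetical $user_pattern that happens to be empty. It shows that a failed match is not adopted as the default, that an interpolated empty pattern picks up an earlier successful regex, and that wrapping the interpolation in (?:...) is one way to avoid the surprise.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $user_pattern = '';    # hypothetical run-time input that is empty

# A failed match is not adopted: with no prior success, m// is still
# a genuine empty pattern and matches.
my $failed        = ('xyz' =~ m/q/) ? 1 : 0;
my $after_failure = ('xyz' =~ m//)  ? 1 : 0;

# Now a match succeeds, so /b/ becomes the last successful pattern.
'abc' =~ m/b/ or die "seed match failed";

# The pattern evaluates to the empty string, so /b/ is used instead.
my $surprise = ('xyz' =~ m/$user_pattern/) ? 1 : 0;

# Inside (?:...) the pattern is never an empty string, so it is used as is.
my $guarded = ('xyz' =~ m/(?:$user_pattern)/) ? 1 : 0;

print "failed=$failed after_failure=$after_failure ",
      "surprise=$surprise guarded=$guarded\n";
# prints: failed=0 after_failure=1 surprise=0 guarded=1
```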

Josef Moellers said:
And ... as it is in the documentation, you can hardly call it a bug, can
you.

I can still call it a misfeature, and I do. I suppose it was an
implementation error -- it was meant to be "successfully compiled
regex" but got implemented as "successfully matching regex". Now
we're stuck with it.

Anno
 
