Regex capture variable scoping

LBJ

Can someone please help me understand what I'm seeing as a
contradiction between a statement in the perlre man page and a bug in
a script that just bit me in the a**?

From perlre:
"The numbered variables ($1, $2, $3, etc.) and the related
punctuation set ($+, $&, $`, and $') are all dynamically
scoped until the end of the enclosing block or until the
next successful match, whichever comes first."

And here's the code in question:

# looping through a fairly standard apache log
while (<LOGFILE>)
{
    # grab the datestamp and querystring
    m/\s\[([^\s]+) \-\d+\] "[A-Z]+ [^\?\s]+\?([^\s]+)/;

    my ($date, $querystring) = ($1, $2);

    if ($querystring)
    {
        ....
    }
}

The "bug" here is that $2 seems to not get unset for each iteration of
the while loop. When the loop gets to a line that doesn't match a
querystring value, $querystring is getting assigned the value of the
previous line's $2 capture buffer.

I've always been under the impression that the capture buffers got
unset with each subsequent regular expression, regardless of success
or failure. If that's not true, I still don't understand how this
squares with the statement in the perlre man page about the special
variables being dynamically scoped until the end of the block.

Thanks,
Jay L.
 
Jeff 'japhy' Pinyan

[posted & mailed]

I agree the documentation is misleading.
"The numbered variables ($1, $2, $3, etc.) and the related
punctuation set ($+, $&, $`, and $') are all dynamically
scoped until the end of the enclosing block or until the
next successful match, whichever comes first."

One thinks, then, that it is like saying:

while (<FOO>) {
    local $something;
    if (...) { $something = 1 }
    print "s = $something\n";
}

However, the $DIGIT variables actually work this way:

{
    # they start with their initial value
    # and are scoped to JUST AROUND the block
    local $something = $something;
    while (<FOO>) {
        if (...) { $something = 1 }
        print "s = $something\n";
    }
}

Compound that with the fact that the $DIGIT variables are not reset to
undef on a failed match, and you have a sticky situation, admittedly.

But the docs could use some fixing.
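
For anyone who wants to see the sticky behavior in isolation, here is a minimal sketch. The sample strings and the simplified pattern are made up for illustration and are not the log format from the original post; the point is only that a failed match leaves $1 holding the value from the last successful match, even across loop iterations.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for log lines (an assumption,
# not taken from the original post).
my @lines = ('id=42', 'no digits here', 'id=7');

my @seen;
for my $line (@lines) {
    # The match result is deliberately ignored, as in the original loop.
    $line =~ /id=(\d+)/;

    # On the second line the match fails, so $1 still holds the value
    # captured by the previous iteration's successful match.
    push @seen, $1;
}

print join(',', @seen), "\n";   # prints 42,42,7
```

The second element is a stale 42 rather than undef, which is the same symptom as $querystring picking up the previous line's $2.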
 
Eric Amick

You can get what you want by combining the two statements:

my ($date, $querystring) =
    m/\s\[([^\s]+) \-\d+\] "[A-Z]+ [^\?\s]+\?([^\s]+)/;

A failed pattern match returns an empty list in list context.
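
A short sketch of that approach, under some assumptions: the two sample lines are invented, and the pattern is a slightly simplified version of the one in the thread (no leading \s, and \S in place of [^\s]). Because the assignment takes its values from the match itself rather than from $1 and $2, a failed match leaves both variables undef instead of reusing old captures.

```perl
use strict;
use warnings;

# Hypothetical sample data, loosely shaped like Apache log entries.
my @lines = (
    '[10/Oct/2000:13:55:36 -0700] "GET /search?q=perl HTTP/1.0"',
    '[10/Oct/2000:13:55:40 -0700] "GET /index.html HTTP/1.0"',
);

my @results;
for my $line (@lines) {
    # In list context a successful match returns the captured groups;
    # a failed match returns an empty list, so both variables are undef.
    my ($date, $querystring) =
        $line =~ m/\[(\S+) -\d+\] "[A-Z]+ [^?\s]+\?(\S+)/;

    push @results, defined $querystring ? $querystring : 'none';
}

print join(',', @results), "\n";   # prints q=perl,none
```

Another common option is to test the match directly, as in if (m/.../) { ... }, and only read $1 and $2 inside that block.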
 
