Pesky bug in Perl 5.8.6's regexps


kj

I've come across what I'm almost certain is a bug in the Perl
internals, for version 5.8.6. Unfortunately, it is a very skittish
Heisenbug, and I have not been able to reproduce it in a small
script.

In fact, the bug is so quirky that I post this only in the hope
that something will ring a bell with someone who may have seen
something remotely like this before, and who could point me in the
direction of more info.

The bug shows up during a particular execution of the following
code in URI.pm (the comment is mine):

sub _scheme
{
    my $self = shift;

    unless (@_) {
        return unless $$self =~ /^($scheme_re):/o;
        return $1; # <---- junk in $1
    }

    # ...

and manifests itself in the form of occasional junk in $1, as I've
indicated with the added comment. By "junk" I mean stuff that is
not in $$self at all, usually non-ASCII bytes.

For reference, $scheme_re is:

    $scheme_re = '[a-zA-Z][a-zA-Z0-9.+\-]*';
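
This is a possible troubleshooting aid and is not part of URI.pm. The hypothetical helper `checked_scheme` below performs the same match as `_scheme`. It copies $1 into a lexical immediately and warns if that copy is not a prefix of the input string. A correct capture always is a prefix, because the pattern is anchored with ^. Putting a check like this into a local copy of `_scheme` might at least pin down which input string triggers the junk.

```perl
use strict;
use warnings;

# Same pattern as $scheme_re above.
my $scheme_re = '[a-zA-Z][a-zA-Z0-9.+\-]*';

# Hypothetical debugging helper: returns the scheme of $str, or undef
# (empty list in list context) if there is none. $1 is copied at once,
# and the copy is checked against the start of $str, since the match
# is anchored and a correct capture must be a literal prefix.
sub checked_scheme {
    my ($str) = @_;
    return unless $str =~ /^($scheme_re):/o;
    my $scheme = $1;
    if (substr($str, 0, length $scheme) ne $scheme) {
        warn "capture mismatch: got '$scheme' from '$str'\n";
    }
    return $scheme;
}

print checked_scheme('http://example.com/'), "\n";   # prints "http"
```

Copying $1 right away also keeps later regex operations from clobbering it before it is inspected.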

But I see this behavior only when I run my code with perl 5.8.6.
When I run the same code under 5.8.8 everything works fine.

I can't rule out that the culprit is one of the various modules
whose versions differ between my 5.8.6 and 5.8.8 installations, but,
FWIW, I did at least confirm that the error still occurs with 5.8.6,
and not with 5.8.8, even when all the modules mentioned in the stack
trace at the point of failure are identical in the two installations.
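
The following sketch makes that comparison more systematic. It assumes the relevant modules are already loaded when it runs. It walks %INC, turns each file name back into a package name (URI/Escape.pm becomes URI::Escape), and asks the package for its VERSION. The helper name `loaded_module_versions` is hypothetical. You could run it at the point of failure, or in an END block, under both perls and then diff the two outputs to spot any module that still differs.

```perl
use strict;
use warnings;

# Hypothetical helper: map each module in %INC to its declared version,
# or to 'undef' when the module does not declare one.
sub loaded_module_versions {
    my %versions;
    for my $file (keys %INC) {
        next unless $file =~ /\.pm\z/;
        (my $module = $file) =~ s{/}{::}g;
        $module =~ s/\.pm\z//;
        my $v = eval { $module->VERSION };
        $versions{$module} = defined $v ? $v : 'undef';
    }
    return \%versions;
}

my $loaded = loaded_module_versions();
printf "%-30s %s\n", $_, $loaded->{$_} for sort keys %$loaded;
```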

In case it matters, this is all running under Linux:

jones@luna:~> uname -ar
Linux luna 2.6.11.4-21.17-smp #1 SMP Fri Apr 6 08:42:34 UTC 2007 i686 i686 i386 GNU/Linux

Any troubleshooting ideas you may send my way would be much
appreciated!

TIA,

kj
 

Ben Morrow

Quoth kj:
> I've come across what I'm almost certain is a bug in the Perl
> internals, for version 5.8.6. Unfortunately, it is a very skittish
> Heisenbug, and I have not been able to reproduce it in a small
> script.
>
> The bug shows up during a particular execution of the following
> code in URI.pm (the comment is mine):
>
> sub _scheme
> {
>     my $self = shift;
>
>     unless (@_) {
>         return unless $$self =~ /^($scheme_re):/o;
>         return $1; # <---- junk in $1
>
> and manifests itself in the form of occasional junk in $1, as I've
> indicated with the added comment. By "junk" I mean stuff that is
> not in $$self at all, usually non-ASCII bytes.

That sounds like a bug in perl's Unicode handling.
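
One way to test the Unicode theory is to log how perl is storing the strings when the junk appears. The hypothetical helper `describe_string` below reports three things: the length, whether perl's internal UTF-8 flag is on (via `utf8::is_utf8`, which exists in both 5.8.6 and 5.8.8), and the code points. Calling it on both $$self and the captured value in a failing case would show whether the bad captures coincide with UTF-8-flagged input.

```perl
use strict;
use warnings;

# Hypothetical diagnostic: summarize a string's internal representation.
# utf8::is_utf8 reports perl's internal UTF-8 flag, not whether the text
# "is Unicode" in any higher-level sense.
sub describe_string {
    my ($str) = @_;
    my $flag = utf8::is_utf8($str) ? 'on' : 'off';
    my $cps  = join ' ', map { sprintf('U+%04X', ord $_) } split //, $str;
    return sprintf 'length=%d utf8_flag=%s codepoints=%s',
        length($str), $flag, $cps;
}

print describe_string('http'), "\n";
# prints "length=4 utf8_flag=off codepoints=U+0068 U+0074 U+0074 U+0070"
print describe_string("http\x{263A}"), "\n";
# prints "length=5 utf8_flag=on codepoints=U+0068 U+0074 U+0074 U+0070 U+263A"
```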

> But I see this behavior only when I run my code with perl 5.8.6.
> When I run the same code under 5.8.8 everything works fine.

There were fixes to Unicode's interaction with regexes between 5.8.6 and
5.8.8. See e.g. perldoc perl587delta. If you can't reproduce it with
5.8.8, it's likely the bug has been fixed.

> Any troubleshooting ideas you may send my way would be much
> appreciated!

What's the problem? Just use 5.8.8.
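
If the code must never run on the older perl, the simplest way to enforce that is `use 5.008_008;` at the top of the script. Older perls will then refuse to compile it. The sketch below is a softer runtime variant with a hypothetical helper name. It compares $] (5.008006 for 5.8.6, 5.008008 for 5.8.8) against the threshold and only warns.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the given numeric perl version (default:
# the running perl's $]) is 5.8.8 or later.
sub perl_is_at_least_588 {
    my ($version) = @_;
    $version = $] unless defined $version;
    return $version >= 5.008008;
}

warn "warning: running under perl $], which is older than 5.8.8\n"
    unless perl_is_at_least_588();
```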

Ben