Need workaround for regex bug in 5.8.6

James Marshall

I found a weird bug in Perl 5.8.6: If a variable in a CGI script (only)
is long enough, the script dies when it matches the variable against the
pattern /(.|ab)*/ . The critical length seems to vary by machine, or even
by data size or other environmental conditions-- memory or heap problem,
maybe? Here's an NPH CGI script that demonstrates the bug on my machine:

---------------------------------------

#!/usr/bin/perl

use strict ;

my($s)= ' ' x 15881 ; # 15880 is fine, but 15881 crashes
$s=~ /(.|ab)*/ ; # dies here with no warning
&HTTPdie('got here') ; # never gets here


# Die, outputting full HTTP response.
sub HTTPdie {
    my($msg)= @_ ;

    print <<EOF ;
HTTP/1.0 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Type: text/plain

$msg
EOF

    exit ;
}

---------------------------------------

This bug doesn't happen if the script is run from the command line, no
matter how large $s is.

Have you seen this bug, and if so do you know a good workaround? Do you
know if it's fixed in 5.8.7? Even if so, I'd like a workaround for 5.8.6,
since the software will be used in many environments where the user has no
control over the Perl version.

I'm running this on Linux, kernel 2.6.11 (SuSE 9.3).

If it helps, running this script with Perl 5.8.4 results in a segmentation
fault, even when run from the command line. (The critical length of $s is
smaller.)

Thanks a lot for any help!

James
.............................................................................
James Marshall (e-mail address removed) Berkeley, CA @}-'-,--
"Teach people what you know."
.............................................................................
 
A. Sinan Unur

> I found a weird bug in Perl 5.8.6:

I would not call it a bug. Rather, you are getting what you deserve.

> ....
> #!/usr/bin/perl
>
> use strict ;
>
> my($s)= ' ' x 15881 ; # 15880 is fine, but 15881 crashes
> $s=~ /(.|ab)*/ ; # dies here with no warning
> &HTTPdie('got here') ; # never gets here

Yeah, but did you ever read the server logs?

> This bug doesn't happen if the script is run from the command line, no
> matter how large $s is.

Now you are lying.

D:\Home\asu1\UseNet\clpmisc> cat r.pl
use strict;
use warnings;

my($s)= ' ' x 100000 ; # 15880 is fine, but 15881 crashes
$s=~ /(.|ab)*/ ; # dies here with no warning

D:\Home\asu1\UseNet\clpmisc> perl r.pl
Complex regular subexpression recursion limit (32766) exceeded at r.pl
line 5.

> Have you seen this bug, and if so do you know a good workaround?

It is not a bug, and the workaround is not to do something this stupid.
> .............................................................................
> James Marshall (e-mail address removed) Berkeley, CA @}-'-,--
> "Teach people what you know."
> .............................................................................

By the way, your signature is formatted incorrectly. It should be around
70 characters wide, and there should be a sig separator on the line
above it. A sig separator is dash-dash-space-newline.

Sinan
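
The transcript above shows that, on builds where perl itself notices the limit, it is reported as a warning rather than an error. A minimal sketch of turning that warning into something a script can detect, assuming the message belongs to the "regexp" warning category as perldiag lists it. This does not help on builds where the C stack overflows before perl notices; the variable names are only illustrative.

```perl
use strict;
use warnings FATAL => 'regexp';   # promote regexp-category warnings to die()

my $s = ' ' x 100_000;

# eval traps the now-fatal warning; $ok stays undef if the match died.
my $ok = eval { $s =~ /(.|ab)*/; 1 };

if ($ok) {
    print "match finished without hitting the recursion limit\n";
}
else {
    print "regex engine gave up: $@";
}
```

Whether the limit is actually hit depends on the perl version and build, so either branch may run.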
 
Sisyphus

"A. Sinan Unur" wrote:

> ..
> ..
> D:\Home\asu1\UseNet\clpmisc> cat r.pl
> use strict;
> use warnings;
>
> my($s)= ' ' x 100000 ; # 15880 is fine, but 15881 crashes
> $s=~ /(.|ab)*/ ; # dies here with no warning
>
> D:\Home\asu1\UseNet\clpmisc> perl r.pl
> Complex regular subexpression recursion limit (32766) exceeded at r.pl
> line 5.

I get the same on Windows 2000, perl5.8.4 - but on Windows 2000, perl5.8.7
all I get is an "Unknown software exception ..." Windows popup - which in
the past has usually meant that the stack overflowed.
On linux, perl 5.8.7, it just outputs "Segmentation fault". Seems that
somewhere along the way, perl has lost the capability of handling the error,
and it's now left up to the operating system to deal with.

Something else has changed, too. On my Win32 box, using perl 5.8.7, the
"Unknown software exception..." occurs with just 5207 spaces assigned to
$s. Using perl 5.8.4 (on the same box/os) there's no problem until at least
32767 spaces are assigned to $s (when the perl error occurs).

Cheers,
Rob
 
A. Sinan Unur

> "A. Sinan Unur"
> .
> .
>
> I get the same on Windows 2000, perl5.8.4 - but on Windows 2000,
> perl5.8.7 all I get is an "Unknown software exception ..." Windows
> popup - which in the past has usually meant that the stack overflowed.
> On linux, perl 5.8.7, it just outputs "Segmentation fault". Seems that
> somewhere along the way, perl has lost the capability of handling the
> error, and it's now left up to the operating system to deal with.
>
> Something else has changed, too. On my Win32 box, using perl 5.8.7,
> the "Unknown software exception..." occurs with just 5207 spaces
> assigned to $s. Using perl 5.8.4 (on the same box/os) there's no
> problem until at least 32767 spaces are assigned to $s (when the perl
> error occurs).

Interesting because I am using AS Perl 5.8.7 on Windows, and I cannot
observe the behavior.

Sinan
 
Sisyphus

A. Sinan Unur said:
> Interesting because I am using AS Perl 5.8.7 on Windows, and I cannot
> observe the behavior.

Aaah ... my perl 5.8.7 was built using gcc (MinGW port), whereas my perl
5.8.4 is AS build 810. So it looks like the compiler used has a bearing.

In fact, I also have a perl 5.8.7 built using MSVC++ 7.0 (.NET), and I now
find it exhibits the same behaviour as my perl 5.8.4 (and your AS perl
5.8.7).

That's notable in that I can't recall ever coming across a situation where
the compiler used to build a native Win32 perl has had such a marked effect
as we're seeing here.

Cheers,
Rob
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to
James Marshall
> I found a weird bug in Perl 5.8.6: If a variable in a CGI script (only)
> is long enough, the script dies when it matches the variable against the
> pattern /(.|ab)*/ .

This is a very old limitation of the Perl REx engine: it uses the C stack
for backtracking-data storage; since the C stack is a very scarce
resource, and running out of stack is catastrophic (as opposed to
running out of heap), this makes things very restrictive.

Actually, about 5 years ago I added the necessary infrastructure to
the REx engine to keep these data on Perl stacks (as opposed to C
stacks, Perl stacks can grow, and running out of stack can be caught -
at least in some situations); moreover, I converted one part of the
REx engine (out of 4 or 5 different parts) to use this infrastructure.

At that time I had no time to convert the remaining constructs. I
hoped that "everybody" would be able to continue and "copy" the
provided modification to the other constructs. Apparently, nobody
volunteered.

=======================================================

Meanwhile, you have several alternatives:

a) Make sure that your Perl is compiled with "stack checking code",
so that running out of stack is not catastrophic (will not help
with data processing :-(, but will help with bookkeeping ;-);

b) Increase amount of stack so that your data can be processed (not
always feasible);

c) Do not use ()* on complicated constructs (likewise).

Sorry to be the bearer of sad news,
Ilya
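
One way to read alternative (c), as a rough sketch only: drop the ()* wrapper and consume one alternative per loop iteration with \G and /gc, so the engine never has to nest one frame per repetition. This assumes the tokens are exactly those of the reduced test case ("." or "ab"); the real pattern is not shown in the thread, and $tokens and $consumed are illustrative names.

```perl
use strict;
use warnings;

my $s = ' ' x 100_000;   # well past the length reported to crash

# Match one alternative per iteration instead of quantifying the group.
# \G anchors each attempt where the previous match ended; /c leaves
# pos() untouched when a match attempt finally fails.
my $tokens = 0;
pos($s) = 0;
$tokens++ while $s =~ /\G(?:.|ab)/gc;

my $consumed = pos($s);  # offset where the token-by-token scan stopped
print "consumed $consumed of ", length($s), " characters in $tokens steps\n";
```

The trade-off is that the loop never backtracks into tokens it has already consumed, so it is only a substitute when the rest of the real pattern does not depend on that.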
 
xhoster

> I found a weird bug in Perl 5.8.6: If a variable in a CGI script (only)
> is long enough, the script dies when it matches the variable against the
> pattern /(.|ab)*/

Why match against that in the first place? Is there any case in which that
pattern match will fail?

Xho
 
James Marshall

> Why match against that in the first place? Is there any case in which that
> pattern match will fail?

The actual pattern I'm using is much longer and more complex. The pattern
above was the result of reducing it to a simple test case.


James
 
James Marshall

OK, thanks very much for the explanation. Unfortunately, none of the
alternatives are possible in this situation, except maybe the third-- I'll
have to think about it some more. Thanks for fixing part of it five years
ago; if I knew more about Perl internals I'd finish it myself.

Thanks also to Rob for his experimentation and feedback under Windows.

Cheers,
James
.............................................................................
James Marshall (e-mail address removed) Berkeley, CA @}-'-,--
"Teach people what you know."
.............................................................................


On Thu, 10 Nov 2005, Ilya Zakharevich wrote:

IZ> [A complimentary Cc of this posting was sent to
IZ> James Marshall
IZ> > I found a weird bug in Perl 5.8.6: If a variable in a CGI script (only)
IZ> > is long enough, the script dies when it matches the variable against the
IZ> > pattern /(.|ab)*/ .
IZ>
IZ> This is a very old limitation of the Perl REx engine: it uses C stack
IZ> for backtracking-data storage; since C stack is a very scarse
IZ> resource, and running out of stack is a catastrophic process (as
IZ> opposed to running out of heap), this makes things very restrictive.
IZ>
IZ> Actually, about 5 years ago I added the necessary infrastructure to
IZ> the REx engine to keep these data on Perl stacks (as opposed to C
IZ> stacks, Perl stacks can grow, and running out of stack can be caught -
IZ> at least in some situations); moreover, I converted one part of the
IZ> REx engine (out of 4 or 5 different parts) to use this infrastructure.
IZ>
IZ> At this moment I had no time to convert the remaining constructs. I
IZ> hoped that "everybody" will be able to continue and "copy" the
IZ> provided modification to the other constructs. Apparently, nobody
IZ> volunteered.
IZ>
IZ> =======================================================
IZ>
IZ> Meanwhile, you have several alternatives:
IZ>
IZ> a) Make sure that your Perl is compiled with "stack checking code",
IZ> so that running out of stack is not catastrophic (will not help
IZ> with data processing :-(, but will help with bookkeeping ;-);
IZ>
IZ> b) Increase amount of stack so that your data can be processed (not
IZ> always feasible);
IZ>
IZ> c) Do not use ()* on complicated constructs (likewise).
IZ>
IZ> Sorry to be a bearer of a sad news,
IZ> Ilya
IZ>
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to
James Marshall
IZ> c) Do not use ()* on complicated constructs (likewise).

Actually, there is

d) Use ()* only on the construct I fixed 5 years ago. It may have been
the "constant length of the group" case (I do not remember...); so
if you could use
(..|ab)*
instead of your
(.|ab)*

this may be crucial. (Or maybe it was the length=1 case only?)

You need to experiment,
Ilya
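
A sketch of how that experiment could be run without taking down the calling script: each match runs in a child perl, and a bisection looks for the largest subject length that still exits cleanly. The helper names survives() and largest_ok() are made up for this example, and the bisection assumes that failures are monotonic in length and that an empty subject always survives, neither of which the thread confirms.

```perl
use strict;
use warnings;

# Run one match in a separate perl so a crash cannot kill this script.
# Returns true if the child exited normally with status 0.
sub survives {
    my ($pattern, $len) = @_;
    my $code = 'my ($p, $n) = @ARGV; my $s = " " x $n; $s =~ /$p/; exit 0';
    return system($^X, '-e', $code, $pattern, $len) == 0;
}

# Largest length in 0..$max whose match survived, found by bisection.
sub largest_ok {
    my ($pattern, $max) = @_;
    return $max if survives($pattern, $max);
    my ($lo, $hi) = (0, $max);        # invariant: $lo survives, $hi does not
    while ($hi - $lo > 1) {
        my $mid = int(($lo + $hi) / 2);
        if (survives($pattern, $mid)) { $lo = $mid } else { $hi = $mid }
    }
    return $lo;
}

for my $pattern ('(.|ab)*', '(..|ab)*') {
    printf "%-10s largest length that survived (up to 100000): %d\n",
        $pattern, largest_ok($pattern, 100_000);
}
```

Since the thread reports different limits under CGI, on the command line, and across compilers, the numbers are only meaningful when the script is run in the same environment as the failing code.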
 
