regexp: segmentation fault

S.Marion · Mar 9, 2006

Hello,

I have a problem with my regexp.
I'm trying to match the following pattern:

$cmd =~ /static \{\};(.*\n)(.*)Signature:
V(.*\n){1,3}.*Code:\n(.+\n){1,$limit}\s*$offset(.*)/g;

The problem is that there can be many lines between the "Code" and the
$offset.
By many lines I mean thousands.
When the offset is further than about 8000 lines, I get a segfault!
I guess the problem is that it's feeding too much info into $3 whereas
I'm only interested in the remainder of the line after $offset (that is $4).
Basically, if I could avoid using $3 I wouldn't mind!

Do you know any way I could fix this?

Thank you for your help,

Sebastien

A. Sinan Unur · Mar 9, 2006

I have a problem with my regexp.
I'm trying to match the following pattern:

$cmd =~ /static \{\};(.*\n)(.*)Signature:
V(.*\n){1,3}.*Code:\n(.+\n){1,$limit}\s*$offset(.*)/g;

The problem is that there can be many lines between the "Code" and the
$offset.
By many lines I mean thousands.
When the offset is further than about 8000 lines, I get a segfault!
I guess the problem is that it's feeding too much info into $3 whereas
I'm only interested in the remainder of the line after $offset (that
is $4). Basically, if I could avoid using $3 I wouldn't mind!

Without any idea what the input looks like, the regex above does not
mean much to me.

By the way, the (.*) after $offset is the fifth capture group.

If you are not interested in capturing anything before that, why are you
using capturing groups?

I have a feeling (since I have no data, I cannot test this) that anchoring
the pattern and using .+ rather than .* might help.

On the other hand, depending on what the input looks like, I might be
tempted to use the .. operator.

See

perldoc perlre for non-capturing groups
perldoc perlop for range operators
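
To illustrate the non-capturing syntax only, here is a minimal sketch. The
sample text, the helper name rest_after_offset, and the colon after the
offset are my assumptions, not taken from your data, and I cannot say
whether this form avoids the crash on your Perl version.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the rest of the line that starts with
# "$offset:" somewhere after a "Code:" line, or undef if there is none.
sub rest_after_offset {
    my ($text, $offset) = @_;

    # (?: ... ) groups without capturing, so the only capture is $1.
    # The lazy *? tries the fewest filler lines first.
    my ($rest) = $text =~ /Code:\n(?:.+\n)*?\s*\Q$offset\E:(.*)/;
    return $rest;
}

# Made-up input standing in for the real javap output.
my $text = join '',
    "static {};\n",
    "  Code:\n",
    "   0: aload_0\n",
    "   1: invokespecial\n",
    "   4: return here\n";

my $rest = rest_after_offset($text, 4);
print defined $rest ? "rest of line:$rest\n" : "no match\n";
```

The filler lines are still matched, but nothing is stored for them, so
there is only one capture group to keep track of.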

Sinan

S.Marion · Mar 9, 2006

Hello,

Thank you for your reply.
Let me apologise if I wasn't clear enough.
Basically the inputs are from javap, and I want to match a particular
offset of a given output in the given method with the given signature.

By the way, the (.*) after $offset is the fifth capture group.

That's right, my mistake, got confused after moving it around.

If you are not interested in capturing anything before that, why are you
using capturing groups?

well... simply because I have no idea how else I could say "ok jump as
many lines as you want until you find my offset".

I have a feeling (since I have no data, I cannot test this) that anchoring
the pattern and using .+ rather than .* might help.

No unfortunately that doesn't do the trick.

On the other hand, depending on what the input looks like, I might be
tempted to use the .. operator.

I'm not sure I understand what this does, but in any case it does not
work, unfortunately.

S.Marion · Mar 9, 2006

Any thoughts??

Tad McClellan · Mar 9, 2006

S.Marion said:
Any thoughts??

I think you should quote some context in followups like
everyone else does.

Dr.Ruud · Mar 9, 2006

S.Marion schreef:

Any thoughts??

Yes.

S.Marion · Mar 9, 2006

Ok, I'll try to simplify the question.

I found the following, which I think is exactly my problem:
"Items governed by * (and *?) are optional not only once, but repeatedly
forever (well, to be pedantic, Perl currently has an internal limit of
32K repeats for parenthetical items)."

basically the file I parse is something like:

bla
2000 lines
what i want: secret
blablabla
10 000 lines (more than 32k)
what i want: secret

I only want to get the "secret" after the "what I want" stuff while
being sure this is below "blablabla" AND NOT below "bla".

So the regexp looks like:
$cmd =~ /blablabla (.*\n)what i want: (.*)/g

PS: I can't really use a /gs modifier due to the complexity of the file
to parse. If I did so, I would end up with duplicates.

Sebastien

Jürgen Exner · Mar 9, 2006

S.Marion said:
Any thoughts??

About what?

jue

ednotover · Mar 9, 2006

S.Marion said:
Ok, I'll try to simplify the question.

basically the file I parse is something like:

bla
2000 lines
what i want: secret
blablabla
10 000 lines (more than 32k)
what i want: secret

I only want to get the "secret" after the "what I want" stuff while
being sure this is below "blablabla" AND NOT below "bla".

So the regexp looks like:
$cmd =~ /blablabla (.*\n)what i want: (.*)/g

Two quick thoughts:

1) Why are you trying to do this with a regexp? Why not loop through
the input file and take actions as needed as you see the significant
input lines?

2) If you're staying with a regexp, you might be better off using
non-greedy matches for the portions that are your "filler":

$cmd =~ /blablabla (.*?\n)what i want: (.*)/g

The segfault may be due to the size of the input and the amount of
backtracking the RE engine has to manage.
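
For point 1, a rough sketch of the line-by-line approach could look like
the following. The sub name secrets_after_marker and the sample data are
made up for illustration; the marker strings come from your simplified
example.

```perl
use strict;
use warnings;

# Returns every "secret" found after the first line that is exactly $marker.
sub secrets_after_marker {
    my ($fh, $marker) = @_;
    my $seen_marker = 0;
    my @secrets;

    while (my $line = <$fh>) {
        chomp $line;
        if ($line eq $marker) {
            $seen_marker = 1;    # everything from here on is of interest
            next;
        }
        next unless $seen_marker;
        push @secrets, $1 if $line =~ /^what i want: (.*)/;
    }
    return @secrets;
}

# In-memory filehandle with made-up data in the shape of the example above.
my $sample = "bla\nfiller\nwhat i want: secret1\n"
           . "blablabla\nfiller\nwhat i want: secret2\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
print "$_\n" for secrets_after_marker($fh, 'blablabla');
close $fh;
```

Each regex here only ever sees one line, so the number of lines between
the marker and the secret no longer matters to the regex engine.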

Hope that helps,
Ed

robic0 · Mar 11, 2006

Two quick thoughts:

1) Why are you trying to do this with a regexp? Why not loop through
the input file and take actions as needed as you see the significant
input lines?

2) If you're staying with a regexp, you might be better off using
non-greedy matches for the portions that are your "filler":

$cmd =~ /blablabla (.*?\n)what i want: (.*)/g

The segfault may be due to the size of the input and the amount of
backtracking the RE engine has to manage.

Naaa, the size and backtracking will not produce a segmentation fault.
Personally, the OP's posted problem is absurd. In the real world, no one
would program to such an abstraction. His "blablabla's" and 32K represents
random, non-repeatable form expressions. Might as well try to do regex on
the Dictionary.

Most likely this guy has got a DFI board with an overclocked CPU in the 4.0 GHz
range and RAM in the DDR 700 range, *on air*, with temps in the 70 C range.
Or he has 128 MB installed for grins. Ever try to run a bunch of programs
in XP with 128 MB of RAM? I would bet Perl would segfault, wouldn't you?

-robic0-

jl_post · Mar 11, 2006

S.Marion said:
basically the file I parse is something like:

bla
2000 lines
what i want: secret
blablabla
10 000 lines (more than 32k)
what i want: secret

I only want to get the "secret" after the "what I want" stuff while
being sure this is below "blablabla" AND NOT below "bla".

Here's a short Perl script that does what you want:

#!/usr/bin/perl
use strict;
use warnings;

while (<DATA>)
{
    # Skip line unless it's between the "bla" and "blablabla" lines:
    next unless m/^bla$/ .. m/^blablabla$/;

    if (m/^what i want: (.*)/)
    {
        my $wantedObject = $1;
        print "I found what I want! It's $wantedObject!\n";
    }
}

__DATA__
bla
2000 lines
what i want: secret1
blablabla
10 000 lines (more than 32k)
what i want: secret2

Run this program, and you'll see that the output is:

I found what I want! It's secret1!

Notice that it found "secret1" but not "secret2". That's because
the ".." operator only returns true when it is between the "bla" and
the "blablabla" lines. We told it to skip any lines that aren't
between those two lines with the line of code:

next unless m/^bla$/ .. m/^blablabla$/;
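
If the secret you need is the one below "blablabla" rather than the
one between the two markers, the same operator can be pointed the other
way. This is only a sketch: the sub name secrets_below and the in-memory
sample data are my own, and I am assuming the region of interest runs
from the "blablabla" line to the end of the file.

```perl
use strict;
use warnings;

# Collects the secrets from the "blablabla" line through end of file.
sub secrets_below {
    my ($fh) = @_;
    my @found;

    while (my $line = <$fh>) {
        # True from the "blablabla" line until eof($fh) becomes true.
        next unless $line =~ m/^blablabla$/ .. eof($fh);
        push @found, $1 if $line =~ m/^what i want: (.*)/;
    }
    return @found;
}

my $sample = "bla\n2000 lines\nwhat i want: secret1\n"
           . "blablabla\n10 000 lines\nwhat i want: secret2\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
print "I found what I want! It's $_!\n" for secrets_below($fh);
close $fh;
```

With this sample data it reports "secret2" and skips "secret1".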

I hope this helps, Sebastien.

Have a great weekend!

-- Jean-Luc
