m// on very long lines leaks memory

S

ShaunJ

The following snippet leaks memory until it breaks and falls down when
m// is used on a very long line. It works fine if the line lengths are
short. Try
../test.pl /usr/share/dict/words /usr/share/dict/words
Depending on your dictionary, you'll see that compiling the regex
takes about 200 MB. However the following matching loop leaks memory
at an alarming rate. Start up `top` and watch it run. I'm using Perl
5.8.6 built for darwin-thread-multi-2level. If anyone cares to confirm
or deny this behaviour for other architectures or version of Perl,
that would be interesting too.

Cheers,
Shaun

#!/usr/bin/perl
use strict;
use English;
open REFILE, '<' . shift;
chomp (my @restrings = <REFILE>);
close REFILE;
my @re = map { qr/$_/ } @restrings;

open TEXTFILE, '<' . shift;
chomp (my @text = <TEXTFILE>);
close TEXTFILE;
my $text = join '', @text;

foreach my $re (@re) {
if ($text =~ m/$re/) {
print $LAST_MATCH_START[0], "\n";
}
}
 
J

John W. Krahn

ShaunJ said:
The following snippet leaks memory until it breaks and falls down when
m// is used on a very long line. It works fine if the line lengths are
short. Try
./test.pl /usr/share/dict/words /usr/share/dict/words
Depending on your dictionary, you'll see that compiling the regex
takes about 200 MB. However the following matching loop leaks memory
at an alarming rate. Start up `top` and watch it run. I'm using Perl
5.8.6 built for darwin-thread-multi-2level. If anyone cares to confirm
or deny this behaviour for other architectures or version of Perl,
that would be interesting too.

Cheers,
Shaun

#!/usr/bin/perl
use strict;
use English;
open REFILE, '<' . shift;
chomp (my @restrings = <REFILE>);
close REFILE;
my @re = map { qr/$_/ } @restrings;

open TEXTFILE, '<' . shift;
chomp (my @text = <TEXTFILE>);
close TEXTFILE;
my $text = join '', @text;

foreach my $re (@re) {
if ($text =~ m/$re/) {
print $LAST_MATCH_START[0], "\n";
}
}

I tested it and if I remove the English module it works fine.
(So don't use English.pm!)



John
 
J

John W. Krahn

John said:
ShaunJ said:
The following snippet leaks memory until it breaks and falls down when
m// is used on a very long line. It works fine if the line lengths are
short. Try
./test.pl /usr/share/dict/words /usr/share/dict/words
Depending on your dictionary, you'll see that compiling the regex
takes about 200 MB. However the following matching loop leaks memory
at an alarming rate. Start up `top` and watch it run. I'm using Perl
5.8.6 built for darwin-thread-multi-2level. If anyone cares to confirm
or deny this behaviour for other architectures or version of Perl,
that would be interesting too.

Cheers,
Shaun

#!/usr/bin/perl
use strict;
use English;
open REFILE, '<' . shift;
chomp (my @restrings = <REFILE>);
close REFILE;
my @re = map { qr/$_/ } @restrings;

open TEXTFILE, '<' . shift;
chomp (my @text = <TEXTFILE>);
close TEXTFILE;
my $text = join '', @text;

foreach my $re (@re) {
if ($text =~ m/$re/) {
print $LAST_MATCH_START[0], "\n";
}
}

I tested it and if I remove the English module it works fine.
(So don't use English.pm!)

Or at least don't use the $PREMATCH, $MATCH, or $POSTMATCH variables:

use English qw( -no_match_vars );



John
 
X

xhoster

ShaunJ said:
The following snippet leaks memory until it breaks and falls down when
m// is used on a very long line. It works fine if the line lengths are
short. Try
./test.pl /usr/share/dict/words /usr/share/dict/words
Depending on your dictionary, you'll see that compiling the regex
takes about 200 MB. However the following matching loop leaks memory
at an alarming rate. Start up `top` and watch it run. I'm using Perl
5.8.6 built for darwin-thread-multi-2level. If anyone cares to confirm
or deny this behaviour for other architectures or version of Perl,
that would be interesting too.

Technically, this does not seem to be a leak. If I throw in infinite
loop around your foreach my $re (@re) loop, then memory only grows
up to 15.5Gig when the inner loop completes. Upon the next iteration of
the outer loop, memory stops growing. So it seems like it is an
inefficiency rather than a leak. With idle speculation, I'd say that each
$re maintains some kind of independent state, that that state is
proportional to the size of the string it was last used on, and that that
storage is reused next time that $re gets invoked, but not before then.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 
S

ShaunJ

Or at least don't use the $PREMATCH, $MATCH, or $POSTMATCH variables:

use English qw( -no_match_vars );

Wow, thanks! If I use either English.pm or $& (even without
English.pm) it uses up tons of memory with Perl 5.8.6 (on MacOSX
10.4.11). If I use neither English.pm or $& it works fine.

If I use Perl 5.10.0 built from source it works for every case.

Cheers,
Shaun
 
U

Uri Guttman

S> Wow, thanks! If I use either English.pm or $& (even without
S> English.pm) it uses up tons of memory with Perl 5.8.6 (on MacOSX
S> 10.4.11). If I use neither English.pm or $& it works fine.

i was going to mention that but didn't want to get into this thread. $&
(which is used in english.pm without that option) is a known memory hog
(not a leak). since $& is global it must copy the entire match string
for each regex in case it might be used later anywhere in the
program. this is a well known issue and you should google for more about
it or find the points in perldoc perlvar.

S> If I use Perl 5.10.0 built from source it works for every case.

they seem to have fixed this problem (partially from what i heard but i
could be wrong) in 5.10. i still recommend never using $& and no one who
knows perl uses english.pm.

uri
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,968
Messages
2,570,152
Members
46,697
Latest member
AugustNabo

Latest Threads

Top