Memory issues


jm

Based on the fact that perl contains many memory leaks,

a universal way to measure how much memory has been malloc'ed is required.

Is there a standard way to measure how much memory a process has
allocated that works with Cygwin Perl, ActivePerl, and Strawberry Perl?

This should help to locate the code that leaks memory.
 

Joost Diepenmaat

jm said:
Based on the fact that perl contains many memory leaks,

It doesn't.

a universal way to measure how much memory has been malloc'ed is required.

I don't understand what that means.

Is there a standard way to measure how much memory a process has
allocated that works with Cygwin Perl, ActivePerl, and Strawberry Perl?

That's not nearly universal. See Win32::Process::Info.

This should help to locate the code that leaks memory.

It hardly would.
 

smallpond

Based on the fact that perl contains many memory leaks,

a universal way to measure how much memory has been malloc'ed is required.

Is there a standard way to measure how much memory a process has
allocated that works with Cygwin Perl, ActivePerl, and Strawberry Perl?

This should help to locate the code that leaks memory.


perldoc perlfaq3
See:
How can I make my Perl program take less memory?
How can I free an array or hash so my program shrinks?
 

jm

Joost Diepenmaat wrote:
It doesn't.

I wrote a code sample to illustrate the issue.

The code creates a string of 10 million characters. This is the only large
piece of data in the sample.

Then the main part of the code just modifies this data, which means that
memory usage should (in my humble opinion) stay around 10 or 20 (or 40)
megabytes.

The main program does not manipulate the string directly, but calls
functions aa and ab to manipulate it. These two functions just perform
substitutions within the string.

After creating the first string, perl uses around 20 MB. That is okay.

Calling function aa (one or several times) causes a memory leak (or
memory footprint) of 150 MB.
I mean that once I have called this function I do not know how to free those
150 megabytes, but if I call the same function again I do not lose
more memory.

When I call the function ab, which is quite similar to function aa,
I have the same memory issue, but with only 50 MB more.

Below are the output of the script and the script itself.
The system is Debian etch, with 512 MB of memory.

----- Result of script: ----------------------------------
/tmp$ perl essai.pl
10000001
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
20492 pts/1 R+ 0:00 0 1022 22977 20988 4.0 perl essai.pl

10000001
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
20492 pts/1 S+ 0:43 0 1022 159921 158132 30.5 perl essai.pl

10000001
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
20492 pts/1 R+ 0:57 0 1022 159921 158132 30.5 perl essai.pl
10000001
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
20492 pts/1 R+ 4:34 0 1022 218529 216740 41.9 perl essai.pl
10000001
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
20492 pts/1 R+ 8:10 0 1022 218529 216740 41.9 perl essai.pl


----- Script: ------------------------------------------


# Copies its argument, rewrites the copy five times, and returns the copy.
sub aa($)
{
    my ($d) = @_;
    $d =~ s/x(.....)/$1y/g ;
    $d =~ s/x(.....)/$1z/g ;
    $d =~ s/x(.....)/$1a/g ;
    $d =~ s/x(.....)/$1b/g ;
    $d =~ s/x(.....)/$1c/g ;
    return $d;
}

# Same idea as aa, with different patterns.
sub ab($)
{
    my ($d) = @_;
    $d =~ s/a(.....)/$1y/g ;
    $d =~ s/b(.....)/$1z/g ;
    $d =~ s/c(.....)/$1a/g ;
    $d =~ s/y(.....)/$1b/g ;
    $d =~ s/z(.....)/$1c/g ;
    return $d;
}


my $c = 'x' x (1000*1000*10) ;
$c .= "\x{1234}" ;              # wide character: the string is now UTF-8 internally
print length($c) ."\n" ;
my $v = qx( ps v $$ );          # memory usage of this process
print "$v\n" ;
$c = aa($c);
print length($c) ."\n" ;
$v = qx( ps v $$ );             # $v is reused rather than declared again
print "$v\n" ;
$c = aa($c);
$c = aa($c);
$c = aa($c);
$c = aa($c);
$c = aa($c);
print length($c) ."\n" ;
$v = qx( ps v $$ );
print $v;
$c = ab($c);
$c = ab($c);
$c = ab($c);
$c = ab($c);
$c = ab($c);
print length($c) ."\n" ;
$v = qx( ps v $$ );
print $v;
$c = ab($c);
$c = ab($c);
$c = ab($c);
$c = ab($c);
$c = ab($c);
print length($c) ."\n" ;
$v = qx( ps v $$ );
print $v;
 

jm

smallpond wrote:
perldoc perlfaq3
See:
How can I make my Perl program take less memory?
How can I free an array or hash so my program shrinks?

It is interesting, but it does not seem to solve my substitution issue.

However, I do not understand this:

« Memory allocated to lexicals (i.e. my() variables)
cannot be reclaimed or reused even if they go out of scope. It is
reserved in case the variables come back into scope. Memory allocated
to global variables can be reused (within your program) by using
undef()ing and/or delete(). »

Aren't my() variables local variables?
Why aren't they freed when the function terminates?
 

smallpond

smallpond wrote:

It is interesting, but it does not seem to solve my substitution issue.

However, I do not understand this:

« Memory allocated to lexicals (i.e. my() variables)
cannot be reclaimed or reused even if they go out of scope. It is
reserved in case the variables come back into scope. Memory allocated
to global variables can be reused (within your program) by using
undef()ing and/or delete(). »

Aren't my() variables local variables?
Why aren't they freed when the function terminates?


sub foo {
my $v = 5;
return \$v;
}

In C, once the function terminates $v is gone and a pointer
to it will fail. In perl this reference is legal and the
space will not be reclaimed.

In your sample of code above, when you pass a string to a sub,
perl will make a copy. If you pass a reference it will not.
This isn't a memory leak in perl, it's a memory leak in your
program.
 

jm

smallpond wrote:
sub foo {
my $v = 5;
return \$v;
}

In C, once the function terminates $v is gone and a pointer
to it will fail. In perl this reference is legal and the
space will not be reclaimed.

In your sample of code above, when you pass a string to a sub,
perl will make a copy. If you pass a reference it will not.
This isn't a memory leak in perl, it's a memory leak in your
program.

As you suggested, I tried replacing the scalar with a reference, but this does
not seem to save memory (maybe 10 MB, i.e. just the size of
the main variable):

--- results -------------------------
/tmp$ perl essai.pl && echo ok
10000001
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
31679 pts/1 R+ 0:00 0 1022 22977 20996 4.0 perl essai.pl

10000001
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
31679 pts/1 R+ 0:38 0 1022 150157 148372 28.7 perl essai.pl

10000001
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
31679 pts/1 R+ 0:50 0 1022 150157 148372 28.7 perl essai.pl

10000001
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
31679 pts/1 R+ 3:53 0 1022 198997 197212 38.1 perl essai.pl

10000001
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
31679 pts/1 R+ 6:55 0 1022 198997 197212 38.1 perl essai.pl

ok


--- script --------------------------
# Takes a reference to the string and rewrites the string in place.
sub aa($)
{
    my ($d) = @_;
    $$d =~ s/x(.....)/$1y/g ;
    $$d =~ s/x(.....)/$1z/g ;
    $$d =~ s/x(.....)/$1a/g ;
    $$d =~ s/x(.....)/$1b/g ;
    $$d =~ s/x(.....)/$1c/g ;
    return $d;
}

# Same idea as aa, with different patterns.
sub ab($)
{
    my ($d) = @_;
    $$d =~ s/a(.....)/$1y/g ;
    $$d =~ s/b(.....)/$1z/g ;
    $$d =~ s/c(.....)/$1a/g ;
    $$d =~ s/y(.....)/$1b/g ;
    $$d =~ s/z(.....)/$1c/g ;
    return $d;
}


my $s = 'x' x (1000*1000*10) ;
$s .= "\x{1234}" ;              # wide character: the string is now UTF-8 internally
my $c = \$s;                    # only the reference is passed around
print length($$c) ."\n" ;
my $v = qx( ps v $$ );          # memory usage of this process
print "$v\n" ;
$c = aa($c);
print length($$c) ."\n" ;
$v = qx( ps v $$ );             # $v is reused rather than declared again
print "$v\n" ;
$c = aa($c);
$c = aa($c);
$c = aa($c);
$c = aa($c);
$c = aa($c);
print length($$c) ."\n" ;
$v = qx( ps v $$ );
print "$v\n" ;
$c = ab($c);
$c = ab($c);
$c = ab($c);
$c = ab($c);
$c = ab($c);
print length($$c) ."\n" ;
$v = qx( ps v $$ );
print "$v\n" ;
$c = ab($c);
$c = ab($c);
$c = ab($c);
$c = ab($c);
$c = ab($c);
print length($$c) ."\n" ;
$v = qx( ps v $$ );
print "$v\n" ;
 

Jürgen Exner

jm said:
Joost Diepenmaat wrote:

I wrote a code sample to illustrate the issue.

The code creates a string of 10 million characters. This is the only large
piece of data in the sample.

Which you subsequently copy a few times.

Then the main part of the code just modifies this data, which means that
memory usage should (in my humble opinion) stay around 10 or 20 (or 40)
megabytes.

The main program does not manipulate the string directly, but calls
functions aa and ab to manipulate it. These two functions just perform
substitutions within the string.

No, they don't modify the string at all, they modify a _copy_ of the
original string.

----- Script: ------------------------------------------


sub aa($)
{
my ($d) = @_;

And here you create a copy of the original string.
$d =~ s/x(.....)/$1y/g ;
$d =~ s/x(.....)/$1z/g ;
$d =~ s/x(.....)/$1a/g ;
$d =~ s/x(.....)/$1b/g ;
$d =~ s/x(.....)/$1c/g ;
return $d;

You return that copy ...
}


my $c= 'x' x (1000*1000*10) ;
$c .= "\x{1234}" ;
print length($c) ."\n" ;
my $v = qx( ps v $$ );
print "$v\n" ;
$c = aa($c);

...and you save that copy in $c, so that the memory cannot be reused.

The rest of the code seems to repeat that action several times, using
successively updated versions of the string as the function argument, so
that new copies of the string are created successively.
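
To make the point about copies concrete, here is a minimal sketch. The sub names by_copy and in_place are made up for illustration and the patterns are simplified. It relies on the fact that the elements of @_ are aliases for the caller's variables, so it is the "my ($d) = @_" assignment that makes the copy, not the call itself. This only removes the argument copy; it says nothing about what the substitution allocates internally.

```perl
use strict;
use warnings;

# Copying version, as in the script above: "my ($d) = @_" duplicates the string.
sub by_copy {
    my ($d) = @_;
    $d =~ s/x/y/g;
    return $d;
}

# Aliasing version (hypothetical name): $_[0] is an alias for the caller's
# variable, so the substitution edits the caller's string directly.
sub in_place {
    $_[0] =~ s/x/y/g;
    return;
}

my $c      = 'x' x 10;
my $result = by_copy($c);    # $c itself is still all 'x'
my $before = $c;
in_place($c);                # $c itself is now all 'y'

print "$before -> $c\n";     # prints "xxxxxxxxxx -> yyyyyyyyyy"
```

With the aliasing form, the caller writes in_place($c) instead of $c = aa($c), so no second 10 MB scalar is created for the argument or for the return value.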

jue
 

jm

After modifying the script a little more and checking its execution with
ltrace, I observed that malloc is called 1871 times while free is called only
922 times.
Isn't that an issue?

I just replaced
my $s= 'x' x (1000*1000*10) ;
with
my $s= 'x' x (10) ;

and ran:


/tmp$ ltrace perl essai.pl 2>&1 | sed 's/(.*//g' | sort | uniq -c |
grep 'malloc\|free'
922 free
1871 malloc
 

smallpond

smallpond wrote:

As you suggested, I tried replacing the scalar with a reference, but this does
not seem to save memory (maybe 10 MB, i.e. just the size of
the main variable):



The sub call was just answering your question about
locals.

Each of these:
$$d =~ s/x(.....)/$1a/g ;

is making string copies in $1. $1 is a persistent
variable. perl 5.10 has new regex syntax for
avoiding use of $1, $2 etc.
 

jm

jm wrote:
After modifying the script a little more and checking its execution with
ltrace, I observed that malloc is called 1871 times while free is called only
922 times.
Isn't that an issue?

and ran:

ltrace perl essai.pl 2>&1 | grep 'malloc\|free\|realloc' | perl observe_malloc_free.pl > memory.log

with observe_malloc_free.pl at the bottom of this message.

Here are the resulting memory leaks:

1 NULL is freed, but that's not a memory leak!

The rest can be read like this:
7 blocks of 10 MB are not freed
1 block of 13 MB is not freed
8272 blocks of 4 bytes are not freed
9843 blocks of 4080 bytes are not freed

but I still do not know why...

/tmp$ cat memory.log | sed 's/.*=>//g' | sort | uniq -c
7 10
10 100
7 10000004
22 11
4 112
10 116
1 1192
26 12
1 124
4 128
12 13
1 131716
1 13334528
13 14
3 140
6 15
28 16
9 17
14 18
8 19
19 2
14 20
1 2048
6 21
5 22
3 23
230 24
1 240
6 25
7 256
4 27
1 2712
143 28
4 3
2 30
1 31
125 32
2 33
2 34
1 36
8272 4
3 40
1 4048
1 4064
9843 4080
8 4096
2 4373
1 44
1 45
78 48
1 49156
2 496
72 5
1 50
1 512
65 52
50 56
1 58
6 6
1 628
1 635
13 64
18 7
1 76
45 8
1 80
1 8080
2 84
4 88
74 9
1 98
1 freeing not allocated memory : NULL :



-- observe...pl --------------------------

my %hash = ();    # address => size of every block that is still allocated


while (<>)
{
    my $line = $_;
    if ( $line =~ m/malloc\(([0-9]*)\).*= *([0-9xa-fNUL]*)/ )
    {
        my $size = $1;
        my $ad = $2;
        if ( defined ( $hash{$ad}) )
        {
            print "redundant malloc : $size : $ad : \n" ;
        }
        $hash{$ad} = $size;
    }
    elsif ( $line =~ m/realloc\(([0-9xa-fNUL]*) *, *([0-9]*)\).*= *([0-9xa-fNUL]*)/ )
    {
        my $adp = $1;
        my $size = $2;
        my $ad = $3;

        # realloc gives up the old block ...
        if ( not defined ( $hash{$adp}) )
        {
            print "realloc not allocated memory : $adp : \n" ;
        }
        delete $hash{$adp} ;

        # ... and the returned address is recorded with the new size
        if ( defined ( $hash{$ad}) )
        {
            print "redundant malloc : $size : $ad : \n" ;
        }
        $hash{$ad} = $size;
    }
    elsif ( $line =~ m/free\(([0-9xa-fNUL]*)/ )
    {
        my $ad = $1;
        if ( not defined ( $hash{$ad}) )
        {
            print "freeing not allocated memory : $ad : \n" ;
        }
        delete $hash{$ad} ;
    }
    else
    {
        print "???:" . $line;
    }

}

# whatever is still in the hash was never freed
foreach my $key ( keys ( %hash) )
{
    print $key . ' => ' . $hash{$key} . "\n" ;
}
 

smallpond

jm wrote:

ltrace perl essai.pl 2>&1 | grep 'malloc\|free\|realloc' | perl observe_malloc_free.pl > memory.log

with observe_malloc_free.pl at the bottom of this message.

Here are the resulting memory leaks:

1 NULL is freed, but that's not a memory leak!

The rest can be read like this:
7 blocks of 10 MB are not freed
1 block of 13 MB is not freed
8272 blocks of 4 bytes are not freed
9843 blocks of 4080 bytes are not freed

but I still do not know why...


I don't know much about the perl garbage collector,
but memory is not freed immediately when the ref
count goes to 0. When I run your program and watch
with top, VM goes to 200 MB and stays there for the
whole run. That seems to be some upper bound where
the garbage collector is running. Memory use does
not continue to go up.
 

jm

smallpond wrote:
I don't know much about the perl garbage collector,
but memory is not freed immediately when the ref
count goes to 0. When I run your program and watch
with top, VM goes to 200 MB and stays there for the
whole run. That seems to be some upper bound where
the garbage collector is running. Memory use does
not continue to go up.

This is because I have only 500 MB on my computer,
so I wrote a perl demo program that works within this limit.

Instead of a 10 MB string, you can (try to) use a 40 MB string,
or a 100 MB string.

Then you will see whether the garbage collector starts at 200 MB
... or not.

All I showed with ltrace and observe_malloc_free.pl is that when
the program stops, the garbage collector has not collected all the garbage.
 

Joost Diepenmaat

jm said:
After modifying the script a little more and checking its execution with
ltrace, I observed that malloc is called 1871 times while free is called only
922 times.
Isn't that an issue?

Please keep in mind that perl's memory allocation strategy in general is
optimized for longer running programs, not for one-off scripts (which
makes sense, since one-off scripts don't usually need the performance
gains). This means that for instance subroutines will get memory
allocated on the assumption that they'll be called again, and will take
about as much memory the next time.

This is NOT a memory leak per se, but it does mean that if you have a
subroutine that takes 100Mb to complete, your program will take that
memory and probably not give it back until the program ends. IOW, if you
have a long-running program that only means you need 100Mb for it to
run, it does NOT mean it takes a 100Mb for each call.

In your test case, don't assume that just because the regular expression
replacements don't in theory *need* to use any additional RAM, they
won't. Especially not if you're using UTF-8 encoded strings (which you
are). Perl algorithms tend to exchange RAM for speed in most cases
anyway, and replacing a match with a new string of exactly the same
length in bytes in a unicode string is a pretty uncommon use-case, so
it's likely not optimized.

Anyway, I've not seen a serious memory leak in perl itself in ages, and
I run perl processes that use up to 8 Gb of RAM and run for months
without issues.
 

jm

Joost Diepenmaat wrote:
Please keep in mind that perl's memory allocation strategy in general is
optimized for longer running programs, not for one-off scripts (which
makes sense, since one-off scripts don't usually need the performance
gains).

I did not read this anywhere, neither in the documentation nor in the FAQ.
Maybe a FAQ entry on memory would be helpful?

This means that for instance subroutines will get memory
allocated on the assumption that they'll be called again, and will take
about as much memory the next time.

But when the same routine is called several times with different kinds and
sizes of data, it sometimes consumes a lot of memory and sometimes
less.
If your code contains a hundred functions and twenty of them consume
200 MB each (in a similar way to aa in my example), then a computer with
2 GB will not be enough to run it.

This is NOT a memory leak per se, but it does mean that if you have a
subroutine that takes 100Mb to complete, your program will take that
memory and probably not give it back until the program ends.

I understand this. But I would like to have the opposite feature,
or at least a perl function to release the memory used by a function
(or a package).

IOW, if you
have a long-running program that only means you need 100Mb for it to
run, it does NOT mean it takes a 100Mb for each call.

Yes, that is what I observed.

But this means it is not possible to free the memory consumed by one
function when you know you need memory in another function.

In your test case, don't assume that just because the regular expression
replacements don't in theory *need* to use any additional RAM, they
won't. Especially not if you're using UTF-8 encoded strings (which you
are). Perl algorithms tend to exchange RAM for speed in most cases
anyway, and replacing a match with a new string of exactly the same
length in bytes in a unicode string is a pretty uncommon use-case, so
it's likely not optimized.

I do not think the issue lies here.

Anyway, I've not seen a serious memory leak in perl itself in ages, and
I run perl processes that use up to 8 Gb of RAM and run for months
without issues.

This means you have a computer with 8 GB of RAM.

But if perl used memory in a better way, maybe the same
programs would work on a computer with 512 MB of RAM.
 

Peter J. Holzer

Joost Diepenmaat wrote:
It doesn't.

I wrote a code sample to illustrate the issue. [...]
Calling function aa (one or several times) causes a memory leak (or
memory footprint) of 150 MB.
I mean that once I have called this function I do not know how to free those
150 megabytes, but if I call the same function again I do not lose
more memory.

Then it's not a memory leak. A memory leak is when memory which has been
allocated cannot be (re)used. But in your case it can be reused (and is,
if you call the function again).

Perl is certainly wasteful with memory - The data structures have a lot
of overhead, and it often doesn't free memory because it might need it
again later - but AFAIK perl itself doesn't leak. (Perl programs often
leak - the garbage collector cannot detect cycles, for example, so
the programmer has to remember to break them).
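
As a minimal sketch of the cycle problem: the Node class and its field names below are made up for illustration, while weaken is the real function from the core module Scalar::Util. The first pair of nodes point at each other with ordinary references and are never freed while the program runs. In the second pair, one direction is weakened, so both are freed when the block ends.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my %destroyed;    # names of the nodes whose DESTROY has run

package Node;     # hypothetical class, only for this demonstration
sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
sub DESTROY { $destroyed{ $_[0]{name} } = 1 }

package main;

{
    # Strong cycle: each node keeps the other's reference count above zero.
    my $parent = Node->new('strong_parent');
    my $child  = Node->new('strong_child');
    $parent->{child} = $child;
    $child->{parent} = $parent;
}

{
    # Same shape, but the back reference no longer counts as a reference.
    my $parent = Node->new('weak_parent');
    my $child  = Node->new('weak_child');
    $parent->{child} = $child;
    $child->{parent} = $parent;
    weaken($child->{parent});
}

print join(', ', sort keys %destroyed), "\n";    # prints "weak_child, weak_parent"
```

The strong pair is only cleaned up when the interpreter shuts down, which in a long-running program is exactly the kind of leak meant above.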

hp
 

Peter J. Holzer

Please keep in mind that perl's memory allocation strategy in general is
optimized for longer running programs, not for one-off scripts (which
makes sense, since one-off scripts don't usually need the performance
gains).

Or maybe it is optimized for one-off scripts? One-off scripts rarely
need to worry about hogging memory and it is certainly faster to let the
OS free all the memory at once than to call free a gazillion times.

hp
 

Peter J. Holzer

Joost Diepenmaat wrote:
jm said:
After modifying the script a little more and checking its execution with
ltrace, I observed that malloc is called 1871 times while free is called only
922 times.
Isn't that an issue?
[...]
IOW, if you
have a long-running program that only means you need 100Mb for it to
run, it does NOT mean it takes a 100Mb for each call.

Yes, that is what I observed.

But this means it is not possible to free the memory consumed by one
function when you know you need memory in another function.

You can. Perl keeps around the lexical variables, but not any objects
they point to. So avoid large scalar, hash or array variables in subs
and use references instead. For example, compare the behaviour of

#!/usr/bin/perl
use warnings;
use strict;

print a(@ARGV), "\n";
exit 0;

sub a {
    print "entering a @_\n";
    my ($n) = @_;

    my $s1 = "a" x $n;
    my $s2 = "b" x $n;

    my $rc = length($s1 . $s2);

    print "leaving a @_\n";
    return $rc;
}

and

#!/usr/bin/perl
use warnings;
use strict;

print a(@ARGV), "\n";
exit 0;

sub a {
    print "entering a @_\n";
    my ($n) = @_;

    my $s;
    $s->[1] = "a" x $n;
    $s->[2] = "b" x $n;

    my $rc = length($s->[1] . $s->[2]);

    print "leaving a @_\n";
    return $rc;
}

(And of course "length($s->[1] . $s->[2])" is (intentionally) stupid -
replace it with "length($s->[1]) + length($s->[2])")

This means you have a computer with 8 GB of RAM.

But if perl used memory in a better way, maybe the same
programs would work on a computer with 512 MB of RAM.

Yes. But memory allocated to lexicals is usually the least of your
worries in this case. The overhead of typical perl data structures is
much worse. (Just this month I reduced the memory consumption of a
program from about 3 GB (which meant that it crashed sometimes, since
that's the limit on 32bit linux) to less than one GB by replacing an
anonymous array with a string (which I always had to unpack and repack
to access and manipulate the data within, which is ugly, but not really
slower than accessing the array).)
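
A rough sketch of that kind of trade, not the actual program: the record layout (32-bit unsigned integers, pack format 'N') and the helper names get_elem and set_elem are assumptions for illustration only. Each value occupies exactly 4 bytes inside one scalar instead of being a separate perl scalar in an array.

```perl
use strict;
use warnings;

my @values = (1 .. 1000);

# One string holding all values, 4 bytes each ('N' = 32-bit unsigned, big-endian).
my $packed = pack('N*', @values);

# Read element $i: unpack just its 4-byte slice.
sub get_elem {
    my ($i) = @_;
    return unpack('N', substr($packed, $i * 4, 4));
}

# Write element $i: repack the new value over the same slice.
sub set_elem {
    my ($i, $v) = @_;
    substr($packed, $i * 4, 4) = pack('N', $v);
    return;
}

set_elem(41, get_elem(41) + 1);
print length($packed), " bytes, element 41 is now ", get_elem(41), "\n";
# prints "4000 bytes, element 41 is now 43"
```

Every access pays for a substr and an unpack, which is the ugliness mentioned above, but the storage cost per element drops to the raw size of the value.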

hp
 

Peter J. Holzer

I don't know much about the perl garbage collector,
but memory is not freed immediately when the ref
count goes to 0.

This is wrong. When the ref count goes to zero, perl immediately calls
free.

free may decide to keep the memory around for a subsequent malloc call,
but that doesn't have anything to do with perl - only with your system's
malloc/free implementation (if you use the system's malloc, which is the
default on most platforms, I think).
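
The timing is easy to observe with a destructor. In this small sketch (the Tracker class is made up for the demonstration), DESTROY runs at the closing brace, before the next statement executes, not at some later collection pass. It shows when perl releases the object; whether the C library then hands the memory back to the operating system is a separate question, as said above.

```perl
use strict;
use warnings;

my @events;    # filled in the order things happen

package Tracker;    # hypothetical class, only for this demonstration
sub new     { my ($class) = @_; return bless {}, $class }
sub DESTROY { push @events, 'destroyed' }

package main;

{
    my $obj = Tracker->new;
    push @events, 'in scope';
}    # the last reference to the object disappears here
push @events, 'after scope';

print join(', ', @events), "\n";    # prints "in scope, destroyed, after scope"
```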
When I run your program and watch with top, VM goes to 200 MB and
stays there for the whole run. That seems to be some upper bound
where the garbage collector is running.

Perl doesn't have a garbage collector which runs periodically.

hp
 

Joost Diepenmaat

Peter J. Holzer said:
Or maybe it is optimized for one-off scripts? One-off scripts rarely
need to worry about hogging memory and it is certainly faster to let the
OS free all the memory at once than to call free a gazillion times.

True enough. In any case, as others mentioned, the perl interpreter is
pretty wasteful with memory in most cases where it could trade off
between speed and memory efficiency.
 
