undef takes forever

KKramsch

I have a script that, over a period of several *days* gradually
builds a very large Perl hash. Periodically, it saves this large
hash to a file, using the Storable module. In the past, this
storage process has resulted in a corrupted (and unusable) file,
so the current version of the script tests for the soundness of
the stored file by saving the hash to a dummy file first, then
retrieving the hash from memory into a temporary variable $temp,
and making sure that $temp is defined and that %$temp has the right
number of keys. If all this is as it should be, then the dummy
file is used to overwrite the old version of the hash stored on
disk.
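
For readers following along, the save-then-verify procedure described above might look roughly like the sketch below. The poster's actual code was never shown, so the sub name save_hash_checked and the ".tmp" suffix are hypothetical. Only store, retrieve and rename are real calls.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Hypothetical helper: write $hash to a dummy file, read it back,
# compare key counts, and only then replace the real file.
sub save_hash_checked {
    my ($hash, $file) = @_;
    my $dummy = "$file.tmp";    # assumed naming scheme for the dummy file

    # store() returns undef on I/O errors and dies on serious ones.
    my $stored = eval { store($hash, $dummy) };
    return 0 unless $stored;

    # Read the dummy file back into a temporary copy and sanity-check it.
    my $temp = eval { retrieve($dummy) };
    my $ok = ref($temp) eq 'HASH' && keys(%$temp) == keys(%$hash);
    unless ($ok) {
        unlink $dummy;
        return 0;
    }

    # $temp is freed when this sub returns; for a huge hash that is
    # the slow step this thread is about.
    return rename($dummy, $file) ? 1 : 0;
}
```

A call such as save_hash_checked(\%big, "data.sto") returns 1 only when the dummy file passed the check and replaced the old file.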

It turns out, however, that this version of the script is about
10x slower than the original version, which did not do this extra
check on the stored hash. Using carefully placed print statements,
I determined that the bottleneck is not due to the extra retrieval
and checking steps, but to the deallocation of %$temp that happens
when $temp goes out of scope. Since %$temp is very large and
useless once the check is done, I don't want it hanging around
longer than necessary, but the deallocation step takes 3-4 minutes!
This is about 100 times slower than the time it takes to allocate
%$temp in the first place! It's crazy. I confirmed this by
inserting an explicit statement "undef $temp" right before the end
of the enclosing scope, and noting (via print statements) that this
step is the script's worst bottleneck by far.

It's the same thing if I make $temp file-global and skip the
explicit deallocation step. Now the bottleneck becomes every time
that I assign a new value to $temp, which (except for the first
time) involves deallocating the last contents of %$temp.

Is there any way to speed up the deallocation of %$temp (and
$temp)?

Thanks!

Karl
 
A. Sinan Unur

I have a script that, over a period of several *days* gradually
builds a very large Perl hash. Periodically, it saves this large
hash to a file, using the Storable module. In the past, this
storage process has resulted in a corrupted (and unusable) file,
so the current version of the script tests for the soundness of
the stored file by saving the hash to a dummy file first, then
retrieving the hash from memory into a temporary variable $temp,
and making sure that $temp is defined and that %$temp has the right
number of keys. If all this is as it should be, then the dummy
file is used to overwrite the old version of the hash stored on
disk.

You can't really be sure of the "soundness" of the file using this method.
It is hard to come up with a recommendation without knowing how much that
hash really needs to stay in memory at any given time. If, most of the
time, you don't need to reference previously computed elements of the hash,
I'd recommend at least using a tied hash, a DBM module. Alternatively, my
favorite at this point, you can look at SQLite with Class::DBI.

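
A minimal sketch of the tied-hash idea, assuming the core SDBM_File module is good enough for the data. The sub name open_disk_hash is hypothetical. SDBM limits the combined size of a key and its value to roughly 1 KB, so larger records would need a different DBM module such as DB_File.

```perl
use strict;
use warnings;
use Fcntl;        # exports O_RDWR and O_CREAT
use SDBM_File;

# Hypothetical helper: return a reference to a hash whose contents
# live on disk (in $path.pag and $path.dir) rather than in memory.
sub open_disk_hash {
    my ($path) = @_;
    tie my %h, 'SDBM_File', $path, O_RDWR | O_CREAT, 0666
        or die "Cannot tie $path: $!";
    return \%h;
}
```

Entries are written with the usual syntax, for example $h->{$key} = $value after my $h = open_disk_hash("cache"). Dropping or untying the hash only closes the files, so there is no large in-memory structure to deallocate.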
It turns out, however, that this version of the script is about
10x slower than the original version, which did not do this extra
check on the stored hash. Using carefully placed print statements,
I determined that the bottleneck is not due to the extra retrieval
and checking steps, but to the deallocation of %$temp that happens
when $temp goes out of scope. Since %$temp is very large and
useless once the check is done, I don't want it hanging around
longer than necessary, but the deallocation step takes 3-4 minutes!
This is about 100 times slower than the time it takes to allocate
%$temp in the first place! It's crazy. I confirmed this by
inserting an explicit statement "undef $temp" right before the end
of the enclosing scope, and noting (via print statements) that this
step is the script's worst bottleneck by far.

Again, without code, I have no idea what you are talking about. How big is
this thing?

Over time, most of the memory your script is using is being paged out to
the hard drive. On my Win 98 PIII500 with 128 MB RAM, I ran the following
script:

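#! perl

use strict;
use warnings;

print "Filling the hash now:\n";
my $t0 = time;
{
    # $h lives only inside this block, so the hash is freed at the closing brace.
    my $h;
    $h->{$_} = $_ for (1 .. 750_000);
    print <<EOT;
It took @{[ time - $t0 ]} seconds to fill the hash.
Now let's undef it:
EOT
    $t0 = time;
}

# The time elapsed here is the implicit deallocation at scope exit.
print "It took @{[ time - $t0 ]} seconds to undef the hash.\n";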

D:\Home> perl t.pl
Filling the hash now:
It took 11 seconds to fill the hash.
Now let's undef it:
It took 65 seconds to undef the hash.

On the other hand, with 500_000 elements instead of 750_000, I get:

D:\Home> perl t.pl
Filling the hash now:
It took 6 seconds to fill the hash.
Now let's undef it:
It took 3 seconds to undef the hash.

So, the solution seems to be to move away from holding all your data in
memory.

Sinan.
 
Bart Lateur

KKramsch said:
so the current version of the script tests for the soundness of
the stored file by saving the hash to a dummy file first, then
retrieving the hash from memory into a temporary variable $temp,
and making sure that $temp is defined and that %$temp has the right
number of keys. If all this is as it should be, then the dummy
file is used to overwrite the old version of the hash stored on
disk.

It turns out, however, that this version of the script is about
10x slower than the original version,

I think the extra slowness is due to your memory being full, causing
extensive swapping.

I'd try to move the check to an external script, which needn't be so
careful about carefully releasing all memory. If necessary, you can
shortcircuit the garbage collection for that external script using a
carefully chosen exec().
 
Anno Siegel

Bart Lateur said:
I think the extra slowness is due to your memory being full, causing
extensive swapping.

I'd try to move the check to an external script, which needn't be so
careful about carefully releasing all memory. If necessary, you can
shortcircuit the garbage collection for that external script using a
carefully chosen exec().

How so? Did you mean exit()?

...or even POSIX::_exit? That bypasses detailed de-allocation and
frees all memory at once -- much faster. It also bypasses any
DESTROY calls and END blocks, so it isn't always applicable.
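
Putting the two suggestions together, the check could run in a forked child that leaves through POSIX::_exit, so the large temporary hash is never freed piece by piece. This is only a sketch under assumptions: a Unix-like system with a real fork, and a hypothetical sub name stored_file_ok. On Windows, where fork is emulated inside a single process, POSIX::_exit would probably end the parent as well.

```perl
use strict;
use warnings;
use POSIX ();
use Storable qw(retrieve);

# Hypothetical helper: true if $file can be retrieved and holds
# $expected_keys keys. The big temporary hash exists only in the child.
sub stored_file_ok {
    my ($file, $expected_keys) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: load the file, check it, and report via the exit status.
        my $temp = eval { retrieve($file) };
        my $ok = ref($temp) eq 'HASH' && keys(%$temp) == $expected_keys;
        # _exit skips END blocks, DESTROY methods and per-element freeing.
        POSIX::_exit($ok ? 0 : 1);
    }

    # Parent: wait for the child and translate its status.
    waitpid($pid, 0);
    return $? == 0;
}
```

The parent pays only for the fork and the wait. Whether that helps when the machine is already swapping is a separate question, since the child still has to read the whole file into memory.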

Anno
 
Ben Morrow

Quoth (e-mail address removed)-berlin.de (Anno Siegel):
How so? Did you mean exit()?

A common trick for programs which leak, at least on Unix, is to re-exec
yourself every so often (having arranged things so you can get back into
the state you were in, of course), which will 'deal' with the leaks.

I'm not sure if this applies here though: will perl go through a full GC
run if you exec, even though it doesn't need to free the memory? I guess
it may, in order to call DESTROY handlers...

Ben
 
Anno Siegel

Ben Morrow said:
Quoth (e-mail address removed)-berlin.de (Anno Siegel):[...]
How so? Did you mean exit()?

A common trick for programs which leak, at least on Unix, is to re-exec
yourself every so often (having arranged things so you can get back into
the state you were in, of course), which will 'deal' with the leaks.

Oh, right.

I'm not sure if this applies here though: will perl go through a full GC
run if you exec, even though it doesn't need to free the memory? I guess
it may, in order to call DESTROY handlers...

Apparently not. From the Camel:

It [END and DESTROY] isn't run if, instead of exiting, the current
process just morphs itself from one program to another via exec.

I can't find the equivalent passage in perldoc.

Anno
 
Bart Lateur

Anno said:
From the Camel:

It [END and DESTROY] isn't run if, instead of exiting, the current
process just morphs itself from one program to another via exec.

I can't find the equivalent passage in perldoc.

See the bottom of "perldoc -f exec":

Note that "exec" will not call your "END" blocks, nor will it
call any "DESTROY" methods in your objects.
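
The documented behaviour is easy to check with a small experiment. The sketch below (the helper name child_output is made up) runs two one-liners in a fresh perl, one ending with exit and one with exec, and captures what each prints from its END block. It assumes a perl that supports the list form of pipe open.

```perl
use strict;
use warnings;

# Hypothetical helper: run a one-liner in a fresh perl ($^X is the
# path of the running interpreter) and return whatever it printed.
sub child_output {
    my ($code) = @_;
    open my $fh, '-|', $^X, '-e', $code
        or die "cannot start perl: $!";
    local $/;                      # slurp all output at once
    my $out = <$fh>;
    close $fh;
    return defined $out ? $out : '';
}

# Same END block in both; only the way the program ends differs.
my $with_exit = child_output('END { print "END ran\n" } exit 0;');
my $with_exec = child_output('END { print "END ran\n" } exec $^X, "-e", "1";');

print "exit: ", ($with_exit eq '' ? "(nothing)\n" : $with_exit);
print "exec: ", ($with_exec eq '' ? "(nothing)\n" : $with_exec);
```

Only the exit variant should report that its END block ran.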


I know of a person who used

exec("true");

as a way to quickly get out of a perl script. "true" is a Unix/BSD/Linux
command line "tool" that doesn't do much at all. (I think its main
purpose is to be used in "make" setups.)

man true: <http://www.rt.com/man/true.1.html>
 
