K. Kramsch
I have a script that, over a period of several *days* gradually
builds a very large Perl hash. Periodically, it saves this large
hash to a file, using the Storable module. In the past, this
storage process has resulted in a corrupted (and unusable) file,
so the current version of the script tests for the soundness of
the stored file by saving the hash to a dummy file first, then
retrieving the hash from the dummy file into a temporary variable $temp,
and making sure that $temp is defined and that %$temp has the right
number of keys. If all this is as it should be, then the dummy
file is used to overwrite the old version of the hash stored on
disk.
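For reference, the save-then-verify scheme I'm describing looks roughly like this (a minimal sketch; the file names and the small %big hash are just placeholders for illustration):

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);

# Stand-in for the real hash, which is built over several days.
my %big = map { $_ => $_ * 2 } 1 .. 1000;

my $dummy = 'hash.dummy';
my $final = 'hash.stored';

# Save to a scratch file first, never directly over the good copy.
nstore(\%big, $dummy);

# Read the scratch file back and check it is sound.
my $temp = retrieve($dummy);
if (defined $temp && keys %$temp == keys %big) {
    # Only now overwrite the old stored version on disk.
    rename $dummy, $final or die "rename failed: $!";
}
```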
It turns out, however, that this version of the script is about
10x slower than the original version, which did not do this extra
check on the stored hash. Using carefully placed print statements,
I determined that the bottleneck is not due to the extra retrieval
and checking steps, but to the deallocation of %$temp that happens
when $temp goes out of scope. Since %$temp is very large and
useless once the check is done, I don't want it hanging around
longer than necessary, but the deallocation step takes 3-4 minutes!
This is about 100 times slower than the time it takes to allocate
%$temp in the first place! It's crazy. I confirmed this by
inserting an explicit statement "undef $temp" right before the end
of the enclosing scope, and noting (via print statements) that this
step is the script's worst bottleneck by far.
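The timing experiment boils down to something like this (a sketch with a hash small enough to run quickly; the real hash was far larger and took minutes to free):

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Build a large hashref, as retrieve() would.
my $temp = { map { $_ => $_ } 1 .. 100_000 };

# ... soundness checks on %$temp would go here ...

# Time the explicit deallocation.
my $t0 = time;
undef $temp;    # frees %$temp; in the real script this was
                # the worst bottleneck by far
my $elapsed = time - $t0;
printf "deallocation took %.4f s\n", $elapsed;
```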
The situation is the same if I make $temp file-global and skip the
explicit deallocation step: the bottleneck just moves to every
assignment of a new value to $temp, since each assignment (except
the first) deallocates the previous contents of %$temp.
Is there any way to speed up the deallocation of %$temp (and
$temp)?
Thanks!
Karl