cleanup speed (named vs anonymous (and my?))

Eric Wilhelm

I've been working on a 2D drafting / geometry module (CAD::Drawing) and
noticed some HUGE slowdown when working with extremely large hashes.

This is an object-oriented module, so the typical blessed hash reference
is returned by the constructor function:

sub new {
    my $caller = shift;
    my $class = ref($caller) || $caller;
    my $self = {@_};
    bless($self, $class);
    return($self);
}

The two keys under which most of the data is stored are $self->{g} (all
geometry, nested by layer name and then entity) and $self->{colortrack}
(keeping lists of addresses nested by layer, entity, then color).

While loading the data into the structure took very little time with the
above function, I knocked about 11 minutes off of the cleanup using the
one below (from 11m27.723s down to 0m31.711s when loading 689850 circles
onto 4590 layers).

sub new {
    my $caller = shift;
    my $class = ref($caller) || $caller;
    my $self = {@_};
    # this is clunky, but saves -_#*HUGE*#_- on big drawings!
    $CAD::Drawing::geometry_data{$self} = {};
    $self->{g} = $CAD::Drawing::geometry_data{$self};
    $CAD::Drawing::color_tracking{$self} = {};
    $self->{colortrack} = $CAD::Drawing::color_tracking{$self};
    bless($self, $class);
    return($self);
}

Is it documented somewhere that named references clean up faster than
anonymous ones? I've been crawling through the manual, Google, the
archives of this group, etc. I've managed to find something about
lexical variables and attempts at memory reuse, but that doesn't seem
like the issue.
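
For what it's worth, the sort of comparison I've been timing looks
roughly like this (just a sketch, not my actual test code; the 500_000
entries and Time::HiRes are stand-ins):

use strict;
use Time::HiRes qw(time);

# build a biggish anonymous structure
sub build {
    my $h = {};
    $h->{$_} = [$_] for 1 .. 500_000;
    return $h;
}

my $t0 = time;
{
    my $anon = build();    # held only by a lexical...
}                          # ...so it is freed right here, and that costs time
printf "lexical/anonymous: %.2fs\n", time - $t0;

$t0 = time;
{
    $main::named{keep} = build();    # reachable from the symbol table...
}                                    # ...so nothing is freed at the block's end
printf "package-visible:   %.2fs\n", time - $t0;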

There is something of a hint in the Camel book about references
springing into existence when you dereference them as lvalues
(autovivification). Is this the reason for the slowdown? I've noticed a
similar (but maybe unrelated?) issue when using 'my' for a reference to
a nested array:

$ time perl -e '$l = shift;
    for ($i = 0; $i < $l; $i++) {
        for ($j = 0; $j < $l; $j++) {
            $d->[$i][$j] = [$i, $j];
        }
    }' 2000

real 0m8.190s

But when I declare $d with 'my':

$ time perl -e 'my $d; $l = shift;
    for ($i = 0; $i < $l; $i++) {
        for ($j = 0; $j < $l; $j++) {
            $d->[$i][$j] = [$i, $j];
        }
    }' 2000

real 0m11.671s

I'm using v5.6.1 on Linux 2.4.21. Memory is not a problem; all tests
have stayed well within available RAM (though the my'd variables do seem
to use more memory).

Putting a print statement at the end of the loops shows that a good deal
of time is spent after the loops are done, but only if you said 'my $d;'.
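
Roughly what I mean (a sketch; $^T is just the script's start time, so
the print shows how long the loops themselves took):

$ time perl -e 'my $d; $l = shift;
    for ($i = 0; $i < $l; $i++) {
        for ($j = 0; $j < $l; $j++) {
            $d->[$i][$j] = [$i, $j];
        }
    }
    print "loops done at ", time - $^T, "s\n";' 2000

With the 'my $d;' in there, the total that time(1) reports runs well
past the number this prints; without it, the two stay close.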

Anyone able to shed some light on this?

Thanks
Eric
 
nobull

Eric Wilhelm said:
I've been working on a 2D drafting / geometry module (CAD::Drawing) and
noticed some HUGE slowdown when working with extremely large hashes.

This is an object-oriented module, so the typical blessed hash reference
is returned by the constructor function:

sub new {
    my $caller = shift;
    my $class = ref($caller) || $caller;
    my $self = {@_};
    bless($self, $class);
    return($self);
}

The two keys under which most of the data is stored are $self->{g} (all
geometry, nested by layer name and then entity) and $self->{colortrack}
(keeping lists of addresses nested by layer, entity, then color).

While loading the data into the structure took very little time with the
above function, I knocked about 11 minutes off of the cleanup using the
one below (from 11m27.723s down to 0m31.711s when loading 689850 circles
onto 4590 layers).

sub new {
    my $caller = shift;
    my $class = ref($caller) || $caller;
    my $self = {@_};
    # this is clunky, but saves -_#*HUGE*#_- on big drawings!
    $CAD::Drawing::geometry_data{$self} = {};
    $self->{g} = $CAD::Drawing::geometry_data{$self};
    $CAD::Drawing::color_tracking{$self} = {};
    $self->{colortrack} = $CAD::Drawing::color_tracking{$self};
    bless($self, $class);
    return($self);
}

Is it documented somewhere that named references clean up faster than
anonymous ones?

I think the point is not that they are cleaned up faster but that the
package variables are simply not cleaned up at process end.

If you are happy for your class to leak memory, the above trick will
work fine.

If you want Perl to terminate without doing any cleanup at all, use
POSIX's _exit().
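
Something like this (a sketch; note that _exit() also skips END blocks
and stdio flushing, so flush anything you care about first):

use POSIX qw(_exit);

# ... build and use the huge objects as normal ...

$| = 1;             # or close/flush your filehandles explicitly
print "done\n";
_exit(0);           # stop right now: no DESTROY, no global destruction,
                    # the OS reclaims all the memory in one go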

As I understand it, package variables just continue to exist and the
memory will be freed when the process terminates.

Actually it's not just package variables but any variable that is
referenced, directly or indirectly, from the symbol table.

You would have seen the same effect if %geometry_data had been a
file-scoped lexical.

my $foo; # Will get cleaned up

my $bar; # Won't get cleaned up because...
sub uses_bar { $bar }; # ...$bar now referenced by &main::uses_bar

our $baz; # Won't get cleaned up because it's a package variable

my $bat; # Won't get cleaned up because...
our $uses_bat = \$bat; # ...$bat now referenced by $main::uses_bat

There is something called the global destruction phase, when the
DESTROY() method is called for every object that's still left over at
the end of the process. But this is not a proper, neat, systematic
cleanup, and even seemingly innocent things can actually segfault
during global destruction.
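
So if the class ever grows a DESTROY() (say, to clean up the
package-hash entries from the workaround above), it's probably worth
skipping the work during that phase. A sketch, assuming the CPAN module
Devel::GlobalDestruction (not core, and I haven't checked it against
5.6.1):

package CAD::Drawing;
use Devel::GlobalDestruction;   # exports in_global_destruction()

sub DESTROY {
    my $self = shift;
    # skip per-object cleanup once the interpreter is shutting down;
    # anything we would touch here may already be half torn down
    return if in_global_destruction();
    # ... normal per-object cleanup goes here ...
}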

At least that's how I understand all this. I've not actually tested
my beliefs extensively.
 
ctcgag

Eric Wilhelm said:
I've been working on a 2D drafting / geometry module (CAD::Drawing) and
noticed some HUGE slowdown when working with extremely large hashes.

This is an object-oriented module, so the typical blessed hash reference
is returned by the constructor function:

sub new {
    my $caller = shift;
    my $class = ref($caller) || $caller;
    my $self = {@_};
    bless($self, $class);
    return($self);
}

The two keys under which most of the data is stored are $self->{g} (all
geometry, nested by layer name and then entity) and $self->{colortrack}
(keeping lists of addresses nested by layer, entity, then color).

While loading the data into the structure took very little time with the
above function, I knocked about 11 minutes off of the cleanup using the
one below (from 11m27.723s down to 0m31.711s when loading 689850 circles
onto 4590 layers).


I've noticed that the clean-up of very large hashes is very
ill-behaved. It can sometimes be inordinately slow, and what
causes it to be slow or not is unpredictable. On some systems and
versions, simply allocating buckets at the start (even the default
number of buckets) can cure the slowness. Sometimes switching from
lexical to dynamic has done it. Sometimes it seems that switching from
named to reference-only (or vice versa) does it. Since I rarely want to
destroy large hashes except at program end, I've taken to just routinely
using _exit in programs with large hashes.
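
For the bucket trick, assigning to keys() pre-extends the hash; a quick
sketch (the 700_000 is just a guess at the expected number of entries):

my %geometry;
keys(%geometry) = 700_000;   # allocate the buckets up front

# ... fill %geometry as usual ...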

Xho
 
