Eric Wilhelm
I've been working on a 2D drafting / geometry module (CAD::Drawing) and
noticed a HUGE slowdown when working with extremely large hashes.
This is an object-oriented module, so the typical blessed hash reference
is returned by the constructor function:
  sub new {
      my $caller = shift;
      my $class = ref($caller) || $caller;
      my $self = {@_};
      bless($self, $class);
      return($self);
  }
The two keys under which most of the data is stored are $self->{g} (all
geometry, nested by layer name and then entity) and $self->{colortrack}
(lists of addresses, nested by layer, entity, and then color).
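To make that layout concrete, here is a minimal sketch of how the two keys nest. The layer name 'walls', the entity type 'circles', the color 'red', and the address format are all hypothetical illustrations, not CAD::Drawing's actual keys:

```perl
use strict;
use warnings;

# Hypothetical data: 'walls', 'circles' and 'red' are illustrative names,
# not the keys CAD::Drawing really uses.
my $self = {};

# $self->{g}: layer name -> entity type -> list of items
# (push autovivifies every intermediate level)
push @{ $self->{g}{walls}{circles} }, { pt => [0, 0], rad => 1 };

# An "address" locating the item just stored (hypothetical format)
my $addr = [ 'walls', 'circles', $#{ $self->{g}{walls}{circles} } ];

# $self->{colortrack}: layer -> entity type -> color -> list of addresses
push @{ $self->{colortrack}{walls}{circles}{red} }, $addr;

printf "%d circle(s) on layer 'walls', %d tracked as red\n",
    scalar @{ $self->{g}{walls}{circles} },
    scalar @{ $self->{colortrack}{walls}{circles}{red} };
```

With hundreds of thousands of entities spread over thousands of layers, every one of these small hashes and arrays is a separate allocation that has to be released when the structure is torn down.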
While loading the data into the structure took very little time with the
constructor above, I knocked about 11 minutes off the cleanup by using the
one below (from 11m27.723s down to 0m31.711s when loading 689850 circles
onto 4590 layers).
  sub new {
      my $caller = shift;
      my $class = ref($caller) || $caller;
      my $self = {@_};
      # this is clunky, but saves -_#*HUGE*#_- on big drawings!
      $CAD::Drawing::geometry_data{$self} = {};
      $self->{g} = $CAD::Drawing::geometry_data{$self};
      $CAD::Drawing::color_tracking{$self} = {};
      $self->{colortrack} = $CAD::Drawing::color_tracking{$self};
      bless($self, $class);
      return($self);
  }
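One way to separate load time from cleanup time is to time an explicit undef of the object. The following is a sketch under assumptions: Sketch::Drawing is a hypothetical stand-in for the first (anonymous-hash) constructor, the sizes are illustrative and far smaller than the real drawing, and it only times teardown triggered by undef inside the program, not whatever happens at program exit:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical stand-in for the original anonymous-hash constructor.
package Sketch::Drawing;
sub new {
    my $caller = shift;
    my $class  = ref($caller) || $caller;
    my $self   = {@_};
    return bless($self, $class);
}

package main;

my $n_layers  = 100;    # illustrative sizes only
my $per_layer = 1000;

my $t0  = time();
my $drw = Sketch::Drawing->new();
for my $layer (1 .. $n_layers) {
    for my $i (1 .. $per_layer) {
        push @{ $drw->{g}{"layer$layer"}{circles} },
            { pt => [$i, $i], rad => 1 };
    }
}

# Count the stored circles before starting the cleanup clock.
my $count = 0;
$count += @{ $drw->{g}{$_}{circles} } for keys %{ $drw->{g} };
my $t1 = time();

undef $drw;             # drops the last reference, freeing the whole tree here
my $t2 = time();

printf "load: %.3fs  cleanup: %.3fs  (%d circles)\n",
    $t1 - $t0, $t2 - $t1, $count;
```

Swapping in the package-hash constructor and rerunning should show whether the undef itself gets cheaper, or whether the saving only appears in the time spent after the script's last statement.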
Is it documented somewhere that named references clean up faster than
anonymous ones? I've been crawling through the manual, Google, the
archives of this group, etc. I've managed to find something about
lexical variables and attempts at memory reuse, but that doesn't seem
to be the issue.
There is somewhat of a hint in the Camel book about references springing
to life when they are dereferenced as lvalues (autovivification). Is this
the reason for the slowdown? I've noticed a similar (but maybe unrelated?)
issue when using 'my' for a reference to a nested array:
  $ time perl -e '$l = shift;
      for($i=0;$i<$l;$i++) {
          for($j=0;$j<$l;$j++) {
              $d->[$i][$j] = [$i,$j];
          }
      }' 2000
real 0m8.190s
But when I declare $d with 'my':
  $ time perl -e 'my $d; $l = shift;
      for($i=0;$i<$l;$i++) {
          for($j=0;$j<$l;$j++) {
              $d->[$i][$j] = [$i,$j];
          }
      }' 2000
real 0m11.671s
I'm using v5.6.1 on Linux 2.4.21. Memory is not a problem: all tests have
been well within available RAM (though the my'd variables do seem to use
more memory).
Putting a print statement at the end of the loops shows that a good deal
of time is spent after the loops are done, but only if you said 'my $d;'.
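A variation on the print-statement test is to time the build and the teardown separately. This is a sketch, not the original one-liner: it uses foreach ranges instead of the C-style loops, defaults to a hypothetical size of 500 when no argument is given, uses Time::HiRes for sub-second timing, and wraps the lexical $d in a bare block so that its teardown happens at a point the script can measure:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

my $l = @ARGV ? shift : 500;    # hypothetical default size

my ($t_loop, $t_free, $rows);
{
    my $d;                      # lexical, released when this block exits
    my $t0 = time();
    for my $i (0 .. $l - 1) {
        for my $j (0 .. $l - 1) {
            $d->[$i][$j] = [$i, $j];
        }
    }
    $rows   = @$d;              # number of rows built
    $t_loop = time() - $t0;
    $t_free = time();           # teardown starts when the block ends
}                               # $d goes out of scope here
$t_free = time() - $t_free;

printf "built %d rows in %.3fs, freed in %.3fs\n", $rows, $t_loop, $t_free;
```

Dropping the block and the 'my' (making $d a package variable again) lets you compare where the time goes in each case.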
Anyone able to shed some light on this?
Thanks
Eric