xhoster
I need a work queue, but I don't really care about the order (LIFO, FIFO,
random) in which things come out of it. Any of those is pretty simple and
efficient with a Perl array, and any would suffice. But I want the queue
not to hold duplicate entries. I could use an array as a stack or queue,
with a parallel hash checked to prevent duplicates from being entered
(sketched below for comparison). But why keep parallel data structures?
Just use the hash as the queue:
while (keys %hash) {           # loop until the queue (hash) is empty
    my $to_do = each %hash;
    delete $hash{$to_do};      # deleting the key just returned by each is safe
    ## do stuff with $to_do, which might make new entries in %hash
}
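For comparison, the parallel-structures version I'm dismissing would look
roughly like this (just a sketch; @queue, %seen, and enqueue() are my names
for it):

# Array as the actual queue, parallel %seen hash to reject duplicates.
my (@queue, %seen);

sub enqueue {
    my ($item) = @_;
    push @queue, $item unless $seen{$item}++;   # skip anything already seen
}

while (@queue) {
    my $to_do = shift @queue;    # FIFO; pop would make it LIFO
    ## do stuff with $to_do, which might call enqueue() for new entries
}

(One subtle difference: %seen is never cleared, so a finished item can't be
re-queued later, while the hash-as-queue version above would happily take
it again.)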
Well, except that the hash-as-queue loop gets really slow on large hashes.
I suspect the each operator does a linear scan of all the buckets until it
finds an occupied one. With a hash that used to be big but now isn't (and
so has lots of empty buckets), this behaves poorly, essentially being N^2
to empty a large hash.
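For anyone who wants to see it, a rough timing like this (my sketch here,
not careful benchmarking; adjust $n to taste) should show the difference if
that theory is right:

use strict;
use warnings;
use Time::HiRes qw(time);

my $n = shift || 100_000;

# Version 1: keys() in the loop condition resets the hash iterator each pass.
my %hash = map { $_ => 1 } 1 .. $n;
my $t = time;
while (keys %hash) {
    my $to_do = each %hash;
    delete $hash{$to_do};
}
printf "keys + each: %.2f seconds\n", time - $t;

# Version 2: boolean test on the hash itself; the iterator keeps its place.
%hash = map { $_ => 1 } 1 .. $n;
$t = time;
while (%hash) {
    my $to_do = each %hash;
    next unless defined $to_do;
    delete $hash{$to_do};
}
printf "plain each:  %.2f seconds\n", time - $t;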
If I just use scalar %hash instead of scalar keys %hash, then the slowdown
doesn't occur, because "each" scans the buckets from where it left off last
time instead of from the beginning each time. (At first I thought it was
scalar keys %hash that was being slow and was going to post about *that*,
but then I realized that keys' resetting of the iterator was causing "each"
to be slow.) But you shouldn't change a hash at the same time you iterate
over it, because things might get missed. Presumably, anything missed will
be found the next time through the iterator, so this might work without
the slowdown:
while (%hash) {                  # boolean test: does not reset the iterator
    my $to_do = each %hash;
    next unless defined $to_do;  # not a real hash key; signals the end of the
                                 # iterator, so go around and start a new pass
    delete $hash{$to_do};        # take it off the queue before doing the work
    ## do stuff with $to_do, which might make new entries in %hash
}
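In case it helps to see it packaged up, here is roughly the same thing as a
reusable sub (a sketch; drain_queue and the other names are mine, not from
any module):

use strict;
use warnings;

# Process every pending key of %$queue; the callback may queue more work by
# adding keys.  Adding a key that is already pending is a no-op, which is
# the whole point of using a hash as the queue.
sub drain_queue {
    my ($queue, $work) = @_;
    while (%$queue) {                 # boolean test: iterator keeps its place
        my $to_do = each %$queue;
        next unless defined $to_do;   # end of iterator; start another pass
        delete $queue->{$to_do};
        $work->($to_do);              # may add new keys to %$queue
    }
}

# Toy example: each item can enqueue one follow-up item.
my %pending = (seed => 1);
drain_queue(\%pending, sub {
    my ($item) = @_;
    $pending{"$item/next"} = 1 if length($item) < 20;
});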
If my speculation on the internals is right, then this would still
perform poorly if the hash first grows very large, then shrinks to
be only a few elements, but those last few elements keep triggering
the addition of just one more element each time. In my case, that
shouldn't be a problem.
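Concretely, the pattern I'd worry about is something like this (contrived,
just to illustrate; not my real workload):

# Grow the hash so perl allocates a lot of buckets, shrink it back down,
# then keep it nearly empty while every processed item enqueues one more.
my %hash = map { $_ => 1 } 1 .. 500_000;   # grow: large bucket array
delete @hash{ 1 .. 499_999 };              # shrink: buckets stay allocated
my $extra = 0;
while (%hash) {
    my $to_do = each %hash;
    next unless defined $to_do;
    delete $hash{$to_do};
    $hash{ "extra" . $extra++ } = 1 if $extra < 100_000;  # one new item each time
}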
Any comments on this approach? Is there some less quirky way to do this
efficiently? Is there a simple (as simple as perl's internals can get)
way to fix "each" so that it doesn't have this degenerate case?
Thanks,
Xho
--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.