I've got a 400 or so elements from which I want to select a small
variable number of random elements with each pass of a loop that runs
100 million times.
This is what I've been doing:
I cached the unmodified array, and then with each pass enumerate it as
a new array and then take that reference. Then while I started off by
using Fisher-Yates, but then found because I was only interested in a
small number of elements I was better off simply splicing a random
element from the array. (I know that the FAQ cautions this against this
in thee general case but in this instance splicing has proved faster.
Actually splicing is also fatser than swapping the random element with
the last element and then "pop"-ing it off the end of the array -- that
actually came as a bit of surprise to me). Nevertheless, it seems that
the operation is still pretty slow.
Here's a contrived snippet (I'm actually using the Mersenne Twister as
implemented in Math::Random::MT instead of perl's built-in rand
function as my PNRG but that's of course beside the point):
# START code snippet
my $ARRAY_ra = [(qw(1 2 3 4 5 6 7 8 9 10 11 12 13)) x 16]
for (1 .. 100_000) {
my $copy_ra = [ @{ $ARRAY_ra } ];
for (1 .. 10) {
# select 10 random elements from the array
# in reality all that is known anoout the actual number of
elemnts is that
# the number is small relative to the size of the array
&get_ele( $copy_ra );
}
};
sub get_ele {
my $ar = shift;
# gnerate random index of an elements in the array
my $r = int(rand($#{$ar} + 1));
# return selected element, modifying array in place;
return splice @{$ar}, $r, 1;}
## END SNIPPET
So any thoughts on how possibly to speed this up?
Thanks,
J.
Bringing this inner consideration to the outer level (against the rules!)
for readability. Actually so it can be seen I guess.
There is a third method that was overlooked. Indeed its the most promising
as it has the best performance over the complete range.
It's a flag method. Excuse me if has been mentioned before. It is at the top of
the performance scale over a wide range. The determining factor is the sum of what
it does, as robic0 goes to assembly infinity...
I used a medium distribution where I didn't have to sit here forever waiting
for benchmarks. See below for details. I thought you would like to know this
before you bury your head in PhD books. I mean, if speed is all you wan't.
Extrapolate to the working model. If you don't see a clear path to that, or some
other obstruction, let me know and I will think about it.
#!perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
my (@ARRAY, $ssize, $asize);
my @Set_sizes = (150,300,1000,2000);
my @Ary_sizes = (2500,3000,3500,5000);
for (@Ary_sizes)
{
$asize = $_;
@ARRAY = (0 .. $asize);
for (@Set_sizes)
{
$ssize = $_;
print "\n\n***>> $ssize/$asize\n";
cmpthese(10000, {
'swap_pop' => \&use_swap_pop,
'splice' => \&use_splice,
'flag' => \&use_flag,
} );
}
}
sub use_splice {
my @ar = @ARRAY;
for (0 .. $ssize) {
splice @ar, rand( @ar ), 1 ;
}
}
sub use_swap_pop{
my @ar = @ARRAY;
for (0 .. $ssize) {
my $r = rand(@ar);
my $n = $#ar;
@ar[$n,$r] = @ar[$r,$n];
pop @ar;
}
}
sub use_flag {
my $asize = @ARRAY;
my $rnumb;
my @flag_array;
for (0 .. $ssize) {
while (defined $flag_array[($rnumb = int ( rand ( $asize ) ))]) {}
$flag_array[$rnumb] = 1;
}
}
sub debug_use_flag {
my $asize = @ARRAY;
my $rnumb;
my @flag_array;
for (0 .. $ssize) {
while (defined $flag_array[($rnumb = int ( rand ( $asize ) ))]) {
print $rnumb,"\n";
}
$flag_array[$rnumb] = 1;
print " $rnumb\n";
}
print "=================================\n";
<>;
}
__END__
***>> 150/2500
Rate swap_pop splice flag
swap_pop 2105/s -- -1% -74%
splice 2119/s 1% -- -74%
flag 8203/s 290% 287% --
***>> 300/2500
Rate swap_pop splice flag
swap_pop 1410/s -- -3% -67%
splice 1455/s 3% -- -66%
flag 4266/s 203% 193% --
***>> 1000/2500
Rate swap_pop splice flag
swap_pop 560/s -- -13% -55%
splice 642/s 15% -- -48%
flag 1233/s 120% 92% --
***>> 2000/2500
Rate swap_pop splice flag
swap_pop 301/s -- -25% -37%
splice 403/s 34% -- -16%
flag 478/s 59% 19% --
***>> 150/3000
Rate splice swap_pop flag
splice 1783/s -- -4% -77%
swap_pop 1855/s 4% -- -77%
flag 7899/s 343% 326% --
***>> 300/3000
Rate splice swap_pop flag
splice 1253/s -- -4% -70%
swap_pop 1301/s 4% -- -69%
flag 4211/s 236% 224% --
***>> 1000/3000
Rate swap_pop splice flag
swap_pop 542/s -- -1% -56%
splice 548/s 1% -- -56%
flag 1243/s 129% 127% --
***>> 2000/3000
Rate swap_pop splice flag
swap_pop 295/s -- -14% -45%
splice 344/s 17% -- -36%
flag 536/s 82% 55% --
***>> 150/3500
Rate splice swap_pop flag
splice 1550/s -- -8% -80%
swap_pop 1689/s 9% -- -78%
flag 7710/s 398% 357% --
***>> 300/3500
Rate splice swap_pop flag
splice 1094/s -- -10% -74%
swap_pop 1214/s 11% -- -71%
flag 4155/s 280% 242% --
***>> 1000/3500
Rate splice swap_pop flag
splice 482/s -- -8% -62%
swap_pop 525/s 9% -- -58%
flag 1253/s 160% 139% --
***>> 2000/3500
Rate swap_pop splice flag
swap_pop 290/s -- -3% -49%
splice 299/s 3% -- -47%
flag 564/s 94% 88% --
***>> 150/5000
Rate splice swap_pop flag
splice 1081/s -- -15% -85%
swap_pop 1277/s 18% -- -82%
flag 7189/s 565% 463% --
***>> 300/5000
Rate splice swap_pop flag
splice 776/s -- -21% -81%
swap_pop 988/s 27% -- -75%
flag 4000/s 416% 305% --
***>> 1000/5000
Rate splice swap_pop flag
splice 354/s -- -26% -72%
swap_pop 477/s 35% -- -62%
flag 1262/s 257% 165% --
***>> 2000/5000
Rate splice swap_pop flag
splice 212/s -- -23% -65%
swap_pop 274/s 29% -- -54%
flag 602/s 184% 119% --