How to generate random number without replacement?

P

Peng Yu

It seems that the int(rand(10)) generate random with replacement. I'm
wondering how to generate random number without replacement in perl.
Is there a function for it?
 
R

Randal L. Schwartz

Peng> It seems that the int(rand(10)) generate random with replacement. I'm
Peng> wondering how to generate random number without replacement in perl.
Peng> Is there a function for it?

$ perldoc -q shuffle
Found in /usr/local/lib/perl5/5.10.1/pod/perlfaq4.pod
How do I shuffle an array randomly?
If you either have Perl 5.8.0 or later installed, or if you have
Scalar-List-Utils 1.03 or later installed, you can say:

use List::Util 'shuffle';

@shuffled = shuffle(@list);

If not, you can use a Fisher-Yates shuffle.

sub fisher_yates_shuffle {
my $deck = shift; # $deck is a reference to an array
return unless @$deck; # must not be empty!

my $i = @$deck;
while (--$i) {
my $j = int rand ($i+1);
@$deck[$i,$j] = @$deck[$j,$i];
}
}

# shuffle my mpeg collection
#
my @mpeg = <audio/*/*.mp3>;
fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
print @mpeg;

Note that the above implementation shuffles an array in place, unlike
the "List::Util::shuffle()" which takes a list and returns a new
shuffled list.

You've probably seen shuffling algorithms that work using splice,
randomly picking another element to swap the current element with

srand;
@new = ();
@old = 1 .. 10; # just a demo
while (@old) {
push(@new, splice(@old, rand @old, 1));
}

This is bad because splice is already O(N), and since you do it N times,
you just invented a quadratic algorithm; that is, O(N**2). This does not
scale, although Perl is so efficient that you probably won't notice this
until you have rather largish arrays.

print "Just another Perl hacker,"; # the original
 
P

Peng Yu

Peng> It seems that the int(rand(10)) generate random with replacement. I'm
Peng> wondering how to generate random number without replacement in perl..
Peng> Is there a function for it?

$ perldoc -q shuffle

I don't really need shuffle. For example, if I want to sample 1000
number (without replacement) from the numbers between 1 and
1000,000,000, shuffle will take more runtime than necessary.
 
J

Jürgen Exner

Peng Yu said:
It seems that the int(rand(10)) generate random with replacement. I'm
wondering how to generate random number without replacement in perl.

Could you please explain what you mean by "with/without replacement"?
A number is a number, it doesn't replace anything....

jue
 
U

Uri Guttman

Peng> It seems that the int(rand(10)) generate random with replacement. I'm
Peng> wondering how to generate random number without replacement in perl.
Peng> Is there a function for it?
PY> I don't really need shuffle. For example, if I want to sample 1000
PY> number (without replacement) from the numbers between 1 and
PY> 1000,000,000, shuffle will take more runtime than necessary.

then just call rand but cache your hits with a hash. if found in the
hash, then try again. this will work if your sample size of 1k is much
lower than the large number. it will still work but be slow if the
sampling size is close to the total size.

uri
 
U

Uri Guttman

JE> Ok. For those like me not familiar with this term: he means random
JE> numbers with and without repetition.

and i told him how to do it. i won't tell him again. it is a simple
problem and hashes solve it.

uri
 
P

Peng Yu

  >>> >It seems that the int(rand(10)) generate random with replacement.. I'm
  >>> >wondering how to generate random number without replacement in perl.
  >>>
  >>> Could you please explain what you mean by "with/without replacement"?
  >>> A number is a number, it doesn't replace anything....
  >>
  >> These are standard concepts in statistics. Please see the following
  >> webpage for the explanations on sampling 'with/without replacement'.
  >>
  >>http://www.ma.utexas.edu/users/parker/sampling/repl.htm

  JE> Ok. For those like me not familiar with this term: he means random
  JE> numbers with and without repetition.

and i told him how to do it. i won't tell him again. it is a simple
problem and hashes solve it.

What do you mean? I didn't ask you to tell me again.

But I feel sorry that perl doesn't provide such a function out of the
box.
 
U

Uri Guttman

PY> What do you mean? I didn't ask you to tell me again.

i told you how to do it. either you didn't read it or you didn't get the
solution.

PY> But I feel sorry that perl doesn't provide such a function out of the
PY> box.

i feel sorry for you that you can't code this up in 5 minutes. it is
trivial to do as i outlined. name another lang that has this built in.
it isn't needed as it is easily made from a hash and a loop and calls to
rand. this is about 2 lines of code and possibly 1 line. here, i will
code it up on the fly and possibly even get it right. i leave making
into a sub as your exercise.

my %seen ;
while( 1 ) { $x = int rand( 100_000_000 ) ; $seen{$x} and next ;
$seen{$x} = 1; print $x }

oops, it wrapped into 3 lines.

was that too complex? it will print numbers until it runs out of
them. no duplicates. make it a for loop to control the number of
prints. can you handle that? do you feel sorry for perl now? show me
another lang that could do that as easily.

uri
 
P

Peter J. Holzer

PY> What do you mean? I didn't ask you to tell me again.

i told you how to do it. either you didn't read it or you didn't get the
solution.

Uri, nobody asked you to tell him again. Why do insist on telling us
three times that you won't do it? Nobody asked me to write a haiku, but
I don't feel the slightest urge to tell you all that I won't write a
haiku,

hp
 
J

Jürgen Exner

Peng Yu said:
But I feel sorry that perl doesn't provide such a function out of the
box.

I feel neither sorry nor that Perl should provide such a function. You
are confusing generating random numbers with sampling a given data set.

Now, you could argue that beside equal distribution Perl should also
provide additional distributions of random numbers like e.g. Gaussian
distributions or random number without repetition or any other
distribution.
However IMNSHO Perl is a general purpose programming language and
functions like that are WAAAAYYYY to subject matter specific. If you
want them, then put them in a module. And of course you very welcome to
submit such a module to CPAN.

jue
 
J

John Bokma

Peter J. Holzer said:
Uri, nobody asked you to tell him again.

It's exactly this Ogrish behavior of a few regulars that makes this
place such a pain in the ass. :-(.

Luckily for the newbies there are sites like stackoverflow.com where at
least they can vote down such behaviour.

Again: Dear regular, you /don't have to post/. If a newbie pisses you
off, just move on to the next message in this group. You will make this
group a way more friendlier place, and hey, maybe it gets a bit more
traffic that way.

Ah, well, one can only hope...
 
T

Ted Zlatanov

UG> my %seen ;
UG> while( 1 ) { $x = int rand( 100_000_000 ) ; $seen{$x} and next ;
UG> $seen{$x} = 1; print $x }

This will grow pretty quickly with a hash. Bit::Vector already has
Bit_On($index) and bit_test($index) so memory usage and probably
performance will be a bit (heh) better.

Ted
 
U

Uri Guttman

UG> my %seen ;
UG> while( 1 ) { $x = int rand( 100_000_000 ) ; $seen{$x} and next ;
UG> $seen{$x} = 1; print $x }

TZ> This will grow pretty quickly with a hash. Bit::Vector already has
TZ> Bit_On($index) and bit_test($index) so memory usage and probably
TZ> performance will be a bit (heh) better.

he said he wanted 1k random numbers out of a large range so a hash would
be fine for that.

uri
 
X

Xho Jingleheimerschmidt

Ted said:
UG> my %seen ;
UG> while( 1 ) { $x = int rand( 100_000_000 ) ; $seen{$x} and next ;
UG> $seen{$x} = 1; print $x }

This will grow pretty quickly with a hash. Bit::Vector already has
Bit_On($index) and bit_test($index) so memory usage and probably
performance will be a bit (heh) better.

I'd be very surprised if Bit::Vector had faster performance, at least
until the other method started swapping.

Xho
 
X

Xho Jingleheimerschmidt

Peng said:
But I feel sorry that perl doesn't provide such a function out of the
box.

There are many ways to implement this, and which is best depends on how
large of a set you are sampling and how densely you are sampling it, as
well as perhaps other things (do you know how many you need in advance,
or will you know when you get there? Are you happy with standard rand()
or do you want something better, etc.). I'm entirely unsurprised Perl
doesn't provide this out of the box, as it is so context dependent, plus
not all that common. It is trivial to implement for yourself, and throw
your implementation into a toolkit module you use for all your code.

Xho
 
T

Ted Zlatanov

UG> my %seen ;
UG> while( 1 ) { $x = int rand( 100_000_000 ) ; $seen{$x} and next ;
UG> $seen{$x} = 1; print $x }
XJ> I'd be very surprised if Bit::Vector had faster performance, at least
XJ> until the other method started swapping.

It's C internally so performance is pretty good. But you're probably
right anyhow, I wasn't thinking.


UG> he said he wanted 1k random numbers out of a large range so a hash would
UG> be fine for that.

You're right, I wasn't paying enough attention.

Ted
 
D

David Combs

I see it differently.

He's got every right to get pissed off from time to time.

How many hours a week does he sit in front of the terminal
answering questions for us (when he could be making money,
presumably, a nice sacrifice for us all!) and after a while
of being eaten by "I have to get back to WORK sometime --
so I can pay some BILLS" or whatever else might be eating
him now and then, that almost it doesn't matter WHAT the
next question is, the least perceived-foolishness/stupidity
in it will elicit a ROAR and accompanying dragon-mouth FLAME
BLAST.

That done, he bypasses a few questions he might have liked
answering, and he's back to his nice helpful self, again
answering hairy questions.

So thank you, Uri, for being there!

David
 
D

David Combs

UG> my %seen ;
UG> while( 1 ) { $x = int rand( 100_000_000 ) ; $seen{$x} and next ;
UG> $seen{$x} = 1; print $x }

TZ> This will grow pretty quickly with a hash. Bit::Vector already has
TZ> Bit_On($index) and bit_test($index) so memory usage and probably
TZ> performance will be a bit (heh) better.

he said he wanted 1k random numbers out of a large range so a hash would
be fine for that.

uri

WERE this to be a cpan module, maybe a mix would be nice. Bit vector for
the first N of them, N of course requiring N/8 bytes preallocated probably,
and for > N, the hash scheme?

David
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,228
Members
46,817
Latest member
AdalbertoT

Latest Threads

Top