Generating 2 independent random numbers

B

bilgekhan

What is the correct method for generating 2 independent random numbers?
They will be compared whether they are equal.

What about this method:

srand(time(0));
int r1 = rand();
srand(rand());
int r2 = rand();
bool f = r1 == r2;
 
B

bilgekhan

In my prev. posting I forgot to mention that the
random numbers shall be in the range 0 to 99.
So here my updated question:

What is the correct method for generating
2 independent random numbers in the range 0 to 99 ?
They will immediately be compared for equality.

What about this method:

srand(time(0));
int r1 = rand() % 100;
srand(rand());
int r2 = rand() % 100;
bool f = r1 == r2;
 
P

Pascal J. Bourguignon

bilgekhan said:
In my prev. posting I forgot to mention that the
random numbers shall be in the range 0 to 99.
So here my updated question:

What is the correct method for generating
2 independent random numbers in the range 0 to 99 ?
They will immediately be compared for equality.

What Sam said.
What about this method:

srand(time(0));
int r1 = rand() % 100;
srand(rand());
int r2 = rand() % 100;
bool f = r1 == r2;

The rand() function returns a pseudo-random integer between 0 and RAND_MAX.

Therefore, assuming it's excluding RAND_MAX, and including 0, and
assuming each value is equiprobable (which is not specified by the man
page of rand(3) on Linux), if RAND_MAX % 100 is not 0, then your
randoms r1 and r2 won't be equiprobable.

But this is already assuming too much, as further reading of the
rand(3) man page.


If you care so little about your randomness, you could as well write

bool f=((rand()%100)==(rand()%100));
 
K

Kai-Uwe Bux

bilgekhan said:
In my prev. posting I forgot to mention that the
random numbers shall be in the range 0 to 99.
So here my updated question:

What is the correct method for generating
2 independent random numbers in the range 0 to 99 ?
They will immediately be compared for equality.

What about this method:

srand(time(0));
int r1 = rand() % 100;
srand(rand());
int r2 = rand() % 100;
bool f = r1 == r2;

Not good: the reseeding of the RNG just has unknown effects. If the RNG used
by rand() is good, the reseeding very likely will make it worse. At the
very least, you loose all the good things that the theory inspiring the RNG
guarantees.

Your question has two aspects:

a) How to generate independently distributed random numbers?

b) How to generate random numbers in [0,100)?

As for (a), any RNG is supposed to yield pseudo random numbers that look
independent. Therefore, just using a decent RNG will take care of the
independence requirement.

As for (b), the %-trick is probably good enough for your purposes. However,
beware that unless RAND_MAX+1 is divisible by 100.


As for the immediate comparison part: if you discard r1 and r2 immediately
after the comparison, and only keep f, then it suffices to generate one
number r in [0,100) and do:

bool f = ( r == 0 );

because that will yield true with probability 0.01, too.



Best

Kai-Uwe Bux
 
L

Lionel B

In my prev. posting I forgot to mention that the random numbers shall be
in the range 0 to 99. So here my updated question:

What is the correct method for generating 2 independent random numbers
in the range 0 to 99 ? They will immediately be compared for equality.

As others have mentioned, most random number generators (and almost
certainly your `rand()' call) generate *pseudo*-random sequences of
numbers. "Real" random number sequences (and it's by no means trivial to
say what that even means) are hard to come by and will generally involve
some hardware-based solution. I'll assume that pseudo random is good
enough for your purposes.

But be aware that there is a huge variation in the "randomness" of
sequences generated by pseudo random generators. The system-supplied
generators (like `rand()') are often astonishingly non-random, and the
(current) C++ standard allows them to be as cruddy as they please.
Thankfully, the forthcoming C++ standard will include some higher quality
generators such as the Mersenne Twister (recommended, Google it). But
even that would not be good enough e.g. for cryptographic purposes.
What about this method:

srand(time(0));

This is ok as long as you don't call this routine again so quickly that
time(0) returns the same value as for the last call.
int r1 = rand() % 100;

This can be a bad idea, particularly for so-called "linear congruential"
rngs (and there's a fair chance your `rand()' will be linear
congruential). The reason for this is that mod-ing a number basically
involves using just the lower-order bits, which for many rngs - and
linear congruential ones in particular - are frequently the "least
random".

Safer is something along the lines of:

int r1 = int(100.0*double(rand())/double(RAND_MAX));

that is, you generate a uniform(ish) random double in the range [0,1),
multiply it by 100 and take the integer part. This gives you a uniform
(ish) random int in the range [0,99). It's safer because it relies more
heavily on the higher-order bits of the number returned by your rng.
srand(rand());
^^^^^^^^^^^^^^

Probably a bad idea to re-seed your rng with a number generated by the
same rng... and pointless in any case. Seed once and generate away...
Leave out that line.
int r2 = rand() % 100;

Again, as for r1 above.
 
B

bilgekhan

Pascal J. Bourguignon said:
What Sam said.


The rand() function returns a pseudo-random integer between 0 and RAND_MAX.

Therefore, assuming it's excluding RAND_MAX, and including 0, and
assuming each value is equiprobable (which is not specified by the man
page of rand(3) on Linux), if RAND_MAX % 100 is not 0, then your
randoms r1 and r2 won't be equiprobable.

But this is already assuming too much, as further reading of the
rand(3) man page.


If you care so little about your randomness, you could as well write

bool f=((rand()%100)==(rand()%100));

Have you tried this in a loop of say 1 million times and counted the hits?
I think theoretically there should be about 10,000 matches.
What result do you get?
Maybe 0 ? :)
 
L

Lionel B

Have you tried this in a loop of say 1 million times and counted the
hits? I think theoretically there should be about 10,000 matches. What
result do you get?
Maybe 0 ? :)

Why do you think you might get 0 ?
 
B

bilgekhan

Kai-Uwe Bux said:
bilgekhan said:
In my prev. posting I forgot to mention that the
random numbers shall be in the range 0 to 99.
So here my updated question:

What is the correct method for generating
2 independent random numbers in the range 0 to 99 ?
They will immediately be compared for equality.

What about this method:

srand(time(0));
int r1 = rand() % 100;
srand(rand());
int r2 = rand() % 100;
bool f = r1 == r2;

Not good: the reseeding of the RNG just has unknown effects. If the RNG used
by rand() is good, the reseeding very likely will make it worse. At the
very least, you loose all the good things that the theory inspiring the RNG
guarantees.

Your question has two aspects:

a) How to generate independently distributed random numbers?

b) How to generate random numbers in [0,100)?

As for (a), any RNG is supposed to yield pseudo random numbers that look
independent. Therefore, just using a decent RNG will take care of the
independence requirement.

I did some experiments with the standard pseudo-RNG
of the compiler and I come to the conclusion that it is
impossible to let 1 RNG _correctly_ generate 2 independent
random numbers.
If someone manages this let me know pls, though
I solved my original problem by the clever method of Kai-Uwe below.
As for (b), the %-trick is probably good enough for your purposes. However,
beware that unless RAND_MAX+1 is divisible by 100.

Unfortuately it's not divisable by 100, and
unfortunately on my machine RAND_MAX is only 32767 :-(
As for the immediate comparison part: if you discard r1 and r2 immediately
after the comparison, and only keep f, then it suffices to generate one
number r in [0,100) and do:

bool f = ( r == 0 );

because that will yield true with probability 0.01, too.

That's really a nice solution!
So there is no need for 2 random numbers at all,
and all the other associated problems with it.
Thanks, Kai-Uwe!
 
B

bilgekhan

Lionel B said:
Why do you think you might get 0 ?

Because it is the same sequence of the RNG... :)
A seed sequence of rand() always generates
RAND_MAX different random numbers.
So within a sequence it cannot generate the same number twice.
After generating RAND_MAX numbers the sequence
starts over again to generate again the same sequence of numbers.

Since the program is within the same sequence
and since the two numbers are generated together
it never can happen that these 2 numbers somehow match...

Ie. rand() is predictable: if you generate RAND_MAX numbers
and store them somewhere then you can look there what random
number it will generate next. :)
But this works only if you don't change the rand sequence
by calling srand() a second time.

I did some experiments and I come to the conclusion
that it is not NOT possible to let 1 RNG _correctly_ generate
2 independent random numbers.
One has to solve it cleverly via just 1 random number only
as was shown by Kai-Uwe Bux (cf. his posting).
 
L

Lionel B

Because it is the same sequence of the RNG... :) A seed sequence of
rand() always generates RAND_MAX different random numbers.

Really? The documentation for my system doesn't say that. It simply says:
"The rand() function returns a pseudo-random integer between 0 and
RAND_MAX". Maybe your rand() does what you say... there is no requirement
for it to do so.

In any case, that's irrelevant, since x !=y does not imply x%100 != y%
100. Think about it - and try it:

#include <iostream>
#include <cstdlib>

int main()
{
const int N = 1000000;

int hits = 0;
for (int n=0;n<N;++n)
if ((std::rand()%100)==(std::rand()%100)) ++hits;

std::cout << "hits = " << hits << '\n';
}

On my system this outputs:

hits = 10195
 
O

Owen Jacobson

Because it is the same sequence of the RNG...  :)
A seed sequence of rand() always generates
RAND_MAX different random numbers.

No. It generates numbers between 0 and RAND_MAX. RAND_MAX has no
relationship to how long the period of the rand() RNG is.

Furthermore, even if you were right and it could only generate each
value once, mapping the RNG output to values [0, 100) means that many
(RAND_MAX/100) values from the RNG map to one value in the output, so
the expression rand() % 100 could very well generate '5' twice in a
row.
So within a sequence it cannot generate the same number twice.

False, or at best buggy. In fact, this is a dangerous property for an
RNG, and no respectable PRNG has it. If your system's rand() RNG
can't generate a value twice in the same period, report it to the
appropriate vendor's bug list and get it fixed.
Ie. rand() is predictable: if you generate RAND_MAX numbers
and store them somewhere then you can look there what random
number it will generate next.  :)

This is true: iff you know the period of your rng (the number of
samples it produces before repeating its list). However, the period
of rand() is allowed to be much larger than RAND_MAX -- in which case
it must, by simple counting logic, produce at least one value more
than once.
But this works only if you don't change the rand sequence
by calling srand() a second time.

I did some experiments and I come to the conclusion
that it is not NOT possible to let 1 RNG _correctly_ generate
2 independent random numbers.
One has to solve it cleverly via just 1 random number only
as was shown by Kai-Uwe Bux  (cf. his posting).

Indeed. It's better to understand the probability distribution you
want -- often you can achieve it with only a single RNG, even for more
complicated distributions. Multiple RNG samples aren't any "more
random" than a single sample.
 
K

Kai-Uwe Bux

Pete said:
On 2008-05-28 10:22:43 -0400, "bilgekhan" [snip]
I did some experiments and I come to the conclusion
that it is not NOT possible to let 1 RNG _correctly_ generate
2 independent random numbers.

If that is true, then the implementation of the RNG you are using sucks. A
good pseudo random number generator will pass tests for statistical
independence of subsequent returns.

Define "independent". <g>

As far as I can see, the OP just uses the ordinary concept of independent
random events/variables: knowing one does not help predicting the other.
Why do you ask? do you have any reason to suspect otherwise?


Best

Kai-Uwe Bux
 
L

Lionel B

Define "independent". <g>

I assume the OP intends "independent" in the (perfectly well-defined)
probability-theoretic sense.

I guess you could argue that it doesn't make much sense to talk about
independence of *pseudo* random numbers. In practice, however, you can
perfectly well look for statistics such as serial (sample) correlation
between successive random deviates generated by some PRNG - this is one
intuitive notion of "how independent" successive deviates are. This would
certainly be one (of many) tests routinely applied to gauge the quality
of a PRNG.

The whole question of "how random is a sequence of numbers" leads into
astonishingly deep and murky mathematical/philosophical waters (of which
I have little knowledge).
 
P

Pascal J. Bourguignon

bilgekhan said:
Have you tried this in a loop of say 1 million times and counted the hits?
I think theoretically there should be about 10,000 matches.
What result do you get?
Maybe 0 ? :)

Yes, that's what I'd expect. And that's why the OP could expect too,
"If [he] care so little about [his] randomness" so as not to read
the rand(3) man page.
 
B

bilgekhan

Pete Becker said:
The usual solution here is to compute the largest multiple of 100
that's less than RAND_MAX, and ignore any generated values that are
equal to or greater than that value:

int rand100()
{
const int max = (RAND_MAX / 100) * 100;
int tmp = rand();
while (max <= tmp)
tmp = rand();
return tmp % 100;
}


Hmm... this method has a drawback of slowing the program.
If RAND_MAX is say 32767 then the window in the routine above
is just 0.3% of the full range, so in 99.7% of the calls it will waste
extra calls to rand()...
 
L

Lionel B

Hmm... this method has a drawback of slowing the program. If RAND_MAX is
say 32767 then the window in the routine above is just 0.3% of the full
range, so in 99.7% of the calls it will waste extra calls to rand()...

No, if RAND_MAX is 32767 then (RAND_MAX / 100) * 100 = 32700, so you
waste only about 0.2% of calls.
 
B

bilgekhan

Pete Becker said:
On 2008-05-28 10:22:43 -0400, "bilgekhan" said:

If you know how long the period of the generator is, you can store that
many numbers, and then you can "predict" what the next value will be.
But RAND_MAX is not the period; it's the maximum value that will be
produced.

Ok, but it should be no problem to get the length of the period... :)
For example:
Store, say 256 rand() call results in an array.
Then infinetely call rand() and check after how many calls
it repeats the same sequence stored in the array.... :)
and then quit the loop.
A simpler way would be of course looking in the TM of the compiler/library...
(but OTOH who does RTFM ?... :)
Define "independent". <g>

Independent as so far as 2 consecutive calls to rand()
also generate twice the same random number as
theoretically would be expected in 10% of 1 million cases
when each time generating 2 numbers between 0 and 99.
That's 1e6 / 100 = 10000 = 10% of 1e6.

For achieving this independence one needs to switch
to another seed sequence. But my experiments show
that then the distribution is flawed as it deviates from
the theoretical expectation. I managed to get it within
+1.16% more matches by using the following ugly kludge:
srand((1U + rand()) * (1U + rand()) * 4U /* - 1 */);
as RAND_MAX on my machine is only 32767.
But then I discarded the idea of wanting to have just one RNG
generate such 2 independent random numbers.

Using just one RNG, and within the same period (srand sequence),
it is not possible to get the theoretically expected _correct_ number
of matches. Ie. one would need two identical but independent RNGs
and then both initialized with different seed values.
 
J

James Kanze

[...]
This is ok as long as you don't call this routine again so
quickly that time(0) returns the same value as for the last
call.

It's somewhat implicit in what you just said, but you
*shouldn't* use this if you need independent random sequences on
different machines---the programs could be started within the
same second (and in a lot of applications, typically will be).

I generally use /dev/random if it's available, and a mixture of
time(), pid() and the machine's MAC address (or failing that,
its IP address) otherwise.
This can be a bad idea, particularly for so-called "linear
congruential" rngs (and there's a fair chance your `rand()'
will be linear congruential). The reason for this is that
mod-ing a number basically involves using just the lower-order
bits, which for many rngs - and linear congruential ones in
particular - are frequently the "least random".
Safer is something along the lines of:
int r1 = int(100.0*double(rand())/double(RAND_MAX));
that is, you generate a uniform(ish) random double in the
range [0,1), multiply it by 100 and take the integer part.
This gives you a uniform (ish) random int in the range [0,99).
It's safer because it relies more heavily on the higher-order
bits of the number returned by your rng.

Actually, just he reverse is true. While some linear congruent
generators aren't very random in the low order bits, the good
ones are, and the high order bits are never really very random:
a very small number is always followed by a very large one.
 
J

Jerry Coffin

[ ... ]
Because it is the same sequence of the RNG... :)
A seed sequence of rand() always generates
RAND_MAX different random numbers.

False. It usually _will_, but there's no guarantee of it.
So within a sequence it cannot generate the same number twice.

Even more thoroughly false. I've cut out a bunch more statements, but
they all reflect the same misconception.

You're assuming that RAND_MAX reflects not only the range, but also the
period of the PRNG. While the standard's definition is loose enough to
_allow_ that, it's most assuredly not required. I'd add that, IMO, such
an implementation is _far_ from ideal.

From a practical viewpoint, having the range and the period of the
generator equal appears to be fairly unusual. For example, consider the
following code:

#include <stdlib.h>
#include <iostream>

int main() {
srand(1);

int i;

for (i=0; i<9; i++)
std::cout << rand() << "\t";
std::cout << "\n";

for (; i<RAND_MAX; i++)
rand();

for (i=0; i<9; i++)
std::cout << rand() << "\t";
std::cout << "\n";
return 0;
}

According to your statements, the two lines should be identical (with a
possible offset). With the compilers I have handy, that's not the case.
Here's the output from VC++ 7.1:

41 18467 6334 26500 19169 15724 11478 29358 26962
29246 17262 12374 18983 5450 31630 32102 30438 30332

With g++ (4.3.1), these are the results:

1481765933 1085377743 1270216262 1191391529
812669700553475508 445349752 1344887256 730417256

1039805031 1083326686 1331336834 716584787
2033177353934031309 1079589791 1155192414 1372256943

In neither case does repetition appear after producing RAND_MAX numbers.
For most things, I also test with Comeau, but it normally uses the back-
end compiler's implementation of rand(), so it would produce results
identical to the back-end compiler's.
 
J

Jerry Coffin

[ ... ]
While some linear congruent
generators aren't very random in the low order bits, the good
ones are, and the high order bits are never really very random:
a very small number is always followed by a very large one.

Hmmm...neither of the compilers I have handy seems to suffer from either
of these problems. For example, consider the following two sequences
generated by VC++ 7.1:

41 18467 6334 26500 19169 15724 11478 29358 26962
29246 17262 12374 18983 5450 31630 32102 30438 30332

I see no particular pattern of a particularly large number following a
small one. To help make possible problems in the low bits more apparent,
here's the same sequence printed in hex:

29 4823 18be 6784 4ae1 3d6c 2cd6 72ae 6952
723e 436e 3056 4a27 154a 7b8e 7d66 76e6 767c

At least right off, I don't see any obvious patterns on the low bits
either (though I'll openly admit I haven't tried to do any sort of
extensive analysis).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,230
Members
46,819
Latest member
masterdaster

Latest Threads

Top