Better Rand() Algorithm


Gordon Burditt

And 32 bits is nowhere near enough to shuffle a deck of 52 cards.
The point that I was trying to make was that you have to analyze where your
system is vulnerable to an attack.
You have correctly identified that for the code that I posted, it lies in
having only 32 bits of entropy for the seed. This makes an exhaustive search
possible. Using the Mersenne Twister with a 32-bit seed would have made no
difference.

So, it is not the design of the random number generator which is the problem
here.
Agreed.

Exhaustive search of 2**32 (approx. 4 billion) is doable on today's
computers. If you only show the first 5 cards face up, for each possibility
you would have to do at least 48 calls to rand + 48 mod operations. That
takes it out of the realm of real time. This would probably be safe for
non-commercial use, which is what the OP was talking about.

Consider the possibility of a pre-computed table (on disk). Look
at the first card. Index into a table. This either tells you the
seed in use or which table to go to for further cards. Continue
with the next cards. This is likely to take 1 disk access per table
(7 max per hand). Once you have the seed, you can run the sequence
with the right seed (once, not once per 2**32). This can be done
much faster than inputting the cards into a laptop. Note that even
if you have incomplete tables, you'll either know the seed or know
you haven't got it, so you could bet accordingly. You could still
make money if you only know all the cards 25% of the time, and
otherwise play half-decently without cheating.

A 7-D array of the seeds for the first 7 cards takes about 2700 GB.
(That's a bit much for current laptops, but certainly not impossible
for a NFS-based storage farm and not at all unusual for a commercial
USENET news server.) Since the seed is often determined by the 6th
card or earlier, using tables that contain up to 52 elements of
either the seed or a pointer to the next table to try can leave out
entries for already-determined data. I'm not sure how much storage
this would save. A complete set of data for SIX cards would take
58GB, and I'm guessing that since you have a 70% chance of a unique
answer for 6 cards, with good techniques of sparse-array storage,
you could put the tables in around 58GB, which is within range for
today's laptops. Except for inputting the cards, I suspect this
could run faster than 1000 X real time even on a last-generation
laptop, where you're giving presumed humans reasonable time to
decide on their bet.
For commercial use I would use two 32-bit seeds. Only a tiny, tiny subset
of all shuffles would be possible. After all, 64 is smaller than 226. But you
can't do an exhaustive search of 2**64 possibilities. If you do 3 billion
instructions per second, and each search takes just one instruction, it
would take nearly 200 years. That is sufficient for practical security.
Use the first 32-bit seed to determine the initial order of the cards
(based on rank). The second 32-bit seed you use as before.
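
Just to make the mechanics concrete, here is a minimal sketch of one reading
of that two-seed idea. shuffle() stands for whatever rand()-based shuffle the
earlier posted code used; it is not shown here, and (as the reply below warns)
none of this by itself guarantees 2**64 effective possibilities.

#include <stdlib.h>

void shuffle(int deck[52]);              /* assumed: the rand()-based shuffle posted earlier */

void deal_two_seeds(int deck[52], unsigned seed1, unsigned seed2)
{
    for (int i = 0; i < 52; i++)
        deck[i] = i;                     /* start from the order based on rank */
    srand(seed1);
    shuffle(deck);                       /* first seed determines the "initial order" */
    srand(seed2);
    shuffle(deck);                       /* second seed is then used as before */
}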

Be careful that the use of 2 32-bit seeds doesn't collapse into a lot less
than 2**64 possibilities. In pathological situations, it could leave
you no better off than 1 32-bit seed.

Gordon L. Burditt
 

Ernst Lippe

For a program in C, I need some random numbers for a system where people are
placing bets. This is not a commercial project, btw. Generally, I tend to
rely on things from the standard library, because they're written by people
with skills far above mine. Hence, I've always used rand() and FALSELY
assumed it could produce unpredictable random numbers and could be used in
many situations even if security was an issue.

Firstly, lets assume for the moment I have a way to get a random and
unpredictable seed for a pseudo random number generator. Then is it
possible to get a reasonably secure shuffle in which the user cannot
predict all the players cards after seeing the first five or so in the
series?

Secondly, is there a reasonable method for determining an unpredictable
seed? I understand seeding from the timer is insecure.

Finally, is there a better rand() facsimile whose source code is
available in the public domain? One which would suit my needs.

First of all, when you are even slightly worried about attackers you
should use a cryptographic random number generator. Several replies
in this thread have proposed random number generators that are not
cryptographically strong; in most cases these are very easy to break.

I would strongly advise you to look at Yarrow
(http://www.counterpane.com/yarrow.html). The article gives a good
analysis of the problems with generating random numbers. You can also
find a Windows implementation of Yarrow on the site, but I would
not recommend it. It does not fully implement the algorithm that is
described in the paper, and the code is not always very readable and
certainly not portable to other systems.

When your code runs on a Unix system that has /dev/random and/or
/dev/urandom, you should use these.

When you are developing for another platform you will probably
have to write it yourself.

A simple solution that is probably sufficiently secure for your
application is the following: Take a strong block cipher (AES) and use
it to encrypt a counter that is incremented after each use. An
attacker will not be able to predict the outputs, unless he
is able to break the block cipher (highly unlikely) or has other means
to determine the key. A small practical problem for your
implementation is that the size of the counter is equal to the block
size (that is 16 bytes for AES).
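
A minimal sketch of this counter-mode construction, assuming OpenSSL's legacy
AES interface (AES_set_encrypt_key/AES_encrypt from <openssl/aes.h>) is
available; error handling and key hygiene are omitted:

#include <string.h>
#include <openssl/aes.h>

static AES_KEY       ctr_key;
static unsigned char ctr_counter[16];    /* the block-sized (16-byte) counter */

/* Initialise with a 128-bit key obtained from an unpredictable source. */
void prng_init(const unsigned char key[16])
{
    AES_set_encrypt_key(key, 128, &ctr_key);
    memset(ctr_counter, 0, sizeof ctr_counter);
}

/* Fill buf with len bytes: encrypt the counter, output the block, increment. */
void prng_bytes(unsigned char *buf, size_t len)
{
    unsigned char block[16];
    while (len > 0) {
        AES_encrypt(ctr_counter, block, &ctr_key);
        for (int i = 15; i >= 0; i--)    /* increment as a big-endian integer */
            if (++ctr_counter[i] != 0)
                break;
        size_t n = len < 16 ? len : 16;
        memcpy(buf, block, n);
        buf += n;
        len -= n;
    }
}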

This random number generator must be initialised by choosing a random
value for the key. Now this is the hard part. You should use a value
that cannot be predicted by an attacker. The easiest solution is to
find as many sources of unpredictable information as is practically
feasible. Most operating systems have internal statistical counters
that are hard to predict. Now put all the results in one large buffer
and apply a cryptographic hash function (SHA-1 or SHA-256).
Cryptographic hash functions have the nice property that all bits
of the output depend on the entire input. So even when an attacker can
guess most of the inputs he won't be able to predict the outputs as
long as there are sufficient bits in the input that he can't guess.
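
A rough sketch of that initialisation step, assuming OpenSSL's SHA256() is
available. The sources shown here (time, clock, process id, a stack address)
are only placeholders with little entropy of their own; in practice you would
append every hard-to-predict statistic you can get at, or simply read
/dev/urandom where it exists:

#include <string.h>
#include <time.h>
#include <unistd.h>
#include <openssl/sha.h>

void derive_key(unsigned char key[32])
{
    unsigned char pool[256];
    size_t off = 0;

    time_t  t  = time(NULL);             /* wall-clock time */
    clock_t c  = clock();                /* CPU time used so far */
    pid_t   p  = getpid();               /* process id */
    void   *sp = (void *)&off;           /* a stack address, if the OS randomizes it */

    memcpy(pool + off, &t,  sizeof t);  off += sizeof t;
    memcpy(pool + off, &c,  sizeof c);  off += sizeof c;
    memcpy(pool + off, &p,  sizeof p);  off += sizeof p;
    memcpy(pool + off, &sp, sizeof sp); off += sizeof sp;
    /* ...append any other unpredictable data you can collect... */

    SHA256(pool, off, key);              /* the 32-byte key is SHA-256 of the pool */
}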

It should not be too difficult to implement this; you can easily find
code on the Web that implements AES and SHA.

The system should be secure as long as an attacker is not able to
guess the initial key (that's why you should use as many sources as
possible) and if there is no way that he could steal the generated key
from the running program, e.g. by examining the memory or the swap
file.

For dealing the cards I would suggest that you simply use the
following algorithm.

Take a "deck" that initially contains all the cards.

At each step randomly select one of the cards from the
deck and remove that card from the deck.

This algorithm is obviously correct (unlike some of the other
proposals in this thread). For this algorithm you need a random number
generator that generates uniform numbers between 0 and n-1 where n <=
52. A few days ago I posted some code that converts between random
number generators with different ranges that you could use.
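
For illustration, a minimal sketch of that dealing loop. It assumes a
function uniform_below(n) returning an unbiased value in [0, n) -- for
instance the range-conversion code mentioned above, driven by the
counter-mode generator -- which is not shown here:

unsigned uniform_below(unsigned n);      /* assumed: unbiased on [0, n), not shown */

void deal(int deck[52])
{
    for (int i = 0; i < 52; i++)
        deck[i] = i;                     /* 0..51 encode the 52 cards */

    /* repeatedly pick one of the remaining cards and move it out of the "deck" */
    for (int remaining = 52; remaining > 1; remaining--) {
        unsigned j = uniform_below((unsigned)remaining);
        int tmp = deck[remaining - 1];
        deck[remaining - 1] = deck[j];
        deck[j] = tmp;
    }
}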

greetings,

Ernst Lippe
 

Robert I. Eachus

Ernst said:
Take a "deck" that initially contains all the cards.

At each step randomly select one of the cards from the
deck and remove that card from the deck.

This algorithm is obviously correct (unlike some of the other
proposals in this thread). For this algorithm you need a random number
generator that generates uniform numbers between 0 and n-1 where n <=
52. A few days ago I posted some code that converts between random
number generators with different ranges that you could use.

Bad idea. If you do this it is easy to show that some of the draws will
be biased, unless your PRNG state is evenly divisible by every number
between 2 and 52. (Yes, this only shows the existence of a small bias.
But it also means that there is no easy way to prove that there isn't
a large bias for some N.) A better approach is as follows. Select 52
random numbers and assign one to each card. Sort the deck using the
random values as a key.

This has two advantages. First, your random number generator need not
be unbiased. (Or more formally stated, the choice of permutations will
only depend on the independence of the values generated.) Second you
can use tests of randomness such as the runs test to validate your
generator. If the generator passes the runs test, all other sins, if
any, won't show up in your output. (Outside of a flawed sort algorithm.)
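
A minimal sketch of that sort-by-key idea, purely for illustration.
random_key() stands for whatever generator you trust, and how duplicate keys
are handled is exactly the point debated in the replies below:

#include <stdlib.h>

struct keyed_card { double key; int card; };

double random_key(void);                 /* assumed: one random value per call, not shown */

static int cmp(const void *a, const void *b)
{
    double ka = ((const struct keyed_card *)a)->key;
    double kb = ((const struct keyed_card *)b)->key;
    return (ka > kb) - (ka < kb);
}

void shuffle_by_sort(int deck[52])
{
    struct keyed_card tmp[52];
    for (int i = 0; i < 52; i++) {
        tmp[i].key  = random_key();      /* assign one random value to each card */
        tmp[i].card = i;
    }
    qsort(tmp, 52, sizeof tmp[0], cmp);  /* the sorted keys give the deck order */
    for (int i = 0; i < 52; i++)
        deck[i] = tmp[i].card;
}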
--
Robert I. Eachus

"As far as I'm concerned, war always means failure." -- Jacques Chirac,
President of France
"As far as France is concerned, you're right." -- Rush Limbaugh
 

Scott Nelson

Bad idea. If you do this it is easy to show that some of the draws will
be biased, unless your PRNG state is evenly divisible by every number
between 2 and 52.

If, as assumed, you have an RNG that generates uniform numbers
between 0 and n-1 where n <= 52, then there is no bias at all
with this method.

A better approach is as follows. Select 52
random numbers and assign one to each card. Sort the deck using the
random values as a key.

If you can generate duplicate values, then the method is biased.
If you can't generate duplicate values, then you'd need a generator
that can produce numbers between 0 and N-1 where N is as large as
the range of the generator.


This has two advantages. First, your random number generator need not
be unbiased. (Or more formally stated, the choice of permutations will
only depend on the independence of the values generated.)

Imagine that my RNG is me throwing a six sided die.
Are you really trying to assert that assigning the numbers
1 through 6 randomly to each card and then sorting will
produce an unbiased shuffle?


Scott Nelson <[email protected]>
 

Robert I. Eachus

Scott said:
If, as assumed, you have an RNG that generates uniform numbers
between 0 and n-1 where n <= 52, then there is no bias at all
with this method.

Definitely not true. Let's say for specificity that n = 100. How do you
assign the values 0..99 to cards so that the first card chosen is not
biased? The second card? For choosing the third card it is easy to see
several unbiased ways to do that. And so on.
If you can generate duplicate values, then the method is biased.

Actually, no. Assuming that the number of possible values generated by
the RNG is greater than or equal to 52, you can discard duplicates. In
practice if the number of values generated is much larger than 52, or if
the generator doesn't produce duplicates in short runs, you don't need
to do that. I use a BBS generator; if I used the generated values
directly there would be no chance of duplicates. But the generator is mapped to
double precision values between 0.0 and 1.0. The rounding to double
precision allows for the possibility of generating two identical values
in a short sequence. But if you were to check every list for duplicate
values--easy after the sorting--you might discard one sequence in every
2^48 or so.
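
For readers who have not seen it, the mapping described above looks roughly
like this: one BBS step (x <- x^2 mod N) followed by conversion to a double in
[0.0, 1.0). The modulus below is the product of two small primes congruent to
3 mod 4 and is far too small to be secure; it only shows the shape of the code:

#include <stdint.h>

static const uint64_t N = 499u * 547u;   /* toy Blum integer; a real modulus is hundreds of bits */
static uint64_t x = 2003;                /* must really be seeded with an unpredictable residue */

double bbs_double(void)
{
    x = (x * x) % N;                     /* one Blum-Blum-Shub step */
    return (double)x / (double)N;        /* map the state to a double in [0.0, 1.0) */
}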
Imagine that my RNG is me throwing a six sided die.
Are you really trying to assert that assigning the numbers
1 through 6 randomly to each card and then sorting will
produce an unbiased shuffle?

No. But if you throw the die three times per card (giving 216 possible
values) and eliminate duplicates by either method above, it won't matter
how loaded your die is.

--
Robert I. Eachus

"As far as I'm concerned, war always means failure." -- Jacques Chirac,
President of France
"As far as France is concerned, you're right." -- Rush Limbaugh
 

Scott Nelson

Definitely not true. Let's say for specificity that n = 100. How do you
assign the values 0..99 to cards so that the first card chosen is not
biased? The second card? For choosing the third card it is easy to see
several unbiased ways to do that. And so on.

You're asking me to assume that n <= 52, and that n = 100.

I'm guessing that once you realize why Ernst chose to use
N and not a specific number, you'll realize that his method
is perfectly unbiased.

Actually, no. Assuming that the number of possible values generated by
the RNG is greater than or equal to 52, you can discard duplicates.

If you discard duplicates, then your method can't generate duplicates.

Discarding duplicates wasn't mentioned in your original method.


Bottom line:
You're assuming that Ernst suggested generating a random number
of a fixed sized.

He didn't.

He suggested first generating a random number from 0 to 51.
Then generating a random number from 0 to 50
Then generating a random number from 0 to 49
Then generating a random number from 0 to 48
Then generating a random number from 0 to 47
Then generating a random number from 0 to 46
Then generating a random number from 0 to 45
Then generating a random number from 0 to 44
Then generating a random number from 0 to 43
Then generating a random number from 0 to 42

And so on.



Scott Nelson <[email protected]>
 

Ernst Lippe

Bad idea. If you do this it is easy to show that some of the draws will
be biased, unless your PRNG state is evenly divisible by every number
between 2 and 52.
Did you actually look at the code that I posted?
(Somehow I have the feeling that you did not)

If you have found bugs I would prefer that you post a more detailed
follow-up to that post.
(Yes, this only shows the existence of a small bias.
But it also means that there is no easy way to prove that there isn't
a large bias for some N.)
The procedure that I proposed does not have any bias when the
original inputs are indeed uniformly distributed.

I believe that you may be referring to the "traditional" way of converting
a uniform random number r with range 0..m-1 to a random number with
range 0..n-1 by computing r mod n. But for this procedure it is easy
to prove that the bias is at most 1/m.
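
For illustration, one common way to remove even that small bias is to reject
the top sliver of the source range and retry; this is only a sketch of the
idea, not necessarily the code that was posted:

#include <stdint.h>

uint32_t uniform_source(void);           /* assumed: uniform on [0, 2^32), e.g. 4 bytes from the CTR generator */

uint32_t uniform_below(uint32_t n)       /* returns an unbiased value in [0, n); n must be >= 1 */
{
    /* largest multiple of n representable in the source range */
    uint32_t limit = UINT32_MAX - (UINT32_MAX % n);
    uint32_t r;
    do {
        r = uniform_source();
    } while (r >= limit);                /* reject the uneven tail */
    return r % n;
}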
A better approach is as follows. Select 52
random numbers and assign one to each card. Sort the deck using the
random values as a key.

This has two advantages. First, your random number generator need not
be unbiased. (Or more formally stated, the choice of permutations will
only depend on the independence of the values generated.)

But how do you handle the case where two cards are assigned the
same random number? (I expect that many programmers will make the
wrong decision in this case)
Second you
can use tests of randomness such as the runs test to validate your
generator. If the generator passes the runs test, all other sins, if
any, won't show up in your output. (Outside of a flawed sort algorithm.)
These tests are pointless for the algorithm that I proposed.

My algorithm starts with encrypting a counter with a block cipher.
You don't need any statistical tests here; you should only test whether the
block cipher implementation is correct and whether the counter is always
incremented correctly. Statistical tests of the outputs are pointless,
because any statistically significant result would imply that the
underlying block cipher is broken.

The second step is the conversion of this random number to a smaller
range 0..n-1. You should prove that the implementation is correct; my
implementation already contained invariants that could be used as a
basis for an even more rigorous proof for those who really want such
proofs. The only test that you need is a simple check that all the
possible outputs occur with the same frequency.
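
A very small check of that kind, assuming the conversion step is exposed as
uniform_below(n): draw a large number of values and look at the counts, which
should all be close to TRIALS/N (a proper chi-square test is easy to add):

#include <stdio.h>

unsigned uniform_below(unsigned n);      /* the conversion step under test */

int main(void)
{
    enum { N = 52, TRIALS = 5200000 };
    long count[N] = {0};

    for (long i = 0; i < TRIALS; i++)
        count[uniform_below(N)]++;

    for (int v = 0; v < N; v++)          /* each bucket should hold about TRIALS/N draws */
        printf("%2d: %ld\n", v, count[v]);
    return 0;
}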

Testing the shuffle algorithm is also very straightforward because it
is extremely likely that any errors in its implementation will show up
as duplicated cards in the hands that are dealt.

greetings,

Ernst Lippe
 

Ernst Lippe

Actually, no. Assuming that the number of possible values generated by
the RNG is greater than or equal to 52, you can discard duplicates. In
practice if the number of values generated is much larger than 52, or if
the generator doesn't produce duplicates in short runs, you don't need
to do that. I use a BBS generator; if I used the generated values
directly there would be no chance of duplicates. But the generator is mapped to
double precision values between 0.0 and 1.0.

(Just a side note, I am a bit surprised about your use of BBS. There
is a rigorous proof that it contains at least one cryptographically
secure bit. It is also possible to use a few more bits under certain
circumstances. Do you have any security proof for the large number of
bits that you are using?)
The rounding to double
precision allows for the possibility of generating two identical values
in a short sequence. But if you were to check every list for duplicate
values--easy after the sorting--you might discard one sequence in every
2^48 or so.


No. But if you throw the die three times per card (giving 216 possible
values) and eliminate duplicates by either method above, it won't matter
how loaded your die is.

If I understand you correctly the first method that you propose
is to throw away a random value, if you had already drawn the
same value earlier in the sequence. This method is biased.

Imagine that the dice are heavily loaded in such a way that 1 occurs
very frequently. Then obviously there is a high probability that the
random number that is assigned to the first card is 3. But this means
that the probability that this card will also be the first card that
is dealt is at least as high as the probability that you throw 3 on
the first throw.

The root of this problem is of course, that, when you eliminate
duplicates one at a time, the random numbers in your output are no
longer independent.

greetings,

Ernst Lippe
 

Matthew Montchalin

On 10 Sep 2003, Gordon Burditt wrote:
|>Your compiler's rand may have weaknesses. The question is, when properly
|>seeded, can a would-be attacker exploit the weaknesses without doing an
|>exhaustive search. If an exhaustive search is needed, then no other random
|>number generator with the same size seed would be any better.
|
|And 32 bits is nowhere near enough to shuffle a deck of 52 cards.
|You need at least 226 bits (log base 2(52!) = 225.581 bits) of
|random seed for a random shuffle.

Isn't it even worse if it is used for fortune telling, and the orientation
of the cards matters (face up, yet is it 'up' or 'down' for the face cards)?

If the random function has to entertain the possibility that face cards
may be turned up, and yet are upside-down, would that make it 226! bits?
 

Michael Amling

Matthew said:
On 10 Sep 2003, Gordon Burditt wrote:
|>Your compiler's rand may have weaknesses. The question is, when properly
|>seeded, can a would-be attacker exploit the weaknesses without doing an
|>exhaustive search. If an exhaustive search is needed, then no other random
|>number generator with the same size seed would be any better.
|
|And 32 bits is nowhere near enough to shuffle a deck of 52 cards.
|You need at least 226 bits (log base 2(52!) = 225.581 bits) of
|random seed for a random shuffle.

Isn't it even worse if it is used for fortune telling, and the orientation
of the cards matters (face up, yet is it 'up' or 'down' for the face cards)?

If the random function has to entertain the possibility that face cards
may be turned up, and yet are upside-down, would that make it 226! bits?

More like 52! * 2**12, requiring 226+12 bits of randomness.

--Mike Amling
 

Robert I. Eachus

Ernst said:
> (Just a side note, I am a bit surprised about your use of BBS. There
> is a rigorous proof that it contains at least one cryptographically
> secure bit. It is also possible to use a few more bits under certain
> circumstances. Do you have any security proof for the large number of
> bits that you are using?)

I am not using a large number of bits. If you generate 52 values and
sort the cards based on those values you use just under 6 bits per value
on average. The BBS generator is secure if you use less than log2 N
values, so a 64-bit N would suffice for cryptographic level security.

But there is a bit of procedural knowledge there which needs to be taken
into account. If you know N but not the factors of N, and the final
state of the generator, such knowledge is relatively worthless.
However, if you know N and the initial state of the generator, it is
easy to "predict" the sequence. In the game setting proposed, this
means that you have to be very careful both that the initial state
cannot be predicted, and the initial state cannot be forced by external
agents to a given value.

Of course, for any PRNG, allowing an adversary to force the initial
generator state or to know the initial generator state is bad.
> If I understand you correctly the first method that you propose is to
> throw away a random value, if you had already drawn the same value
> earlier in the sequence. This method is biased.

Two corrections. I did not propose to throw away duplicates. I just
showed that whatever you do with duplicates is irrelevant with the
generator I proposed. If you generate a pair of duplicates once every
few billion years any procedure that you use to disambiguate the pair is
reasonable. Using a sorting algorithm that 1) is known to produce some
order rapidly, and 2) is not "stable," in other words the algorithm may
reverse the order of identical values, seems acceptable to me. You may
feel differently.

But the real problem (and it is a real problem) is the amount of entropy
in your generator state. If your generator has less than 52! potential
initial states, there are deck sequences you won't produce. If the
number of initial states is not evenly divisible by 52!, then some deck
permutations will appear more often than others. What you will accept
in this case is again a judgement matter. For example, when generating
bridge hands, I feel that as long as the generator state--and initial
state--contain significantly more than log2(52!/(13!)^4) bits of
entropy, I can sleep at night. (It may not be possible to generate all
deck permutations, but you will probably be able to generate all
possible deals.)

> Imagine that the dice are heavily loaded in such a way that 1 occurs
> very frequently. Then obviously there is a high probability that the
> random number that is assigned to the first card is 3. But this means
> that the probability that this card will also be the first card that
> is dealt is at least as high as the probability that you throw 3 on
> the first throw.
>
> The root of this problem is of course, that, when you eliminate
> duplicates one at a time, the random numbers in your output are no
> longer independent.

So if you discard duplicates, and the generator is biased, the resulting
sequence will be biased. Agreed. But as discussed above, if the number
of potential generator states is large enough the bias introduced is
substantially reduced. At the limit, as discussed above, you will be
able to play from now to the end of the universe, and still not have a
detectable amount of bias.

The selection based on picking one from the remaining cards and
repeating, on the other hand, does result in serious detectable bias if
the underlying generator is even slightly biased. This robustness of
the sorting approach is what I was addressing. Use your weighted dice
then check how many spades are in the first five cards. A great way to
detect bias, not a good way to generate random permutations.
--
Robert I. Eachus

Ryan gunned down the last of his third white wine and told himself it
would all be over in a few minutes. One thing he'd learned from
Operation Beatrix: This field work wasn't for him. --from Red Rabbit by
Tom Clancy.
 

Matthew Montchalin

Michael Amling wrote:
|Matthew Montchalin wrote:
|> On 10 Sep 2003, Gordon Burditt wrote:
|> |>Your compiler's rand may have weaknesses. The question is, when properly
|> |>seeded, can a would-be attacker exploit the weaknesses without doing an
|> |>exhaustive search. If an exhaustive search is needed, then no other random
|> |>number generator with the same size seed would be any better.
|> |
|> |And 32 bits is nowhere near enough to shuffle a deck of 52 cards.
|> |You need at least 226 bits (log base 2(52!) = 225.581 bits) of
|> |random seed for a random shuffle.
|>
|> Isn't it even worse if it is used for fortune telling, and the orientation
|> of the cards matters (face up, yet is it 'up' or 'down' for the face cards)?
|>
|> If the random function has to entertain the possibility that face cards
|> may be turned up, and yet are upside-down, would that make it 226! bits?
|
| More like 52! * 2**12, requiring 226+12 bits of randomness.

Ah, thanks!
 

Ernst Lippe

I am not using a large number of bits. If you generate 52 values and
sort the cards based on those values you use just under 6 bits per value
on average.

I assume that you compute your floating point random number by
dividing the internal state x_i by the modulus N. But when you use the
result for sorting, you effectively use the most significant bits of
x_i. I am only aware of proofs that show the cryptographical security
of the least significant bits. I would be surprised if you could give
a proof that your procedure is cryptographically secure. After all
when x_i < sqrt(N) it is certain that x_(i+1) > x_i. So, it would
seem that successive outputs are not independent.
The BBS generator is secure if you use less than log2 N
values, so a 64-bit N would suffice for cryptographic level security.

According to the HAC you can use c lg ln N least significant bits, but
unfortunately there are secure ranges for c known.

Also a 64-bit N should be easy to factorize, so I can't see why you
would expect any cryptographic security with such low N anyhow.

Two corrections. I did not propose to throw away duplicates.

Then please explain what "either method" for eliminating duplicates
is. I assumed that one method was to throw away the entire sequence
when it contains any duplicates and that the other was to throw
away any duplicates as soon as they were encountered.
But the real problem (and it is a real problem) is the amount of entropy
in your generator state. If your generator has less than 52! potential
initial states, there are deck sequences you won't produce. If the
number of initial states is not evenly divisible by 52!, then some deck
permutations will appear more often than others. What you will accept
in this case is again a judgement matter. For example, when generating
bridge hands, I feel that as long as the generator state--and initial
state--contain significantly more than log2((13!)^4/(52!)) bits of
entropy, I can sleep at nights. (It may not be possible to generate all
deck permutations, but you will probably be able to generate all
possible deals.)
But how can you then prove that your method will uniformly generate all
possible deals?
The selection based on picking one from the remaining cards and
repeating, on the other hand, does result in serious detectable bias if
the underlying generator is even slightly biased.

But the construction that I proposed should give you an unbiased RNG
whenever the underlying block cipher is cryptographically secure in
CTR mode.

Summarizing my main objections against your approach:
* Your proofs are not very rigorous
* BBS is a very slow random number generator, so I
would expect that your system does not have a very
high performance.

greetings,

Ernst Lippe
 

Robert I. Eachus

Ernst Lippe wrote:

(An excellent criticism, by the way...)
I assume that you compute your floating point random number by
dividing the internal state x_i by the modulus N. But when you use the
result for sorting, you effectively use the most significant bits of
x_i. I am only aware of proofs that show the cryptographical security
of the least significant bits. I would be surprised if you could give
a proof that your procedure is cryptographically secure. After all
when x_i < sqrt(N) it is certain that x_(i+1) > x_i. So, it would
seem that successive outputs are not independent.

First, the proof actually shows that any n bits (n <= log2 N) are
cryptographically as secure as anything that depends on not being able
to factor N. But there is a neat little lemma in here that you have to
see. If I know N and the state of the generator when it generated x_i,
I can, of course, generate x_(i+1), x_(i+2) and so on. But I can't generate
x_(i-1) without the factors of N, or more formally if I can even guess one
bit of x_(i-1) with probability 0.5 + epsilon without the factors, I can
use that fact to find the factors. So if you know x_i to be 42, of
course (for any reasonable N) you know that x_(i+1) is 1764. But what is
x_(i-1)? You don't know. (If the state is 49 or any perfect square over
the integers, you do know the previous state. But let's get back to my
algorithm...)

If I assign the values to be sorted to the cards by taking x_i, x_(i+1),
x_(i+2)... it looks as if there may be a problem. But what if I assign
the values x_(i-1), x_(i-2), x_(i-3),... to the deck in reverse order, and
then sort? Now it is clear that even given N and x_i, you are better
off trying to factor N (assuming it is larger than 64 bits) than to
guess at a starting value and generate a sequence trying to find one
that ends at x_i, or trying to work out what the values must have been
from the ordering of the deck.

But the two deck orders are identical. All that has changed is the
value of i in x_i. I call this the symmetric property. As long as it
holds for the way you use the x_i from the BBS algorithm, you maintain
the other properties that BB&S provides.
Also a 64-bit N should be easy to factorize, so I can't see why you
would expect any cryptographic security with such low N anyhow.

I don't. My point was that above a minimal (and quite small) value of N
the number of bits I use will be less than log2 N.
> According to the HAC you can use c lg ln N least significant bits, but
> unfortunately there are secure ranges for c known.

I don't understand this statement. I think that you meant to say that
unfortunately there are no secure ranges for c known. If so I agree.
But this argument is based on factoring N. Unless someone comes up with
a proof that P /= NP or that factoring is in P, we will sit in that
boat. And I expect that situation to last for a while.
Then please explain what "either method" for eliminating duplicates
is. I assumed that one method was to throw away the entire sequence
when it contains any duplicates and that the other was to throw
away any duplicates as soon as they were encountered.

Again, my actual method is to ignore duplicates, and use a non-stable
sort algorithm. Even a stable sort so that the first duplicate always
came before the second would not introduce significant bias. But a
non-stable sort can be chosen where the order of the duplicates depends
on the final order of the other cards.
But how can you then prove that your method will uniformly generate all
possible deals?

I can't. But I can prove that it is not possible to detect that fact
assuming less than e deals are examined, and e usually turns out to be
ridiculously large. Let me give a hypothetical approach to "cheating"
given this approach to dealing cards, by someone who knows the algorithm
and N:

1. Generate all possible starting seeds and count how often each
deal shows up.
2. Choose a betting scheme based on this information.

Without quantum computers, that is going to take some ridiculously large
number times the age of the universe, and storing the counts will take
more matter than is in the universe without entangling millions of bits
of data into each electron.

But there is a problem here! If the (hidden) generator that creates the
decks is never reset, the actual generator will only generate 1/4, 1/8,
or 1/16 of those deck orders. How do you know which fraction that is?
Factor N, or get a value from the actual generator, and use it to
generate your deck order map.

Again, from a rigorous mathematical standpoint, there are problems.
From a rigorous statistical standpoint, those problems don't exist. (If
you have a biased coin, but cannot detect whether the bias is to heads
or tails by flipping it a trillion times, do you care? Especially since
the coin will wear away to nothing before you get that far.)
But the construction that I proposed should give you an unbiased RNG
whenever the underlying block cipher is cryptographically secure in
CTR mode.

I understand your approach. I can send you programs that run your
approach and my approach with small decks, then run Chi-Square numbers
on long runs. I then do K-S tests on the results. When I have done
this, your approach fails, even if the underlying generator is
cryptographically secure. Now:

I cannot create an ironclad proof that your method should work. The
problem is the independence of the 52 (or whatever) generators, assuming
they all use the same underlying PRNG algorithm. The bias that I detect
toward less random distributions of Chi-Square is very small. But why
bother. My method doesn't have the independence problem, and the data
looks good.
Summarizing my main objections against your approach:
* Your proofs are not very rigorous

They are. They are not mathematically rigorous, they are statistically
rigorous. As I said, the proof approach I take is demonstrating that
any bias or pattern cannot be detected, given certain very broad
assumptions. (Factoring is hard, no generator is run for min(p,q)
iterations, etc.)
* BBS is a very slow random number generator, so I would expect that your
system does not have a very high performance.

How many times have I heard that said based on slow implementations of
BB&S? The trick is to choose N correctly, which of course takes work,
and then don't square x_i then do the mod operation, compute x_i^2 mod N
in one operation. If you do both, the generator itself can be very
fast. For example, look for an N such that N is 2^j - k for relatively
large j and small k. This makes the modulus calculation of x_i^2 mod N
easy. There are other tricks. For example, x^2 mod N = ((x-N)*x) mod
N. The quantity (x-N)*x < 1/4 * N^2. Not a big deal you would think,
but it allows you to choose N > 2*k for some convenient k say 512, and
not need to ever store numbers greater than 2^512.

Oops, forgot the biggest trick of all. Represent x as an n-digit number
base M where M = 2^k for some useful k like 64. Let's call these digits
x_j to distinguish from the x_i above. The partial products are of the
form x_j * x_j' * M^i for some i. If M^i is greater than N replace it
with M^i mod N. (And remember that due to the way we chose N, M^i mod N
will have lots of zero digits.) Accumulate all the partials in the ith
place, multiply by M^i mod N, and now add the partials together mod N.
If N is chosen correctly as above, the mod operations can be performed
by splitting x into parts A * 2^j and B, then computing B + A * k, then
if the result is greater than N, subtract N. For the additions, you
only need to compare to N and if necessary subtract. Sounds complex,
but it simplifies the arithmetic greatly.
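
A toy illustration of the folding trick for N = 2^j - k, taking j = 61 and
k = 1 (so N is the Mersenne prime 2^61 - 1) just to keep everything in one
machine word; a real BBS modulus would be a large composite and would need a
multi-word version of the same idea. It also assumes the compiler provides an
unsigned __int128 type:

#include <stdint.h>

static const unsigned J    = 61;
static const uint64_t K    = 1;                        /* N = 2^J - K */
static const uint64_t N    = ((uint64_t)1 << 61) - 1;
static const uint64_t MASK = ((uint64_t)1 << 61) - 1;  /* low J bits */

uint64_t square_mod(uint64_t x)                        /* returns x*x mod N, for x < N */
{
    unsigned __int128 sq = (unsigned __int128)x * x;   /* up to 122 bits */
    uint64_t A = (uint64_t)(sq >> J);                  /* sq = A*2^J + B */
    uint64_t B = (uint64_t)sq & MASK;
    unsigned __int128 r = (unsigned __int128)A * K + B; /* since 2^J == K (mod N) */
    A = (uint64_t)(r >> J);                            /* fold once more */
    B = (uint64_t)r & MASK;
    uint64_t out = A * K + B;
    while (out >= N)
        out -= N;
    return out;
}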

Yeah, it trades days of computer time finding nice values of N for a
factor of 100 or so speedup in evaluating x*x mod N. But it is worth
it. (At least as useful as letting my machine work on an RSA challenge,
fold proteins, or look for phone calls from ET.)
--
Robert I. Eachus

Ryan gunned down the last of his third white wine and told himself it
would all be over in a few minutes. One thing he'd learned from
Operation Beatrix: This field work wasn't for him. --from Red Rabbit by
Tom Clancy.
 

Ernst Lippe

Ernst Lippe wrote:

(An excellent criticism, by the way...)


First, the proof actually shows that any n bits (n <= log2 N) are
cryptographically as secure as anything that depends on not being able
to factor N.

I think that you are using the wrong formula here, both AC and
HAC state that n <= log2 k, where k is the bit length of N,
in other words n <= log2 log2 N.

Again, can you give a reference that it is safe to use the
most significant bits, because all proofs that I have seen
only refer to the least significant bits.

Even when you can find such a proof, there is still the problem that
you don't use a fixed number of bits. When you generate two numbers
that have the same value for the highest bits, the outcome of the
comparison is determined by the highest bit that is different in
both. But this bit position can be outside the set of secure
bits. Using the birthday limit, when you have s secure bits you can
expect to use an insecure bit in your comparisons when you generate
more than 2^(s/2) random numbers.

But when you adapt your sorting algorithm in such a way that it only
uses secure bits, you need a rather large amount of secure bits. When
you want to assign a random number to each card and want to have 50%
chance that there are no collisions you need at least 52^2 different
possible numbers (again the birthday limit), in other words each
number must be at least 12 bits. So for one deal you need 52*12=624
secure random bits. Now this is not very efficient because there are
only 52! different deals and log2 52! is somewhat less than 226. The
method I proposed can be implemented in such a way that it does not
require much more than 226 secure bits (it is probably not worth doing,
because I also proposed a very efficient random number generator,
but with a much slower random number generator such as BBS it might be
necessary.)
I don't understand this statement. I think that you meant to say that
unfortunately there are no secure ranges for c known. If so I agree.

Yes, that's what I meant. I assume that the precise value of c depends
on the computational complexity of factoring which is still unknown.

But with my method, it is obvious that it will uniformly generate all deals
when the random generators are uniform and independent.
I understand your approach. I can send you programs that run your
approach and my approach with small decks, then run Chi-Square numbers
on long runs. I then do K-S tests on the results. When I have done
this, your approach fails, even if the underlying generator is
cryptographically secure.

Please explain this. If you can do this, you can break AES,
because you have just described a method to distinguish AES
from a random permutation.

I cannot create an ironclad proof that your method should work. The
problem is the independence of the 52 (or whatever) generators, assuming
they all use the same underlying PRNG algorithm. The bias that I detect
toward less random distributions of Chi-Square is very small. But why
bother. My method doesn't have the independence problem, and the data
looks good.

Again if you can find any form of dependence between the outputs
you have broken the underlying block cipher.
They are. They are not mathematically rigorous, they are statistically
rigorous.

So, statistics is no longer part of (applied) mathematics?
How many times have I heard that said based on slow implementations of
BB&S? The trick is to choose N correctly, which of course takes work,
and then don't square x_i then do the mod operation, compute x_i^2 mod N
in one operation. If you do both, the generator itself can be very
fast. For example, look for an N such that N is 2^j - k for relatively
large j and small k. This makes the modulus calculation of x_i^2 mod N
easy. There are other tricks. For example, X^2 mod N = ((x-N)* x) mod
N. The quantity (x-N)*x < 1/4 * N^2. Not a big deal you would think,
but it allows you to choose N > 2*k for some convenient k say 512, and
not need to ever store numbers greater than 2^512.

Oops, forgot the biggest trick of all. Represent x as an n-digit number
base M where M = 2^k for some useful k like 64. Let's call these digits
x_j to distinguish from the x_i above. The partial products are of the
form x_j * x_j' * M^i for some i. If M^i is greater than N replace it
with M^i mod N. (And remember that due to the way we chose N, M^i mod N
will have lots of zero digits.) Accumulate all the partials in the ith
place, multiply by M^i mod N, and now add the partials together mod N.
If N is chosen correctly as above, the mod operations can be performed
by splitting x into parts A * 2^j and B, then computing B + A * k, then
if the result is greater than N, subtract N. For the additions, you
only need to compare to N and if necessary subtract. Sounds complex,
but it simplifies the arithmetic greatly.

Yeah, it trades days of computer time finding nice values of N for a
factor of 100 or so speedup in evaluating X*X mod N. But it is worth
it.

It is certainly useful to have fast BBS implementations, but I am
still not convinced that you can get a performance that is anywhere
near that of a block cipher in CTR mode. Can you give any timings?

greetings,

Ernst Lippe
 

Robert I. Eachus

Ernst said:
I think that you are using the wrong formula here, both AC and
HAC state that n <= log2 k, where k is the bit length of N,
in other words n <= log2 log2 N.

If so, that is a result I haven't seen. I assume AC is Applied
Cryptography, and HAC is Handbook of Applied Cryptography.
Again, can you give a reference that it is safe to use the
most significant bits, because all proofs that I have seen
only refer to the least significant bits.

Then reread the original Blum, Blum and Shub paper. It shows that the
position you take the bits from doesn't matter--and why.
Even when you can find such a proof, there is still the problem that
you don't use a fixed number of bits. When you generate two numbers
that have the same value for the highest bits, the outcome of the
comparison is determined by the highest bit that is different in
both. But this bit position can be outside the set of secure
bits. Using the birthday limit, when you have s secure bits you can
expect to use an insecure bit in your comparisons when you generate
more than 2^(s/2) random numbers.

I could equally argue that the ordering of any two cards depends on ONE
bit. And if you look at the BBS paper, you would see that for the
two-card case I am exactly right. But what is the right number to use for
pairwise orderings in a deck? Rather than use 52*51/2 I chose to use 52
per card, which is an overstatement. I can see that your argument is
that the randomness is not equally distributed among the cards. I
agree. But I think that argues for a maximum entropy resolution per
card of 51 for the first card drawn. The next question is whether that
means 51 bits or log2(51) bits.

Let me propose a thought experiment. Break all the cards into two
groups by looking at the first (or if you prefer, last) bit. Now look
at the next bit and divide the cards again. Continue dividing each pile
in this manner. But if a bit doesn't divide a subset, don't use that
bit. ;-) In theory there will be cases where you use 51 bits from a
single number. But actually you don't. Assign all the entropy used to
the numbers from the smaller of the two groups. Worst case, your first
division splits the cards into two piles, one with 51 cards and one with 1.
I argue that in that case you have used log2(51) bits of entropy. And as
you can see, that is the worst case.


Please explain this. If you can do this, you can break AES,
because you have just described a method to distinguish AES
from a random permutation.

If so, I don't understand your method. I assumed you were talking about
creating 51 different generators generating numbers from 0 to 51, 0 to
50, 0 to 49, and so on. But if you take values from a single generator
and convert them in an unbiased manner into the numbers you need it
doesn't help.

Let me give you a concrete example. Take three cards, say the Aces of
Clubs, Diamonds, and Hearts in that order, and two dice, say one red and
one white. Hold the cards in that order in your hand, Ace of clubs on
the left. Roll the dice. If the Red die is a one or a two, put the Ace
of Clubs face down on the table. If the Red die is a three or four, put
the Ace of Diamonds down again face down. If the Red die is a five or
six, you guessed it, the Ace of Hearts. Now if the White die is even,
put the leftmost card on top of the face down card, if it is odd, put
the rightmost card down. Finally put the remaining card on top.

You have just taken three cards and arranged them in one of 3! = six
possible orders. If you have some four sided or eight sided dice, you
can add another die and extend this. A ten-sided die, and another
ordinary die on top of that will allow you to extend the experiment to
six cards. But ignore that for now.

The bottommost card in your randomized deck depends only on the Red die,
right? But the other two cards--whatever they are--landed in positions
that depended on both dice. Now, we can assume the numbers on the two
dice were independent, so this is fine. But what if there is a
correlation between the dice? For the moment assume the dice are
magnetized so that when you look at one die, it is random, but the
results are not independent. What happens with the cards? The bottom
card is still random, but the other cards are biased.

Now go get those other dice. Do you see where this is going? Choosing
a "random" deck order this way entangles more and more information into
the topmost card. Unless there is no bias AND all the values are
independent, the topmost card collects most of the bias and interactions.

So I actually use this particular test as a very sensitive sniffer to
see if a particular generator has any bias or lack of independence
between N variates, where N is the size of the deck.

But back to your comment about AES. What you say may be true about this
technique applied to an AES-based generator breaking AES. I think that
is correct, but it is late at night. However, I have tried this with
SHA-1, and rejecting randomness there doesn't invalidate DSA.

And that is why I rail so strongly against the approach you advocate.
From experience with testing poor to very good PRNGs, my technique
works even with very poor PRNGs; the technique where you select from an
ever-diminishing deck concentrates any deviations from true randomness
in the last few cards. For example, take my algorithm and replace the
RNG with 1/2 rand(), or sin(10 * rand()), or whatever and it still gives
similarly random results. (But there are some LCG generators that fail
even with my algorithm. Which as far as I am concerned tells you just
how bad LCG generators are. ;-)
Again if you can find any form of dependence between the outputs
you have broken the underlying block cipher.
So, statistics is no longer part of (applied) mathematics?

No, statistical rigor is different from mathematical rigor. In
statistics, if two distributions cannot be distinguished they are
identical. In math, two functions which always give the same result are
not necessarily identical, but they are equivalent.

It is a subtle distinction but it matters here. If I create a
notational universe limited in some ways, and in that universe a
particular distribution cannot be distinguished from a random
distribution, then in that universe, the distribution is random, even if
there are other universes where it is not random.

In this particular case, I use that freedom to define a universe which
looks suspiciously like ours. If the size of our universe increases by
a factor of 100 again soon, it won't bother me that much. (Inflation
theories.) But factoring in P would break everything.
It is certainly useful to have fast BBS implementations, but I am
still not convinced that you can get a performance that is anywhere
near that of a block cipher in CTR mode. Can you give any timings?

The right question to ask is how long will it take you to find a BBS
generator that is better than this block cipher? Give me some realistic
timings and we will see. (I think I can beat a software AES version
with about the same degree of cryptographic security. I'd have to think
about implementing things in hardware.)

But the real problem is that to get FAST versions of BBS takes copious
computer cycles to find useful values of p and q. (Picking a nice N
then factoring it will obviously take too long. ;-) You end up picking
the range you want N to be in, generating a p, then seeing if there is
a q that puts N in that range.

--
Robert I. Eachus

"Quality is the Buddha. Quality is scientific reality. Quality is the
goal of Art. It remains to work these concepts into a practical,
down-to-earth context, and for this there is nothing more practical or
down-to-earth than what I have been talking about all along...the repair
of an old motorcycle." -- from Zen and the Art of Motorcycle
Maintenance by Robert Pirsig
 

Ernst Lippe

If so, that is a result I haven't seen. I assume AC is Applied
Cryptography, and HAC is Handbook of Applied Cryptography.


Then reread the original Blum, Blum and Shub paper. It shows that the
position you take the bits from doesn't matter--and why.

Handbook of Applied Cryptography, note 5.14:
"The efficiency of the generator can be improved by extracting the j
least significant bits of x_i in step 3.2, where j = c lg lg n and c is
a constant."

Applied Cryptography, Second Edition, p 418:
"According to [...] if n is the length of x_i, the least significant
log2 n bits of x_i can be used."

I could not locate a copy of the BBS paper. All references to
BBS I have found only use either the lowest order bit
(which I believe was the original version by BBS) or the
least significant bits.

Are you certain that you are not confusing BBS with some other
generators for which such proofs do indeed exist?
I could equally argue that the ordering of any two cards depends on ONE
bit. And if you look at the BBS paper, you would see that for the
two-card case I am exactly right. But what is the right number to use for
pairwise orderings in a deck? Rather than use 52*51/2 I chose to use 52
per card, which is an overstatement. I can see that your argument is
that the randomness is not equally distributed among the cards. I
agree. But I think that argues for a maximum entropy resolution per
card of 51 for the first card drawn. The next question is whether that
means 51 bits or log2(51) bits.

Let me propose a thought experiment. Break all the cards into two
groups by looking at the first (or if you prefer, last) bit. Now look
at the next bit and divide the cards again. Continue dividing each pile
in this manner. But if a bit doesn't divide a subset, don't use that
bit. ;-) In theory there will be cases where you use 51 bits from a
single number. But actually you don't. Assign all the entropy used to
the numbers from the smaller of the two groups. Worst case, your first
division splits the cards into two piles, one with 51 cards and one with 1.
I argue that in that case you have used log2(51) bits of entropy. And as
you can see, that is the worst case.

My objection was against the security argument for your construction.
Even when you can prove that any fixed group of k bits is secure,
that proof does not prove anything about the security of your algorithm
because under certain circumstances your algorithm will use bits
that are not part of any fixed group of k bits.
A proof that it is secure to use k bits does not prove anything
about the case when you use more than k bits, so you will have
to give an independent security proof for your algorithm.
If so, I don't understand your method. I assumed you were talking about
creating 51 different generators generating numbers from 0 to 51, 0 to
50, 0 to 49, and so on. But if you take values from a single generator
and convert them in an unbiased manner into the numbers you need it
doesn't help.

My proposal contained two steps:
* Generate a cryptographically secure random number with
a block cipher in CTR mode.
* Transform the outputs of this random number generator
into uniformly distributed numbers with a range of [0..n),
where n ranges from 1 to 52.

Because the underlying random number generator is cryptographically
secure this by definition means that its outputs are independent and
unbiased. If you can prove any form of bias or dependency, this
implies that the underlying block cipher is not cryptographically
secure. Because most strong block ciphers (like AES) have been
carefully analyzed, it is extremely unlikely that you will
be able to break them.

The next step consists of converting the outputs of this
cryptographically secure random number generator into random numbers
with a different range. In my post with that algorithm I gave a sketch
for its correctness proof. Even when my specific conversion algorithm
is wrong, there are other algorithms that can perform the same task.

So I am confident that the method that is proposed is
cryptographically secure, which by definition means that it is also
statistically (in your definition) valid.
And that is why I rail so strongly against the approach you advocate.
From experience with testing poor to very good PRNGs, my technique
works even with very poor PRNGs; the technique where you select from an
ever-diminishing deck concentrates any deviations from true randomness
in the last few cards. For example, take my algorithm and replace the
RNG with 1/2 rand(), or sin(10 * rand()), or whatever and it still gives
similarly random results. (But there are some LCG generators that fail
even with my algorithm. Which as far as I am concerned tells you just
how bad LCG generators are. ;-)

But for these generators you can't give a proof that your
algorithm is cryptographically secure.

No, statistical rigor is different from mathematical rigor. In
statistics, if two distributions cannot be distinguished they are
identical. In math, two functions which always give the same result are
not necessarily identical, but they are equivalent.

Then you are using a non-standard definition of the equality
of functions. The traditional definition is that two functions
f and g are equal iff they have the same domain and for all x: f(x)=g(x).
The right question to ask is how long will it take you to find a BBS
generator that is better than this block cipher?
Give me some realistic
timings and we will see. (I think I can beat a software AES version
with about the same degree of cryptographic security. I'd have to think
about implementing things in hardware.)

But the real problem is that to get FAST versions of BBS takes copious
computer cycles to find useful values of p and q. (Picking a nice N
then factoring it will obviously take too long. ;-) You end up picking
the range you want N to be in, generating a p, then seeing if there is
a q that puts N in that range.

So, you don't have any timings for the BBS version that you proposed?

Even though I don't expect that its performance would be anywhere near
that of AES, a fast BBS implementation would be useful on its own.

greetings,

Ernst Lippe
 
