RFC: clc-compliant pseudo-random number generator

I

Ian Woods

Well, no. It's just that by looking at the first bytes, you can get
some hints at what the seed (key) was. In addition, if you have
enough initial outputs from related seeds, well, you can figure out
what the seeds are.

This is true for every PRNG since they're deterministic. The question is
how hard is it given any subsequence of output given the algorithm - you
only need to determine the seed at any single point in the sequence to know
the whole sequence. Unless you're doing some behind-the-scenes magic with
the seed, I don't see how throwing away 'n' numbers helps you. Could you
explain further?

<snip>

Ian Woods
 
H

Hallvard B Furuseth

Peter said:
Why not change the size_t SIZE parameter to unsigned and, if KEY is
null and SIZE is non zero, seed on SIZE...?
(...)
That way, there is a simple mechanism for seeding by direct integers.

That _almost_ lets srand(seed) be implemented as prng_seed(0, seed).
srand(0) fails, however. So I think a prng_seed_uint() function is
needed in any case.

Also,
key = &size;
size = sizeof(size);

That does not give a repeatable sequence of pseudo-random numbers if
size has a padding bit which can be either 0 or 1. I'm afraid the
key needs to be built 'by hand':

unsigned char integer_key[sizeof size];
integer_key[0] = size;
#if UINT_MAX > UCHAR_MAX
{
int i = 1;
do {
integer_key[i++] = size >>= CHAR_BIT;
} while (i < (int)sizeof size);
}
#endif
key = &integer_key;
size = sizeof size;

The #if is necessary because `size >>= CHAR_BIT' is invalid if size only
has CHAR_BIT value bits.
 
D

Dan Pop

In said:
In message <[email protected]>


Ok, thanks for the DeathStation programming advice.

On a more practical note, what WOULD a reasonable value for a compiler in C90
mode to set __STDC_VERSION__ to, if it wanted to set it? (One reason for
wanting to set it might be to avoid warnings about it being an undefined
identifier treated as 0 in #if).

Presumably 1990xxL would make sense, but what is "xx"? Which month was C90
published (or reached whichever stage was equivalent to 199409/199901)? I
suppose 199000L would be safe whatever.

What if the implementor decided to include the full adoption date in
the definition of __STDC_VERSION__, so that it would become 1990mmddL?
Since there was no formal specification of the __STDC_VERSION__
macro at the time, such a decision is not a priori unreasonable.

Dan
 
D

Dan Pop

In said:
Only an actively malicious compiler would do so.

Wrong! See my previous post in the thread.
I'm not too interested in those.

Undefined behaviour is undefined behaviour in clc. Once you ignore it,
any hope of clc-compliance is gone. The standards we apply to Tisdale
must be valid for Pfaff, too.

Dan
 
K

Kevin Bracey

What if the implementor decided to include the full adoption date in
the definition of __STDC_VERSION__, so that it would become 1990mmddL?

The horse is dead, you can stop flogging now. Do you have a sensible answer
to the actual question asked?
 
D

Dan Pop

In said:
In message <[email protected]>


The horse is dead, you can stop flogging now. Do you have a sensible answer
to the actual question asked?

Methinks the actual question asked is off topic in this newsgroup.
If you need c.s.c, you should know where to find it.

BTW, wasn't I supposed to be plonked by you (due to my mental problems)?

Dan
 
B

Ben Pfaff

Wrong! See my previous post in the thread.
What if the implementor decided to include the full adoption date in
the definition of __STDC_VERSION__, so that it would become 1990mmddL?
Since there was no formal specification of the __STDC_VERSION__
macro at the time, such a decision is not a priori unreasonable.

Why would the implementor do so when C95 used 199409L, C99 used
199901L, and no other standard or (as far as I know) standard
draft defined __STDC_VERSION__ at all?
 
O

Old Wolf

On BCC 5.5, Ben's PRNG does not compile, whereas the rand()
This refers to "if (UCHAR_MAX == 255)" . I prefer to use #if
for comparisons like this that can be evaluated at compile-time
(I prefer not to turn off those compiler warnings, as sometimes
they show up genuine logic errors). Obviously this is just
a matter of style, unless you have a compiler with a really bad optimizer.
Bizarre. Can this be anything other than a bug in Borland C++?

From float.h;
#define DBL_MAX _max_dble
extern double _RTLENTRY _EXPDATA _max_dble;

[_RTLENTRY _EXPDATA is the calling convention for linking the C runtime
library, so I would suppose that is where _max_dble lives].

LDBL_MIN, FLT_MAX, LDBL_MAX are similarly declared (although
DBL_MIN, FLT_MIN, *_EPSILON, *_MAX_EXP, etc. are #defined as constants)

I'd say the compiler error is because in C (unlike C++),
static variables must be initialized to constants. It would be easy
to modify your code to work around this.

Does the Standard say that DBL_MAX must be a compile-time constant?
 
B

Ben Pfaff

This refers to "if (UCHAR_MAX == 255)" . I prefer to use #if
for comparisons like this that can be evaluated at compile-time
(I prefer not to turn off those compiler warnings, as sometimes
they show up genuine logic errors). Obviously this is just
a matter of style, unless you have a compiler with a really bad optimizer.

#if's are ugly, so I prefer to avoid them when it's reasonable.
Bizarre. Can this be anything other than a bug in Borland C++?

[...]

Does the Standard say that DBL_MAX must be a compile-time constant?

Yes. From C99 5.2.4.2.2:

The values given in the following list shall be replaced by
constant expressions with implementation-defined values that
are greater than or equal to those shown:

- maximum representable finite floating-point number, (1 - b-p)bemax
FLT_MAX 1E+37
DBL_MAX 1E+37
LDBL_MAX 1E+37
 
E

Eric Sosman

Old said:
[unattributed:]
On BCC 5.5, Ben's PRNG does not compile, whereas the rand()
provided takes 0.95 nanoseconds.
[...]
Error E2063 prng.c 244: Illegal initialization in function prng_get_double_normal
*** 1 errors in Compile ***

The error was at the line:
static double next_normal = DBL_MAX;

Does the Standard say that DBL_MAX must be a compile-time constant?

Section 5.2.4.2.2 paragraph 9:

The values given in the following list shall be
replaced by constant expressions [...]
FLT_MAX [...]
DBL_MAX [...]
LDBL_MAX [...]

It appears the compiler is broken or is being used in a
non-conforming mode.
 
P

Peter Nilsson

Hallvard B Furuseth said:
Peter said:
Why not change the size_t SIZE parameter to unsigned and, if KEY is
null and SIZE is non zero, seed on SIZE...?
(...)
That way, there is a simple mechanism for seeding by direct integers.

That _almost_ lets srand(seed) be implemented as prng_seed(0, seed).
srand(0) fails, however. So I think a prng_seed_uint() function is
needed in any case.

Also,
key = &size;
size = sizeof(size);

That does not give a repeatable sequence of pseudo-random numbers if
size has a padding bit which can be either 0 or 1. I'm afraid the
key needs to be built 'by hand':

unsigned char integer_key[sizeof size];
integer_key[0] = size;
#if UINT_MAX > UCHAR_MAX
{
int i = 1;
do {
integer_key[i++] = size >>= CHAR_BIT;
} while (i < (int)sizeof size);
}
#endif
key = &integer_key;
size = sizeof size;

The #if is necessary because `size >>= CHAR_BIT' is invalid if size only
has CHAR_BIT value bits.

I prefer avoid such #if-s wherever reasonably possible, as just look
ugly (IMHO).

unsigned char integer_key[sizeof size];
size_t i;

integer_key[0] = size;
for (i = 1; i < sizeof size; i++)
{
size = size >> CHAR_BIT - 1 >> 1;
integer_key = size;
}

key = &integer_key;
size = sizeof integer_key;

Good compilers should optimise this.

But, I did note that time_t conversion suffers the same problem, and I
don't think that's quite as easy to fix, at least not if you want to
consider aberrant implementations.
 
H

Hallvard B Furuseth

Peter said:
I prefer avoid such #if-s wherever reasonably possible, as just look
ugly (IMHO).

size = size >> CHAR_BIT - 1 >> 1;

I don't think obfuscating the shift is any prettier. In particular
since you now need a comment to explain why you are doing it.
But, I did note that time_t conversion suffers the same problem, and I
don't think that's quite as easy to fix, at least not if you want to
consider aberrant implementations.

I think that's OK. Initialization from time() isn't supposed to be
repeatable, so it shouldn't matter if two initializations with the
same time_t can give different results.
 
E

Eric Sosman

Ben said:
[a PRNG with assorted convenience functions, one of which is]

/* Returns a pseudo-random floating-point number from the uniform
distribution with range [0,1). */
double
prng_get_double (void)
{
for (;;)
{
unsigned long ulng = prng_get_ulong ();
if (ulng != ULONG_MAX)
return (double) ulng / ULONG_MAX;
}
}

I think it would be simpler, possibly faster, and
probably just as good to replace the function body with
this one-liner:

return prng_get_ulong() / (ULONG_MAX + 1.0);

However, I'm not sure "just as good" is "best." The
code as given (and as modified) uses an unsigned long's
worth of pseudo-random bits no matter how wide or narrow
the double's mantissa might be. Perhaps it would be better
to choose the bit count based on DBL_MANT_DIG instead.
This should be easy if FLT_RADIX is a power of two; if you
want to cater to base ten or to the DeathStation 9000 it
might be trickier but I think it could be done without too
great a run-time speed penalty.
 
A

Arthur J. O'Dwyer

Ben Pfaff said:
(e-mail address removed) (Dan Pop) writes:

/* Take advantage of inlining if possible. */
#if __STDC_VERSION__ >= 199901L

Any use of __STDC_VERSION__ invokes undefined behaviour in a C90
implementation. Such an implementation can [...]
[...] define it as a macro with a value greater
than 199901L.

Only an actively malicious compiler would do so.

Wrong! See my previous post in the thread.
What if the implementor decided to include the full adoption date in
the definition of __STDC_VERSION__, so that it would become 1990mmddL?
Since there was no formal specification of the __STDC_VERSION__
macro at the time, such a decision is not a priori unreasonable.

Why would the implementor do so when C95 used 199409L, C99 used
199901L, and no other standard or (as far as I know) standard
draft defined __STDC_VERSION__ at all?

Possibly because the compiler writer in 1990 was not aware of
the precedents set by the C standards committee in 1994 and 1999?

After all, the idea to incorporate __STDC_VERSION__ must have
come from somewhere, so it's not terribly implausible that someone
might have used it in an existing compiler. Pretty unlikely, yes.
In this case, I think it would make more sense to simply leave
out the tiny 'inline' efficiency thing, or else add a real macro
along the lines of HAS_INLINE, and let the user #define it if he
wants it.

-Arthur
 
P

Peter Nilsson

Hallvard B Furuseth said:
I don't think obfuscating the shift is any prettier. In particular
since you now need a comment to explain why you are doing it.

It doesn't _need_ a comment per se, but I'd say that, if anything...

#if UINT_MAX > UCHAR_MAX

....probably warrants a similar comment. The same set of people will
likely be scratching their heads as to why we bothered writing these
lines.
I think that's OK. Initialization from time() isn't supposed to be
repeatable, so it shouldn't matter if two initializations with the
same time_t can give different results.

Yup, I realised that about 15 minutes after posting. Mind you, I have
had at least one occasion to rewind the system clock to trace a
program fault.
 
S

Scott Fluhrer

Ian Woods said:
This is true for every PRNG since they're deterministic. The question is
how hard is it given any subsequence of output given the algorithm - you
only need to determine the seed at any single point in the sequence to know
the whole sequence. Unless you're doing some behind-the-scenes magic with
the seed, I don't see how throwing away 'n' numbers helps you. Could you
explain further?

Certainly. For one, the seed you supply RC4 is not simply the initial
state. Instead, there is a procedure (in Ben's code, the logic resides in
the function prng_seed) to convert that initial seed into an initial state.
This procedure is known to produce biases in the initial state, and these
biases produce biases in the initial output -- not very strong ones,
granted, but strong enough to be cryptographically interesting. When you
run the procedure to generate outputs (prng_get_octet in Ben's code), this
tends to smear out the biases, and thus the artifacts are limited to the
first couple initial outputs.

In addition, things get more interesting if you allow someone to view the
initial output of multiple related seeds. For example, if you give an
attacker the first octet output from the following seeds:

00 00 00 XXXXXXXX
00 00 01 XXXXXXXX
00 00 02 XXXXXXXX
etc...

then with enough of these outputs, the attacker can deduce values of the
common XXXXXXXX portion. Real protocols have been attacked because of this
observation, and this has made the cryptographical community quite wary of
the initial output of RC4.

Now, if you are just looking at creating a statistically good looking random
number generator, the biases from the first point are probably too weak to
really worry about, and the second point about related seeds is irrelevent.
However, cryptographically speaking, they are valid concerns.

(If you need references to any of the above, just ask)
 
D

Dan Pop

In said:
Why would the implementor do so when C95 used 199409L, C99 used
199901L, and no other standard or (as far as I know) standard
draft defined __STDC_VERSION__ at all?

Because the implementor may have done it in 1991, years before the feature
got standardised. The __STDC_VERSION__ name is fairly intuitive for such
a feature...

BTW, you're sounding more and more like Trollsdale in this subthread.

Dan
 
K

Kevin Bracey

Methinks the actual question asked is off topic in this newsgroup.
If you need c.s.c, you should know where to find it.

Heh. But if I went to comp.std.c, I'd probably be told it was off-topic there
too, as the C90 standard doesn't specify it :)
BTW, wasn't I supposed to be plonked by you (due to my mental problems)?

Yeah. My newsreader's filter kept vanishing though for some reason. Didn't
think it was worth investigating why. I had set it to a one month expiry
anyway, as I assumed you were just going through a bad patch.
 
H

Hallvard B Furuseth

Eric said:
Ben said:
[a PRNG with assorted convenience functions, one of which is]

/* Returns a pseudo-random floating-point number from the uniform
distribution with range [0,1). */
double
prng_get_double (void)
{
for (;;)
{
unsigned long ulng = prng_get_ulong ();
if (ulng != ULONG_MAX)
return (double) ulng / ULONG_MAX;
}
}

I think it would be simpler, possibly faster, and
probably just as good to replace the function body with
this one-liner:

return prng_get_ulong() / (ULONG_MAX + 1.0);

Floating-point arithmetic is inexact. For both the original version and
yours, there is no guarantee that the result will be < 1.0. The version
I suggested earlier should work:

for (;;)
{
double ret = (double) prng_get_ulong () / ULONG_MAX;
if (ret < 1)
return ret;
}
 
D

Dan Pop

In said:
In message <[email protected]>


Heh. But if I went to comp.std.c, I'd probably be told it was off-topic there
too, as the C90 standard doesn't specify it :)

You could avoid this by simply asking about the meanings of 01 in 199901L
and 09 in 199409L :)

Actually, the latter is documented at http://www.lysator.liu.se/c/na1.html

The macro __STDC_VERSION__ shall expand to 199409L. (The Normative
Addendum was formally registered with ISO in September 1994.)

So, you could skip this stage and go and ask when was C90 formally
registered with ISO (month and year). Then, you could figure out how a
C90 implementor with particularly good foresight should have defined his
__STDC_VERSION__ ;-)

Dan
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,137
Messages
2,570,797
Members
47,345
Latest member
tektheone

Latest Threads

Top