G
geo
The KISS (Keep-It Simple-Stupid) principle
appears to be one of the better ways to get
fast and reliable random numbers in a computer,
by combining several simple methods.
Most RNGs produce random integers---usually 32-bit---
that must be converted to floating point numbers
for applications. I offer here a posting
describing my latest version of a KISS RNG that
produces double-precision uniform random variables
directly, without need for the usual integer-to-float
conversion. It is an extension of a method I provided
for MATLAB, with details described in
G.Marsaglia and W.W.Tsang, The 64-bit universal RNG,
Statistics & Probability Letters, V66, Issue 2,2004.
Rather than combining simple lagged Fibonacci and Weyl
sequences, as in those versions, this latest version combines
a lag-1220 CSWB (Complimentary-Subtract-With-Borrow) sequence,
x[n]=x[n-1220-x[n-1190]-borrow \mod b=2^53, period >10^19462
with a lag-2 SWB (Subtract-With-Borrow) sequence,
z[n]=z[n-1]-z[n-2]-borrow mod b=2^53. period>10^30.
Details on how to maintain exact integer arithmetic
with only float operations are in the article above.
This double-precision RNG, called dUNI, has an immense period,
around 10^19492 versus a previous 10^61, requires just a few
lines of code, is very fast, some 70 million per second, and
---a key feature---behaves very well on tests of randomness.
The extensive tests of The Diehard Battery, designed for 32-bit
random integers, are applied to bits 1 to 32, then 2 to 33,
then 3 to 34,...,then finally bits 21 to 53 for each part of
the string of 53 bits in each double-precision float dUNI().
A C version is listed below, but as only tests of magnitude
and floating point addition or subtraction are required,
implementations in other languages could be made available.
I invite such from interested readers.
If you cut, paste, compile and run the following C code,
it should take around 14 seconds to generate 1000 million
dUNI's, followed by the output 0.6203646342357479.
------------------------------------------------------------
/*
dUNI(), a double-precision uniform RNG based
on the KISS (Keep-It-Simple-Stupid) principle,
combining, by subtraction mod 1, a CSWB
(Complimentary-Subtract-With-Borrow) integer sequence
x[n]=x[n-1220]-x[n-1190]-borrow mod b=2^53,
with a SWB (Subtract-With-Borrow) sequence
z[n]=z[n-2]-z[n-1]-borrow mod b=2^53.
Both implemented as numerators of double precision
rationals, t for CSWB, zy for SWB, each formed as
(integer mod 2^53)/2^53, leading to a returned KISS
value t-zw mod 1. All denominators floats of b=2^53.
The CSWB sequence is based on prime p=b^1220-b^1190+1,
for which the order of b mod p is (p-1)/72, so the CSWB
sequence has period >2^64653 or 10^19462.
The SWB-lag2 sequence z[n]=z[n-2]-z[n-1]-zc mod b=2^53,
is based on b^2-b-1 = 11*299419*24632443746239056514780519
with period 10*299418*(24632443746239056514780518/2) > 2^101.
cc is the CSWB and SWB borrow or carry value: cc=1.0/b;
*/
#include <stdio.h>
static double Q[1220];
static int indx=1220;
double dUNI()
{ /* Takes 14 nanosecs, Intel Q6600,2.40GHz */
const double cc=1.0/9007199254740992.0;
static double c=0.0,zc=0.0, /*current CSWB and SWB `borrow`*/
zx=5212886298506819.0/9007199254740992.0, /*SWB seed1*/
zy=2020898595989513.0/9007199254740992.0; /*SWB seed2*/
int i,j; double t; /*t: first temp, then next CSWB value*/
/* First get zy as next lag-2 SWB */
t=zx-zy-zc; zx=zy; if(t<0){zy=t+1.0; zc=cc;} else{zy=t; zc=0.0;}
/*Then get t as the next lag-1220 CSWB value */
if(indx<1220) t=Q[indx++];
else /*refill Q[n] via Q[n-1220]-Q[n-1190]-c, */
{ for(i=0;i<1220;i++)
{j=(i<30) ? i+1190 : i-30;
t=Q[j]-Q+c; /*Get next CSWB element*/
if(t>0) {t=t-cc; c=cc;} else {t=t-cc+1.0; c=0.0;}
Q=t;
} /* end i loop */
indx=1; t=Q[0]; /*set indx, exit 'else' with t=Q[0] */
} /* end else segment; return t-zy mod 1 */
return( (t<zy)? 1.0+(t-zy) : t-zy ) ;
} /* end dUNI() */
int main() /*You must provide at least two 32-bit seeds*/
{ int i,j; double s,t; /*Choose 32 bits for x, 32 for y */
unsigned long x=123456789, y=362436069; /*default seeds*/
/* Next, seed each Q, one bit at a time, */
for(i=0;i<1220;i++) /*using 9th bit from Cong+Xorshift*/
{s=0.0; t=1.0;
for(j=0;j<52;j++)
{t=0.5*t; /* make t=.5/2^j */
x=69069*x+123; y^=(y<<13);y^=(y>>17);y^=(y<<5);
if(((x+y)>>23)&1) s=s+t; /*change bit of s, maybe */
} /* end j loop*/
Q=s;
} /*end i seed loop, Now generate 10^9 dUNI's: */
for(i=0;i<1000000000;i++) t=dUNI();
printf("%.16f\n",dUNI());
}
/*
Output, after 10^9 random uniforms:
0.6203646342357479
Should take about 14 seconds,
e.g. with gcc -O3 opimizing compiler
--------------------------------------------------------
*/
/*
Fully seeding the Q[1220] array requires 64660
seed bits, plus another 106 for the initial zx
and zy of the lag-2 SWB sequence.
As in the above listing, using just two 32-bit
seeds, x and y in main(), to initialize the Q
array, one bit at a time, by means of the
resulting Congruential+Xorshift sequence
may be enough for many applications.
Applications such as in Law or Gaming may
require enough seeds in the Q[1220] array to
guarantee that each one of a huge set of
possible outcomes can appear. For example,
choosing a jury venire of 80 from a
list of 2000 eligibles would require at least
ten 53-bit seeds; choosing 180 from 4000 would
require twenty 53-bit seeds.
To get certification, a casino machine that could
play forty simultaneous games of poker must be
able to produce forty successive straight-flushes,
with a resulting minimal seed set.
Users can choose their 32-bit x,y for the
above seeding process, or develop their own
for more exacting requirements when a mere
set of 64 seed bits may not be enough.
Properties:
1. Simple and very fast, some 70 million double-precision
uniform(0,1] random variables/second, in IEEE 754 standard
form: 1 sign bit, 11 exponent bits, 52 fraction bits
plus the implied 1 leading the fraction part, making a
total of 53 bits for each uniform floating-point variate.
2. Period some 10^19492 (vs. the 10^61 of an earlier version),
G. Marsaglia and W.W. Tsang, "The 64-bit universal RNG",(2004)
Statistics and Probability Letters}, v66, no. 2, 183-187,
or an earlier version provided for Matlab.)
3. Following the KISS, (Keep-It-Simple-Stupid) principle,
combines, via subtraction, a CSWB sequence
x[n]=x[n-1220]-x[n-1190]-borrow mod 2^53
based on the prime p=b^1220-b^1190+1, b=2^53,
with a SWB lag-2 sequence z[n]=w[n-2]-z[n-1] -c mod 2^53.
All modular arithmetic on numerators of rationals
with denominators 2.^53.
4. Easily implemented in various programming languages,
as only tests on magnitude and double-precision floating-point
subtraction or addition required.
5. Passes all Diehard tests,
http://i.cs.hku.hk/~diehard/
As Diehard is designed for 32-bit integer input, all of its
234 tests are applied to bits 1..32, then 2..33, then 3..34,..,
then 22..53 of the 53 bits in each dUNI().
(To form 32-bit integer j from bits i..(i+31):
t=dUNI()*(1<<(i-1)); i=t; j=(t-i)*4294967296.0; )
All twenty-two choices for the 32-bit segments performed
very well on the 234 p-values from Diehard tests, using
the default 32-bit seeds x and y. You might try your own,
with possibly more extensive seeding of the Q array.
*/
George Marsaglia
appears to be one of the better ways to get
fast and reliable random numbers in a computer,
by combining several simple methods.
Most RNGs produce random integers---usually 32-bit---
that must be converted to floating point numbers
for applications. I offer here a posting
describing my latest version of a KISS RNG that
produces double-precision uniform random variables
directly, without need for the usual integer-to-float
conversion. It is an extension of a method I provided
for MATLAB, with details described in
G.Marsaglia and W.W.Tsang, The 64-bit universal RNG,
Statistics & Probability Letters, V66, Issue 2,2004.
Rather than combining simple lagged Fibonacci and Weyl
sequences, as in those versions, this latest version combines
a lag-1220 CSWB (Complimentary-Subtract-With-Borrow) sequence,
x[n]=x[n-1220-x[n-1190]-borrow \mod b=2^53, period >10^19462
with a lag-2 SWB (Subtract-With-Borrow) sequence,
z[n]=z[n-1]-z[n-2]-borrow mod b=2^53. period>10^30.
Details on how to maintain exact integer arithmetic
with only float operations are in the article above.
This double-precision RNG, called dUNI, has an immense period,
around 10^19492 versus a previous 10^61, requires just a few
lines of code, is very fast, some 70 million per second, and
---a key feature---behaves very well on tests of randomness.
The extensive tests of The Diehard Battery, designed for 32-bit
random integers, are applied to bits 1 to 32, then 2 to 33,
then 3 to 34,...,then finally bits 21 to 53 for each part of
the string of 53 bits in each double-precision float dUNI().
A C version is listed below, but as only tests of magnitude
and floating point addition or subtraction are required,
implementations in other languages could be made available.
I invite such from interested readers.
If you cut, paste, compile and run the following C code,
it should take around 14 seconds to generate 1000 million
dUNI's, followed by the output 0.6203646342357479.
------------------------------------------------------------
/*
dUNI(), a double-precision uniform RNG based
on the KISS (Keep-It-Simple-Stupid) principle,
combining, by subtraction mod 1, a CSWB
(Complimentary-Subtract-With-Borrow) integer sequence
x[n]=x[n-1220]-x[n-1190]-borrow mod b=2^53,
with a SWB (Subtract-With-Borrow) sequence
z[n]=z[n-2]-z[n-1]-borrow mod b=2^53.
Both implemented as numerators of double precision
rationals, t for CSWB, zy for SWB, each formed as
(integer mod 2^53)/2^53, leading to a returned KISS
value t-zw mod 1. All denominators floats of b=2^53.
The CSWB sequence is based on prime p=b^1220-b^1190+1,
for which the order of b mod p is (p-1)/72, so the CSWB
sequence has period >2^64653 or 10^19462.
The SWB-lag2 sequence z[n]=z[n-2]-z[n-1]-zc mod b=2^53,
is based on b^2-b-1 = 11*299419*24632443746239056514780519
with period 10*299418*(24632443746239056514780518/2) > 2^101.
cc is the CSWB and SWB borrow or carry value: cc=1.0/b;
*/
#include <stdio.h>
static double Q[1220];
static int indx=1220;
double dUNI()
{ /* Takes 14 nanosecs, Intel Q6600,2.40GHz */
const double cc=1.0/9007199254740992.0;
static double c=0.0,zc=0.0, /*current CSWB and SWB `borrow`*/
zx=5212886298506819.0/9007199254740992.0, /*SWB seed1*/
zy=2020898595989513.0/9007199254740992.0; /*SWB seed2*/
int i,j; double t; /*t: first temp, then next CSWB value*/
/* First get zy as next lag-2 SWB */
t=zx-zy-zc; zx=zy; if(t<0){zy=t+1.0; zc=cc;} else{zy=t; zc=0.0;}
/*Then get t as the next lag-1220 CSWB value */
if(indx<1220) t=Q[indx++];
else /*refill Q[n] via Q[n-1220]-Q[n-1190]-c, */
{ for(i=0;i<1220;i++)
{j=(i<30) ? i+1190 : i-30;
t=Q[j]-Q+c; /*Get next CSWB element*/
if(t>0) {t=t-cc; c=cc;} else {t=t-cc+1.0; c=0.0;}
Q=t;
} /* end i loop */
indx=1; t=Q[0]; /*set indx, exit 'else' with t=Q[0] */
} /* end else segment; return t-zy mod 1 */
return( (t<zy)? 1.0+(t-zy) : t-zy ) ;
} /* end dUNI() */
int main() /*You must provide at least two 32-bit seeds*/
{ int i,j; double s,t; /*Choose 32 bits for x, 32 for y */
unsigned long x=123456789, y=362436069; /*default seeds*/
/* Next, seed each Q, one bit at a time, */
for(i=0;i<1220;i++) /*using 9th bit from Cong+Xorshift*/
{s=0.0; t=1.0;
for(j=0;j<52;j++)
{t=0.5*t; /* make t=.5/2^j */
x=69069*x+123; y^=(y<<13);y^=(y>>17);y^=(y<<5);
if(((x+y)>>23)&1) s=s+t; /*change bit of s, maybe */
} /* end j loop*/
Q=s;
} /*end i seed loop, Now generate 10^9 dUNI's: */
for(i=0;i<1000000000;i++) t=dUNI();
printf("%.16f\n",dUNI());
}
/*
Output, after 10^9 random uniforms:
0.6203646342357479
Should take about 14 seconds,
e.g. with gcc -O3 opimizing compiler
--------------------------------------------------------
*/
/*
Fully seeding the Q[1220] array requires 64660
seed bits, plus another 106 for the initial zx
and zy of the lag-2 SWB sequence.
As in the above listing, using just two 32-bit
seeds, x and y in main(), to initialize the Q
array, one bit at a time, by means of the
resulting Congruential+Xorshift sequence
may be enough for many applications.
Applications such as in Law or Gaming may
require enough seeds in the Q[1220] array to
guarantee that each one of a huge set of
possible outcomes can appear. For example,
choosing a jury venire of 80 from a
list of 2000 eligibles would require at least
ten 53-bit seeds; choosing 180 from 4000 would
require twenty 53-bit seeds.
To get certification, a casino machine that could
play forty simultaneous games of poker must be
able to produce forty successive straight-flushes,
with a resulting minimal seed set.
Users can choose their 32-bit x,y for the
above seeding process, or develop their own
for more exacting requirements when a mere
set of 64 seed bits may not be enough.
Properties:
1. Simple and very fast, some 70 million double-precision
uniform(0,1] random variables/second, in IEEE 754 standard
form: 1 sign bit, 11 exponent bits, 52 fraction bits
plus the implied 1 leading the fraction part, making a
total of 53 bits for each uniform floating-point variate.
2. Period some 10^19492 (vs. the 10^61 of an earlier version),
G. Marsaglia and W.W. Tsang, "The 64-bit universal RNG",(2004)
Statistics and Probability Letters}, v66, no. 2, 183-187,
or an earlier version provided for Matlab.)
3. Following the KISS, (Keep-It-Simple-Stupid) principle,
combines, via subtraction, a CSWB sequence
x[n]=x[n-1220]-x[n-1190]-borrow mod 2^53
based on the prime p=b^1220-b^1190+1, b=2^53,
with a SWB lag-2 sequence z[n]=w[n-2]-z[n-1] -c mod 2^53.
All modular arithmetic on numerators of rationals
with denominators 2.^53.
4. Easily implemented in various programming languages,
as only tests on magnitude and double-precision floating-point
subtraction or addition required.
5. Passes all Diehard tests,
http://i.cs.hku.hk/~diehard/
As Diehard is designed for 32-bit integer input, all of its
234 tests are applied to bits 1..32, then 2..33, then 3..34,..,
then 22..53 of the 53 bits in each dUNI().
(To form 32-bit integer j from bits i..(i+31):
t=dUNI()*(1<<(i-1)); i=t; j=(t-i)*4294967296.0; )
All twenty-two choices for the 32-bit segments performed
very well on the 234 p-values from Diehard tests, using
the default 32-bit seeds x and y. You might try your own,
with possibly more extensive seeding of the Q array.
*/
George Marsaglia