Velocity Reviews > C++ > Re: RNGs: A double KISS

# Re: RNGs: A double KISS

Dann Corbit
Guest
Posts: n/a

 04-14-2010
For reference and comparison, here is a "down and dirty" translation
into a C class.

I guess that some C++ guru will take a few minutes to shine the apple a
bit so that it can be genuinely useful.

/*
From: geo <(E-Mail Removed)>
Newsgroups: sci.math,comp.lang.c
Subject: RNGs: A double KISS
Date: Wed, 14 Apr 2010 03:41:05 -0700 (PDT)
Xref: eternal-september.org sci.math:94533 comp.lang.c:57850

The KISS (Keep-It Simple-Stupid) principle
appears to be one of the better ways to get
fast and reliable random numbers in a computer,
by combining several simple methods.

Most RNGs produce random integers---usually 32-bit---
that must be converted to floating point numbers
for applications. I offer here a posting
produces double-precision uniform random variables
directly, without need for the usual integer-to-float
conversion. It is an extension of a method I provided
for MATLAB, with details described in
G.Marsaglia and W.W.Tsang, The 64-bit universal RNG,
Statistics & Probability Letters, V66, Issue 2,2004.

Rather than combining simple lagged Fibonacci and Weyl
a lag-1220 CSWB (Complimentary-Subtract-With-Borrow) sequence,
x[n]=x[n-1220-x[n-1190]-borrow \mod b=2^53, period >10^19462
with a lag-2 SWB (Subtract-With-Borrow) sequence,
z[n]=z[n-1]-z[n-2]-borrow mod b=2^53. period>10^30.
Details on how to maintain exact integer arithmetic
with only float operations are in the article above.

This double-precision RNG, called dUNI, has an immense period,
around 10^19492 versus a previous 10^61, requires just a few
lines of code, is very fast, some 70 million per second, and
---a key feature---behaves very well on tests of randomness.
The extensive tests of The Diehard Battery, designed for 32-bit
random integers, are applied to bits 1 to 32, then 2 to 33,
then 3 to 34,...,then finally bits 21 to 53 for each part of
the string of 53 bits in each double-precision float dUNI().

A C version is listed below, but as only tests of magnitude
and floating point addition or subtraction are required,
implementations in other languages could be made available.
I invite such from interested readers.

If you cut, paste, compile and run the following C code,
it should take around 14 seconds to generate 1000 million
dUNI's, followed by the output 0.6203646342357479.
------------------------------------------------------------
*/
/*
dUNI(), a double-precision uniform RNG based
on the KISS (Keep-It-Simple-Stupid) principle,
combining, by subtraction mod 1, a CSWB
(Complimentary-Subtract-With-Borrow) integer sequence
x[n]=x[n-1220]-x[n-1190]-borrow mod b=2^53,
with a SWB (Subtract-With-Borrow) sequence
z[n]=z[n-2]-z[n-1]-borrow mod b=2^53.

Both implemented as numerators of double precision
rationals, t for CSWB, zy for SWB, each formed as
(integer mod 2^53)/2^53, leading to a returned KISS
value t-zw mod 1. All denominators floats of b=2^53.

The CSWB sequence is based on prime p=b^1220-b^1190+1,
for which the order of b mod p is (p-1)/72, so the CSWB
sequence has period >2^64653 or 10^19462.

The SWB-lag2 sequence z[n]=z[n-2]-z[n-1]-zc mod b=2^53,
is based on b^2-b-1 = 11*299419*24632443746239056514780519
with period 10*299418*(24632443746239056514780518/2) > 2^101.
cc is the CSWB and SWB borrow or carry value: cc=1.0/b;
*/

#include <cstdio>
#include <cstdlib>
#include <cassert>

class kiss2 {

private:
double Q[1220];
int indx;
double cc;
double c; /* current CSWB */
double zc; /* current SWB `borrow` */
double zx; /* SWB seed1 */
double zy; /* SWB seed2 */
size_t qlen;/* length of Q array */

public:
kiss2()
{
assert(sizeof (double) >= ;
cc = 1.0 / 9007199254740992.0; // inverse of 2^53rd power
int i;
size_t qlen = indx = sizeof Q / sizeof Q[0];
for (i = 0; i < qlen; i++)
Q[i] = 0;
double c = 0.0, zc = 0.0, /* current CSWB and SWB `borrow` */
zx = 5212886298506819.0 / 9007199254740992.0, /* SWB seed1 */
zy = 2020898595989513.0 / 9007199254740992.0; /* SWB seed2 */
int j;
double s, t; /* Choose 32 bits for x, 32 for y */
unsigned long x = 123456789, y = 362436069; /* default seeds * /
/* Next, seed each Q[i], one bit at a time, */
for (i = 0; i < qlen; i++)
{ /* using 9th bit from Cong+Xorshift */
s = 0.0;
t = 1.0;
for (j = 0; j < 52; j++)
{
t = 0.5 * t; /* make t=.5/2^j */
x = 69069 * x + 123;
y ^= (y << 13);
y ^= (y >> 17);
y ^= (y << 5);
if (((x + y) >> 23) & 1)
s = s + t; /* change bit of s, maybe */
} /* end j loop */
Q[i] = s;
} /* end i seed loop, Now generate 10^9 dUNI's: */
}

double dUNI()
{ /* Takes 14 nanosecs, Intel Q6600,2.40GHz */
int i, j;
double t; /* t: first temp, then next CSWB value */
/* First get zy as next lag-2 SWB */
t = zx - zy - zc;
zx = zy;
if (t < 0)
{
zy = t + 1.0;
zc = cc;
}
else
{
zy = t;
zc = 0.0;
}

/* Then get t as the next lag-1220 CSWB value */
if (indx < 1220)
t = Q[indx++];
else
{ /* refill Q[n] via Q[n-1220]-Q[n-1190]-c, */
for (i = 0; i < 1220; i++)
{
j = (i < 30) ? i + 1190 : i - 30;
t = Q[j] - Q[i] + c; /* Get next CSWB element */
if (t > 0)
{
t = t - cc;
c = cc;
}
else
{
t = t - cc + 1.0;
c = 0.0;
}
Q[i] = t;
} /* end i loop */
indx = 1;
t = Q[0]; /* set indx, exit 'else' with t=Q[0] */
} /* end else segment; return t-zy mod 1 */
return ((t < zy) ? 1.0 + (t - zy) : t - zy);
} /* end dUNI() */
};

int main(void)
{ /* You must provide at least two 32-bit seeds */
kiss2 krng;
double t;
int i;
for (i = 0; i < 1000000000; i++)
t = krng.dUNI ();
printf ("%.16f\n", krng.dUNI ());
}

/*
Output, after 10^9 random uniforms:
0.6203646342357479
e.g. with gcc -O3 opimizing compiler
--------------------------------------------------------
*/

/*
Fully seeding the Q[1220] array requires 64660
seed bits, plus another 106 for the initial zx
and zy of the lag-2 SWB sequence.
As in the above listing, using just two 32-bit
seeds, x and y in main(), to initialize the Q
array, one bit at a time, by means of the
resulting Congruential+Xorshift sequence
may be enough for many applications.

Applications such as in Law or Gaming may
require enough seeds in the Q[1220] array to
guarantee that each one of a huge set of
possible outcomes can appear. For example,
choosing a jury venire of 80 from a
list of 2000 eligibles would require at least
ten 53-bit seeds; choosing 180 from 4000 would
require twenty 53-bit seeds.
To get certification, a casino machine that could
play forty simultaneous games of poker must be
able to produce forty successive straight-flushes,
with a resulting minimal seed set.

Users can choose their 32-bit x,y for the
above seeding process, or develop their own
for more exacting requirements when a mere
set of 64 seed bits may not be enough.

Properties:
1. Simple and very fast, some 70 million double-precision
uniform(0,1] random variables/second, in IEEE 754 standard
form: 1 sign bit, 11 exponent bits, 52 fraction bits
plus the implied 1 leading the fraction part, making a
total of 53 bits for each uniform floating-point variate.

2. Period some 10^19492 (vs. the 10^61 of an earlier version),
G. Marsaglia and W.W. Tsang, "The 64-bit universal RNG",(2004)
Statistics and Probability Letters}, v66, no. 2, 183-187,
or an earlier version provided for Matlab.)

3. Following the KISS, (Keep-It-Simple-Stupid) principle,
combines, via subtraction, a CSWB sequence
x[n]=x[n-1220]-x[n-1190]-borrow mod 2^53
based on the prime p=b^1220-b^1190+1, b=2^53,
with a SWB lag-2 sequence z[n]=w[n-2]-z[n-1] -c mod 2^53.
All modular arithmetic on numerators of rationals
with denominators 2.^53.

4. Easily implemented in various programming languages,
as only tests on magnitude and double-precision floating-point

5. Passes all Diehard tests,
http://i.cs.hku.hk/~diehard/
As Diehard is designed for 32-bit integer input, all of its
234 tests are applied to bits 1..32, then 2..33, then 3..34,..,
then 22..53 of the 53 bits in each dUNI().
(To form 32-bit integer j from bits i..(i+31):
t=dUNI()*(1<<(i-1)); i=t; j=(t-i)*4294967296.0; )
All twenty-two choices for the 32-bit segments performed
very well on the 234 p-values from Diehard tests, using
the default 32-bit seeds x and y. You might try your own,
with possibly more extensive seeding of the Q array.

-- George Marsaglia

*/

Jonathan Lee
Guest
Posts: n/a

 04-14-2010
On Apr 14, 2:06*pm, Dann Corbit <(E-Mail Removed)> wrote:

Thanks for sharing.

Isn't this dangerous for y:

> * * * * * * for (j = 0; j < 52; j++)
> * * * * * * {
> * * * * * * * * t = 0.5 * t; /* make t=.5/2^j **/
> * * * * * * * * x = 69069 * x + 123;
> * * * * * * * * y ^= (y << 13);
> * * * * * * * * y ^= (y >> 17);
> * * * * * * * * y ^= (y << 5);
> * * * * * * * * if (((x + y) >> 23) & 1)
> * * * * * * * * * * s = s + t; /* change bit of s, maybe */
> * * * * * * } * */* end j loop */

Specifically (y >> 17) could mix in high bits if unsigned
long were 64 bit instead of (presumably) 32-bit. Just at
a quick glance, I would think the operations on y
would need to be masked to be portable.

--Jonathan

Jonathan Lee
Guest
Posts: n/a

 04-14-2010
On Apr 14, 2:06*pm, Dann Corbit <(E-Mail Removed)> wrote:
> For reference and comparison, here is a "down and dirty" translation
> into a C class.
>
> I guess that some C++ guru will take a few minutes to shine the apple a
> bit so that it can be genuinely useful.

Another thought: you could probably remove that assertion by building
the double from LSb to MSb. You'd essentially be bitreversing the
number which probably won't affect the randomness (can't prove it
off hand). If double didn't have enough bits, the machine would simply
round the value as you went along. Not ideal under some RNG scenarios,
but the result should be the "same" as the actual number up to machine
precision.

--Jonathan

Jonathan Lee
Guest
Posts: n/a

 04-14-2010
On Apr 14, 2:53*pm, Jonathan Lee <(E-Mail Removed)> wrote:
> On Apr 14, 2:06*pm, Dann Corbit <(E-Mail Removed)> wrote:
>
> Thanks for sharing.
>
> Isn't this dangerous for y:
>

Just confirmed: if y isn't masked to 32-bits the output
is wrong.

Anyway, here's my take on the code:

// kiss2.hpp
-------------------------------------------------------------------
#include <cstddef>

class kiss2 {
double Q[1220]; // meh.. I guess this could be a
std::vector
const std::size_t qlen; // length of Q array
std::size_t indx;
double c; // current CSWB
double zc; // current SWB `borrow`
double zx; // SWB seed1
double zy; // SWB seed2
public:
kiss2(unsigned long = 123456789UL, unsigned long = 362436069UL);
double operator()(); // For getting a random number
};

// kiss2.cpp
-------------------------------------------------------------------
//#include "kiss2.hpp"
#include <cfloat>
using std::size_t;
/**
\todo? Replace constant initialization of indx and qlen w/ sizeof Q/
sizeof Q[0]
\todo Magic numbers make me uneasy (re: zx, zy)
*/
kiss2::kiss2(
unsigned long x, // seed1 value
unsigned long y // seed2 value
): qlen(1220), indx(1220), c(0.0), zc(0.0),
zx(5212886298506819.0 / 9007199254740992.0),
zy(2020898595989513.0 / 9007199254740992.0)
{
#if (FLT_RADIX != 2 || DBL_MIN_EXP > -53)
#error "Machine double not supported"
#endif

// fill in Q[i] "using 9th bit from Cong+Xorshift"
for (size_t i = 0; i < qlen; i++) {
double s = 0.0, t = 1.0;

// Build Q[i] one bit at a time
for (int j = 0; j < 52; j++) {
t *= 0.5; // make t=.5/2^j
x = (69069 * x + 123);
y = (y ^ (y << 13)) & 0xFFFFFFFFUL;
y = (y ^ (y >> 17)) & 0xFFFFFFFFUL;
y = (y ^ (y << 5)) & 0xFFFFFFFFUL;

if (((x + y) >> 23) & 1UL) // conditionally set bit of s
s += t;
}
Q[i] = s;
}
}

/**
\todo Separate refill function?
\todo I guess we sorta need a guarantee that qlen >= 30
*/
double kiss2:perator()() { // Takes 14 nanosecs, Intel Q6600,2.40GHz
static const double cc = 1.0 / 9007199254740992.0; // 2^-53

double t; // t: first temp, then next CSWB value
// First get zy as next lag-2 SWB
t = zx - zy - zc;
zx = zy;
if (t < 0) {
zy = t + 1.0, zc = cc;
} else zy = t, zc = 0.0;

// Then get t as the next lag-1220 CSWB value
if (indx >= qlen) { // refill Q[n] via Q[n-1220]-Q[n-1190]-c
for (size_t i = 0; i < qlen; i++) {
size_t j = (i < 30) ? i + (qlen - 30) : i - 30;
t = Q[j] - Q[i] + c; // Get next CSWB element
if (t > 0) {
t = t - cc, c = cc;
} else t = t - cc + 1.0, c = 0.0;
Q[i] = t;
}
indx = 0; // reset indx
}

// return Q[indx] - zy (mod 1)
t = Q[indx++] - zy;
return (t < 0.0 ? 1.0 + t : t);
}

// main.cpp
--------------------------------------------------------------------
#include <cstdio>
#include <cstdlib>
// #include "kiss2.hpp"
int main(void)
{ /* You must provide at least two 32-bit seeds */
kiss2 krng; // Use the default seeds
double t;
int i;
for (i = 0; i < 1000000000; i++)
t = krng();
printf ("%.16f\n", krng());
}

--Jonathan