Velocity Reviews > The CMWC4827 RNG, an improvement on MWC4691.

# The CMWC4827 RNG, an improvement on MWC4691.

geo
Guest
Posts: n/a

 10-22-2010

In August 2010 I posted descriptions of a MWC
(Multiply-With-Carry) RNG based on, with b=2^32,
the prime p=8193b^4691-1. With an apparently
very long period and multiplier a=(2^13+1), the
basic MWC operation could be carried out using only
32-bit operations: given 32-bit c and x, imagine
t=a*x+c in 64 bits, determine the new c and new x
as the top and bottom 32-bits of t.

There were two drawbacks: rare nuisance cases that
had to be accommodated, and no proof of the order of b
for the prime p, the period of the RNG. We can be
almost---but not quite---certain that the order of
b mod p is k=(p-1)/2, since b^k mod p = 1 and for each
attainable prime divisor q of k---that is, with q one
of 431,18413, 15799501,1505986643549,
2883504568596254032007909, b^(k/q) mod p is not 1.
(My thanks to Darío Alejandro Alpern of Buenos Aires for
providing the larger, and estimates on future, factors.)

But there are at least two more, presently unattainable,
prime divisors of k, each with at least 30 digits.
With z a primitive root of p and b=z^j, the chances of j
being a multiple of one of those unknown prime divisors
of k seem less than 1/10^30. So we can be reasonably---
indeed, very---confident that the Industrial Grade Order.
(IGO) of b is (p-1)/2 for the prime p=8193b^4691-1.

So, with nuisance cases and uncertainties in mind, I put
three computers to work searching for a prime of the
form p=a*b^r+1, with a=2^i-1, to avoid the nuisance
cases arising from a=2^13+1, and p=(2^i-1)*b^r+1 so
that factoring p-1 would be feasible when r>4000.

After about 5 days, one of them found p=4095*b^4827+1.
And it turned out that the order of b mod p was the
maximum possible, k=(p-1)/2^6, (since we "lose" one 2
because 2 cannot be primitive, and must lose at least
five more because b=2^5 and k is 4095*2^15445.
Thus b^k mod p = 1, and for each prime divisor q of k,
q=2,3,5,7,13: b^(k/q) mod p is not 1.

Unlike p=ab^r-1, a prime of the form p=ab^r+1 leads
to CMWC (Complimentary-Multiply-With-Carry) RNGs, in
which we again imagine forming t=a*x+c in 64 bits and
again seek c as the top 32 bits, but rather than
x=(t mod b) for MWC, the new x is x=(b-1)-(t mod b),
that is x=~(t mod b), using C's ~ op.

Here is a C version of the resulting CMWC method for
p=4095*b^4827+1, using only 32-bit arithmetic, with
verified period 4095*2^154458, faster and simpler than
for either signed or unsigned integers and for Fortran
or other programming languages:
__________________________________________________ _______

#include <stdio.h>

static unsigned long Q[4827],carry=1271;

unsigned long CMWC4827(void)
{unsigned long t,x; static int j=4827;
j=(j<4826)? j+1:0;
x=Q[j]; t=(x<<12)+carry;
carry=(x>>20)-(t<x);
return (Q[j]=~(t-x));
}

#define CNG ( cng=69069*cng+13579 )
#define XS ( xs^=(xs<<13), xs^=(xs>>17), xs^=(xs<<5) )
#define KISS4827 ( CMWC4827()+CNG+XS )

int main(void)
{unsigned long int i,x,cng=123456789,xs=362436069;
/* First seed Q[] with CNG+XS: */
for(i=0;i<4827;i++) Q[i]=CNG+XS;
/* Then generate 10^9 CMWC4827()s */
for(i=0;i<1000000000;i++) x=CMWC4827();
printf("Does x=1346668762?\n x=%lu\n",x);
/* followed by 10^9 KISS4827s: */
for(i=0;i<1000000000;i++) x=KISS4827;
printf("Does x=4041198809?\n x=%lu\n",x);

return 0;
}
__________________________________________________ ____

Getting 10^9 CMWC4827s should take less than four seconds,
while 10^9 KISS4827s should take less than seven seconds.

For signed integers, replace that single (t<x) by (t'<x'),
where t' means flip the sign bit: t^(1<<31) for C
or ieor(t,ishft(1,-31)) for Fortran,
(or use hex 10000000 to specify the bit to be flipped).

CMWC4827() used alone will pass all tests in
the Diehard Battery of Tests of Randomness
www.cs.hku.hk/~diehard/
But as any RNG based on a single mathematical structure
is likely to have potential flaws---such as, but
perhaps not as striking as congruential RNGs "falling
mainly in the planes"---I advocate the KISS
(Keep-It-Simple-Stupid) approach: combine with
Congruential and Xorshift sequences invoked inline.

Why not splurge at a cost of 3 nanoseconds?

George Marsaglia

Dann Corbit
Guest
Posts: n/a

 10-22-2010
In article <df5387d8-95fe-43b0-9b56-
http://www.velocityreviews.com/forums/(E-Mail Removed)>, (E-Mail Removed) says...
>
> In August 2010 I posted descriptions of a MWC
> (Multiply-With-Carry) RNG based on, with b=2^32,
> the prime p=8193b^4691-1. With an apparently
> very long period and multiplier a=(2^13+1), the
> basic MWC operation could be carried out using only
> 32-bit operations: given 32-bit c and x, imagine
> t=a*x+c in 64 bits, determine the new c and new x
> as the top and bottom 32-bits of t.
>
> There were two drawbacks: rare nuisance cases that
> had to be accommodated, and no proof of the order of b
> for the prime p, the period of the RNG. We can be
> almost---but not quite---certain that the order of
> b mod p is k=(p-1)/2, since b^k mod p = 1 and for each
> attainable prime divisor q of k---that is, with q one
> of 431,18413, 15799501,1505986643549,
> 2883504568596254032007909, b^(k/q) mod p is not 1.
> (My thanks to Darío Alejandro Alpern of Buenos Aires for
> providing the larger, and estimates on future, factors.)
>
> But there are at least two more, presently unattainable,
> prime divisors of k, each with at least 30 digits.
> With z a primitive root of p and b=z^j, the chances of j
> being a multiple of one of those unknown prime divisors
> of k seem less than 1/10^30. So we can be reasonably---
> indeed, very---confident that the Industrial Grade Order.
> (IGO) of b is (p-1)/2 for the prime p=8193b^4691-1.
>
> So, with nuisance cases and uncertainties in mind, I put
> three computers to work searching for a prime of the
> form p=a*b^r+1, with a=2^i-1, to avoid the nuisance
> cases arising from a=2^13+1, and p=(2^i-1)*b^r+1 so
> that factoring p-1 would be feasible when r>4000.
>
> After about 5 days, one of them found p=4095*b^4827+1.
> And it turned out that the order of b mod p was the
> maximum possible, k=(p-1)/2^6, (since we "lose" one 2
> because 2 cannot be primitive, and must lose at least
> five more because b=2^5 and k is 4095*2^15445.
> Thus b^k mod p = 1, and for each prime divisor q of k,
> q=2,3,5,7,13: b^(k/q) mod p is not 1.
>
> Unlike p=ab^r-1, a prime of the form p=ab^r+1 leads
> to CMWC (Complimentary-Multiply-With-Carry) RNGs, in
> which we again imagine forming t=a*x+c in 64 bits and
> again seek c as the top 32 bits, but rather than
> x=(t mod b) for MWC, the new x is x=(b-1)-(t mod b),
> that is x=~(t mod b), using C's ~ op.
>

I encapsulated things a bit, so that the code to call the prng is not
interspersed among the main program.

static unsigned long Q[4827],
carry = 1271,
cng = 123456789,
xs = 362436069;

unsigned long scramble()
{
return (cng = 69069 * cng + 13579) + (xs ^= (xs << 13), xs ^= (xs >>
17), xs ^= (xs << 5));
}

void initCMWC4827(void)
{
unsigned long int i;

for (i = 0; i < 4827; i++)
Q[i] = scramble();
}

unsigned long CMWC4827(void)
{
unsigned long t,
x;
static int j = 4827;
j = (j < 4826) ? j + 1 : 0;
x = Q[j];
t = (x << 12) + carry;
carry = (x >> 20) - (t < x);
return (Q[j] = ~(t - x));
}

#ifdef UNIT_TEST
#include <stdio.h>
int main(void)
{
unsigned long int i,
x =0;

initCMWC4827();

for (i = 0; i < 1000000000; i++)
x = CMWC4827();
printf("Does x=1346668762?\n x=%lu\n", x);

for (i = 0; i < 1000000000; i++)
x = (CMWC4827() + scramble());
printf("Does x=4041198809?\n x=%lu\n", x);

return 0;
}
#endif

/*
C:\tmp>cl /DUNIT_TEST /W4 /Ox k8.c
Microsoft (R) C/C++ Optimizing Compiler Version 16.00.30319.01 for x64

k8.c
Microsoft (R) Incremental Linker Version 10.00.30319.01

/out:k8.exe
k8.obj

C:\tmp>k8
Does x=1346668762?
x=1346668762
Does x=4041198809?
x=4041198809

C:\tmp>
*/

Jens Thoms Toerring
Guest
Posts: n/a

 10-22-2010
In comp.lang.c Dann Corbit <(E-Mail Removed)> wrote:
> In article <df5387d8-95fe-43b0-9b56-
> (E-Mail Removed)>, (E-Mail Removed) says...

<George Marsglia's description snipped>

> I encapsulated things a bit, so that the code to call the prng is not
> interspersed among the main program.

> static unsigned long Q[4827],
> carry = 1271,
> cng = 123456789,
> xs = 362436069;

> unsigned long scramble()
> {
> return (cng = 69069 * cng + 13579) + (xs ^= (xs << 13), xs ^= (xs >>
> 17), xs ^= (xs << 5));
> }

> void initCMWC4827(void)
> {
> unsigned long int i;

> for (i = 0; i < 4827; i++)
> Q[i] = scramble();
> }

> unsigned long CMWC4827(void)
> {
> unsigned long t,
> x;
> static int j = 4827;
> j = (j < 4826) ? j + 1 : 0;
> x = Q[j];
> t = (x << 12) + carry;
> carry = (x >> 20) - (t < x);
> return (Q[j] = ~(t - x));
> }

> #ifdef UNIT_TEST
> #include <stdio.h>
> int main(void)
> {
> unsigned long int i,
> x =0;

> initCMWC4827();
>
> for (i = 0; i < 1000000000; i++)
> x = CMWC4827();
> printf("Does x=1346668762?\n x=%lu\n", x);

> for (i = 0; i < 1000000000; i++)
> x = (CMWC4827() + scramble());
> printf("Does x=4041198809?\n x=%lu\n", x);

> return 0;
> }
> #endif

Based on Dann Corbit's version here's a C (C89) and C++ version
that also (seems to?) work on 64-bit machines (that's what all
the ANDing with 0xFFFFFFFFLU is about):

========= CMWC.c =====================================

static unsigned long int Q[ 4827 ];

unsigned long
scramble( )
{
static unsigned long int cng = 123456789LU;
static unsigned long int xs = 362436069LU;

cng = ( 69069 * cng + 13579 ) & 0xFFFFFFFFLU;
return ( cng + ( xs ^= ( xs << 13 ) & 0xFFFFFFFFLU,
xs ^= ( xs >> 17 ),
xs ^= ( xs << 5 ) & 0xFFFFFFFFLU ) ) & 0xFFFFFFFFLU;
}

void
initCMWC4827( void )
{
unsigned int i;

for ( i = 0; i < 4827; i++ )
Q[ i ] = scramble( );
}

unsigned long int
CMWC4827( void )
{
static int j = 4827;
static unsigned long int carry = 1271LU;

unsigned long int x = Q[ j = j < 4826 ? j + 1 : 0 ],
t = ( ( x << 12 ) + carry ) & 0xFFFFFFFFLU;

carry = ( x >> 20 ) - ( ( t < x ) & 0xFFFFFFFFLU );
return Q[ j ] = ~ ( t - x ) & 0xFFFFFFFFLU;
}

#ifdef UNIT_TEST
#include <stdio.h>
int
main( void )
{
unsigned long int i,

initCMWC4827( );

for ( i = 0; i < 1000000000; i++ )
x = CMWC4827( );
printf( "Does x=1346668762?\n x=%lu\n", x );

for ( i = 0; i < 1000000000; i++ )
x = ( CMWC4827( ) + scramble( ) ) & 0xFFFFFFFFLU;
printf( "Does x=4041198809?\n x=%lu\n", x );

return 0;
}

#endif

========= CMWC.c =====================================

========= CMWC.cpp ===================================

class CMWC4827
{
public:

CMWC4827( ) :
cng( 123456789LU ),
xs( 362436069LU ),
carry( 1271LU ),
j( 4827 )
{
for ( int i = 0; i < 4827; i++ )
Q[ i ] = scramble( );
}

unsigned long int
next( )
{
unsigned long int x = Q[ j = j < 4826 ? j + 1 : 0 ],
t = ( ( x << 12 ) + carry ) & 0xFFFFFFFFLU;

carry = ( x >> 20 ) - ( ( t < x ) & 0xFFFFFFFFLU );
return Q[ j ] = ~ ( t - x ) & 0xFFFFFFFFLU;
}

unsigned long int
nextWithScramble( )
{
return ( next( ) + scramble( ) ) & 0xFFFFFFFFLU;
}

private:

unsigned long int
scramble( )
{
cng = ( 69069 * cng + 13579 ) & 0xFFFFFFFFLU;
return ( cng + ( xs ^= ( xs << 13 ) & 0xFFFFFFFFLU,
xs ^= ( xs >> 17 ),
xs ^= ( xs << 5 ) & 0xFFFFFFFFLU ) ) & 0xFFFFFFFFLU;
}

unsigned long int Q[ 4827 ];
unsigned long int cng;
unsigned long int xs;
unsigned long int carry;
int j;
};

#ifdef UNIT_TEST
#include <iostream>

int
main( )
{
CMWC4827 rgen;
unsigned long int x;

for ( unsigned long int i = 0; i < 1000000000; i++ )
x = rgen.next( );
std::cout << "Does x = 1346668762?\n x = " << x << '\n';

for ( unsigned long int i = 0; i < 1000000000; i++ )
x = rgen.nextWithScramble( );
std::cout << "Does x = 4041198809?\n x = " << x << '\n';

return 0;
}

#endif

========= CMWC.cpp ===================================

Does anyone has a better idea how to get rid/deal more effectively
with 64-issues? Of course if one could assume the existence of a
uint32_t things would be simple, but that requires C99 or C++0x...

Regards, Jens
--
\ Jens Thoms Toerring ___ (E-Mail Removed)
\__________________________ http://toerring.de

Ian Collins
Guest
Posts: n/a

 10-22-2010
On 10/23/10 12:18 AM, geo wrote:
>
> Here is a C version of the resulting CMWC method for
> p=4095*b^4827+1, using only 32-bit arithmetic, with
> verified period 4095*2^154458, faster and simpler than
> for either signed or unsigned integers and for Fortran
> or other programming languages:

I strongly recommend you use the standard fixed width types if your code
is designed for 32 bit arithmetic. Your code will then port to 64 bit
systems without modification.

--
Ian Collins

Ian Collins
Guest
Posts: n/a

 10-22-2010
On 10/23/10 11:00 AM, Jens Thoms Toerring wrote:
>
> Based on Dann Corbit's version here's a C (C89) and C++ version
> that also (seems to?) work on 64-bit machines (that's what all
> the ANDing with 0xFFFFFFFFLU is about):
>

It's much simpler just to replace "unsigned long" with "uint32_t". Then
you don't have to mess with masks, the same code works "as is" in either
32 or 64 bit compiles.

> Does anyone has a better idea how to get rid/deal more effectively
> with 64-issues? Of course if one could assume the existence of a
> uint32_t things would be simple, but that requires C99 or C++0x...

Ah, I didn't see this until I'd written the above! Most if not all
common platforms have uint32_t defined in their system headers (either
in C99's <stdint.h> or elsewhere). If a platform doesn't have them,
they are trivial to define in your own header. Much better than messing
with masks! More so on annoying systems that have 32 bit longs even in
64 bit mode, where the masks only waste cycles, assuming they aren't
optimised away.

--
Ian Collins

Jens Thoms Toerring
Guest
Posts: n/a

 10-22-2010
In comp.lang.c Ian Collins <(E-Mail Removed)> wrote:
> Ah, I didn't see this until I'd written the above! Most if not all
> common platforms have uint32_t defined in their system headers (either
> in C99's <stdint.h> or elsewhere). If a platform doesn't have them,

Mmm, how do you do that without knowing anything about the
platform? And how to distinguish between uint32_t already
being defined or not? I would love to see some clean way to
do that. Of course, if you can use some kind of configure
script it's not too complicated, but I'm curious if there
is a way to get it right without that (even on machines that
may just have a C89 (or C++9 conforming compiler.

> Much better than messing with masks!

Yes, definitely if possible

Best regards, Jens
--
\ Jens Thoms Toerring ___ (E-Mail Removed)
\__________________________ http://toerring.de

Ian Collins
Guest
Posts: n/a

 10-22-2010
On 10/23/10 11:41 AM, Jens Thoms Toerring wrote:
> In comp.lang.c Ian Collins<(E-Mail Removed)> wrote:
>> Ah, I didn't see this until I'd written the above! Most if not all
>> common platforms have uint32_t defined in their system headers (either
>> in C99's<stdint.h> or elsewhere). If a platform doesn't have them,

>
> Mmm, how do you do that without knowing anything about the
> platform? And how to distinguish between uint32_t already
> being defined or not? I would love to see some clean way to
> do that. Of course, if you can use some kind of configure
> script it's not too complicated, but I'm curious if there
> is a way to get it right without that (even on machines that
> may just have a C89 (or C++9 conforming compiler.

A configure script is the usual way to determine the presence of headers
or types within them (by attempting to compile a snippet that uses them).

Fixed width type definitions are totally platform and compile mode
specific unfortunately..

--
Ian Collins

Keith Thompson
Guest
Posts: n/a

 10-22-2010
(E-Mail Removed) (Jens Thoms Toerring) writes:
> In comp.lang.c Ian Collins <(E-Mail Removed)> wrote:
>> Ah, I didn't see this until I'd written the above! Most if not all
>> common platforms have uint32_t defined in their system headers (either
>> in C99's <stdint.h> or elsewhere). If a platform doesn't have them,

>
> Mmm, how do you do that without knowing anything about the
> platform? And how to distinguish between uint32_t already
> being defined or not? I would love to see some clean way to
> do that. Of course, if you can use some kind of configure
> script it's not too complicated, but I'm curious if there
> is a way to get it right without that (even on machines that
> may just have a C89 (or C++9 conforming compiler.

Some years ago, Doug Gwyn published a set of files that implement
some C99 (then C9x) headers in C90.

http://www.lysator.liu.se/c/q8/

A simplified example of how you might define uint32_t (not tested):

#include <limits.h>
#if ULONG_MAX == 4294967295
typedef unsigned long uint32_t;
#elif UINT_MAX == 4294967295
typedef unsigned int uint32_t;
#elif USHRT_MAX == 4294967295
typedef unsigned short uint32_t
#elif UCHAR_MAX == 4294967295 /* unlikely */
typedef unsigned char uint32_t
#else
#error "Unable to determine type for uint32_t
#endif

This ignores padding bits and probably a few other possibilities.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Ilmari Karonen
Guest
Posts: n/a

 10-23-2010
On 2010-10-23, christian.bau <(E-Mail Removed)> wrote:
> into shifts if that is faster. I found that using "long" for "j" tends
> to produce faster code on 64 bit implementations and doesn't change
> anything on 32 bit implementations.

Since you're using C99 types anyway (and I agree that that's the
sensible thing to do), you might try using (u)int_fast16_t for j.

--
Ilmari Karonen

Jens Thoms Toerring
Guest
Posts: n/a

 10-23-2010
In comp.lang.c christian.bau <(E-Mail Removed)> wrote:
> On Oct 22, 11:00Â*pm, (E-Mail Removed) (Jens Thoms Toerring) wrote:

> > unsigned long int
> > CMWC4827( void )
> > {
> > Â* Â* static int j = 4827;
> > Â* Â* static unsigned long int carry = 1271LU;
> >
> > Â* Â* unsigned long int x = Q[ j = j < 4826 ? j + 1 : 0 ],
> > Â* Â* Â* Â* Â* Â* Â* Â* Â* Â* Â* t = ( ( x << 12 ) + carry ) & 0xFFFFFFFFLU;
> >
> > Â* Â* carry = ( x >> 20 ) - ( ( t < x ) & 0xFFFFFFFFLU );
> > Â* Â* return Q[ j ] = ~ ( t - x ) & 0xFFFFFFFFLU;
> >
> > }

> It seems the original source code emulated 64 bit arithmetic using 32
> bit arithmetic, and your code now uses 64 bit arithmetic to emulate 32
> bit arithmetic emulating 64 bit arithmetic. I'd write, assuming that Q
> is defined as uint32_t:

> uint32_t CMWC4827 (void)
> {
> static long j = 4827;
> static uint64_t carry = 1271;

> if (++j >= 4827) j = 0;
> uint64_t x = Q [j] * (uint64_t) 4095 + carry;
> carry = x >> 32;
> return (Q [j] = x);
> }

I see. There's just an omission in the last line, it must be

return Q[ j ] = ~ x;

Now that's quite a bit faster

Best regards, Jens
--
\ Jens Thoms Toerring ___ (E-Mail Removed)
\__________________________ http://toerring.de