Velocity Reviews > KISS4691, a potentially top-ranked RNG.

# KISS4691, a potentially top-ranked RNG.

orz
Guest
Posts: n/a

 08-19-2010
On Aug 19, 6:11*am, geo <(E-Mail Removed)> wrote:
> On Aug 18, 6:08*pm, "christian.bau" <(E-Mail Removed)>
> wrote:
>
>
>
> > Hi George,

>
> > the problem is not that the carry could match or exceed the multiplier
> > 8193.
> > The problem is that the carry can be as large as 8192.

>
> > In your code you write

>
> > * t=(x<<13)+c+x; c=(t<x)+(x>>19);

>
> > To make it a bit more obvious what happens, I can change this to the
> > 100% equivalent

>
> > * t = (x << 13) + c;
> > * c = x >> 19;
> > * t += x; if (t < x) ++c;

>
> > The last line is the usual trick to add numbers and check for carry: t
> > will be less than x exactly if the addition t += x wrapped around and
> > produced a carry.

>
> > The problem is that (x << 13) + c can also produce a carry, and that
> > goes unnoticed. If c = 8192, and the lowest 19 bits in x are all set,
> > then (x << 13) has the highest 19 bits set and 13 bits zero. Adding c
> > "overflows" and gives a result of 0, and adding x leaves x. The
> > comparison (t < x) is false and produces zero, even though (x<<13)+c+x
> > should have produced a carry.

>
> > This is very, very rare. It doesn't actually happen in your original
> > source code with a total of 2 billion random numbers. If you produce
> > four billion random numbers, then it happens at i = 3596309491 and
> > again at i = 3931531774.

>
> > I am more interested in 64 bit performance, so I just made c, t, and x
> > 64 bit numbers and wrote

>
> > * uint64_t x = Q [i];
> > * uint64_t t = (x << 13) + x + c;
> > * c = (t >> 32);

>
> > That might help in Fortran as well, assuming 64 bit integers are
> > available, signed or unsigned doesn't matter.

>
> Thanks for pointing that out.
> The correction (t<x) should be (t<=x),
> to take care of that rare, but inevitable, event
> when the last 19 bits of x are 1's
> and the carry c is a-1=2^13, making (x<<13)+c = 0,
> and thus t=(x<<13)+c+x = x,
> (mod 2^32, of course, the assumed underlying
> integer arithmetic of the processor).
>
> The entire MWC() routine should have that
> (t<x) to (t<=x) correction:
>
> unsigned long MWC(void)
> { unsigned long t,x,i; static c=0,j=4691;
> * * j=(j<4690)? j+1:0;
> * * x=Q[j];
> * * t=(x<<13)+c+x; c=(t<=x)+(x>>19);
> * * return (Q[j]=t);
>
> }
>
> and for interested readers who may have pasted
> the original, please change that (t<x) to (t<=x).
> The test values from 10^9 calls will still be as indicated,
> but strict adherence to the underlying mathematics requires
> the change from (t<x) to (t<=x) to ensure the full period.
>
> Keeping that (t<x) may be viewed as, rather than making a
> circular transition through the 8193*2^133407-1 base-b digits
> of the expansion of some j/p, we jump---after an average of
> b=2^32 steps---to another point in the immense circle.
>
> It could be that the period is increased rather than
> decreased, but it remains a curiosity. Perhaps light could
> be shed with primes p=ab^n-1 for which the actual distribution
> of such jumps, averaging every b steps, could be examined.
>
> Or perhaps it could remain a mystery
> and be further fodder for those who tend to
> equate lack of understanding to "true" randomness.
>
> *George Marsaglia

Hm... I see what you're saying...
old forumula: carry = (x>>19) + (t<x)
old state: x = 2**32-1, carry = 2**13
new state: t = 2**32-1, carry = 2**13-1 !!!

The carry does appear incorrect. And the change does appear to fix
that:

new forumula: carry = (x>>19) + (t<=x)
old state: x = 2**32-1, carry = 2**13
new state: t = 2**32-1, carry = 2**13

But... doesn't that mean:

new forumula: carry = (x>>19) + (t<=x)
old state: x = 0, carry = 0
new state: t = 0, carry = 1 !!!

Isn't that incorrect?

orz
Guest
Posts: n/a

 08-19-2010
On Aug 19, 10:19*am, orz <(E-Mail Removed)> wrote:
> On Aug 19, 6:11*am, geo <(E-Mail Removed)> wrote:
>
>
>
> > On Aug 18, 6:08*pm, "christian.bau" <(E-Mail Removed)>
> > wrote:

>
> > > Hi George,

>
> > > the problem is not that the carry could match or exceed the multiplier
> > > 8193.
> > > The problem is that the carry can be as large as 8192.

>
> > > In your code you write

>
> > > * t=(x<<13)+c+x; c=(t<x)+(x>>19);

>
> > > To make it a bit more obvious what happens, I can change this to the
> > > 100% equivalent

>
> > > * t = (x << 13) + c;
> > > * c = x >> 19;
> > > * t += x; if (t < x) ++c;

>
> > > The last line is the usual trick to add numbers and check for carry: t
> > > will be less than x exactly if the addition t += x wrapped around and
> > > produced a carry.

>
> > > The problem is that (x << 13) + c can also produce a carry, and that
> > > goes unnoticed. If c = 8192, and the lowest 19 bits in x are all set,
> > > then (x << 13) has the highest 19 bits set and 13 bits zero. Adding c
> > > "overflows" and gives a result of 0, and adding x leaves x. The
> > > comparison (t < x) is false and produces zero, even though (x<<13)+c+x
> > > should have produced a carry.

>
> > > This is very, very rare. It doesn't actually happen in your original
> > > source code with a total of 2 billion random numbers. If you produce
> > > four billion random numbers, then it happens at i = 3596309491 and
> > > again at i = 3931531774.

>
> > > I am more interested in 64 bit performance, so I just made c, t, and x
> > > 64 bit numbers and wrote

>
> > > * uint64_t x = Q [i];
> > > * uint64_t t = (x << 13) + x + c;
> > > * c = (t >> 32);

>
> > > That might help in Fortran as well, assuming 64 bit integers are
> > > available, signed or unsigned doesn't matter.

>
> > Thanks for pointing that out.
> > The correction (t<x) should be (t<=x),
> > to take care of that rare, but inevitable, event
> > when the last 19 bits of x are 1's
> > and the carry c is a-1=2^13, making (x<<13)+c = 0,
> > and thus t=(x<<13)+c+x = x,
> > (mod 2^32, of course, the assumed underlying
> > integer arithmetic of the processor).

>
> > The entire MWC() routine should have that
> > (t<x) to (t<=x) correction:

>
> > unsigned long MWC(void)
> > { unsigned long t,x,i; static c=0,j=4691;
> > * * j=(j<4690)? j+1:0;
> > * * x=Q[j];
> > * * t=(x<<13)+c+x; c=(t<=x)+(x>>19);
> > * * return (Q[j]=t);

>
> > }

>
> > and for interested readers who may have pasted
> > the original, please change that (t<x) to (t<=x).
> > The test values from 10^9 calls will still be as indicated,
> > but strict adherence to the underlying mathematics requires
> > the change from (t<x) to (t<=x) to ensure the full period.

>
> > Keeping that (t<x) may be viewed as, rather than making a
> > circular transition through the 8193*2^133407-1 base-b digits
> > of the expansion of some j/p, we jump---after an average of
> > b=2^32 steps---to another point in the immense circle.

>
> > It could be that the period is increased rather than
> > decreased, but it remains a curiosity. Perhaps light could
> > be shed with primes p=ab^n-1 for which the actual distribution
> > of such jumps, averaging every b steps, could be examined.

>
> > Or perhaps it could remain a mystery
> > and be further fodder for those who tend to
> > equate lack of understanding to "true" randomness.

>
> > *George Marsaglia

>
> Hm... I see what you're saying...
> old forumula: carry = (x>>19) + (t<x)
> old state: x = 2**32-1, carry = 2**13
> new state: t = 2**32-1, carry = 2**13-1 !!!
>
> The carry does appear incorrect. *And the change does appear to fix
> that:
>
> new forumula: carry = (x>>19) + (t<=x)
> old state: x = 2**32-1, carry = 2**13
> new state: t = 2**32-1, carry = 2**13
>
> But... doesn't that mean:
>
> new forumula: carry = (x>>19) + (t<=x)
> old state: x = 0, carry = 0
> new state: t = 0, carry = 1 !!!
>
> Isn't that incorrect?

Late by several hours. I should double-check that the thread hasn't
gone on to the next page before posting.

The suggestion of using 64 bit integers seems to work... but if
compiled to 32 bit code on my machine, using 64 bit integers slows it
down by almost a factor of 2. The whole KISS, not just the MWC
measured in isolation.

I tried a few options, comparing performance between them:
(compiling to 32 bit x86 with MSVC, running on a Core 2 Duo 3 GHrz)

without the carry fix:
6.07 seconds for 10**9 calls

with 64 bit ints:
11.24 seconds for 10**9 calls
(I tried a few minor variations of this trying to get the compiler to
do better, but it refused)

with a special case if checking for the incorrectly handled 0
6.19 seconds for 10**9 calls

with some ADC instructions (inefficiently) inserted with __asm
8.68 seconds for 10**9 calls

So it looks to me like at least on 32 bit instruction sets the fastest
is adding an extra if. Any other ideas for how to handle that?

The versions of MWC with the carry fixes appears to pass and fail the
same statistical tests as the original version of MWC.

Gib Bogle
Guest
Posts: n/a

 08-19-2010
baf wrote:
> On 8/18/2010 9:00 PM, Gib Bogle wrote:
>> baf wrote:
>>> On 8/17/2010 11:35 PM, Gib Bogle wrote:
>>>> glen herrmannsfeldt wrote:
>>>>
>>>> (snipped Fortran code)
>>>>
>>>> That does it! Thanks a lot.
>>>
>>> Odd. With gfortran 4.6, I get
>>>
>>> MWC result=3740121002 (or -554846295) ? -554846294
>>> KISS result=2224631993 (or -2070335303) ? -2070335303
>>>
>>> Notice that the MWC result is off by 1

>>
>> Presumably that's a typo. The signed value of 3740121002 is -554846294

>
> Yes, that seems to be correct. Just a typo in the code.
>
>> according to me. Do this:
>>
>> integer :: u = 3740121002
>> write(*,*) u
>>
>> I'd be interested to hear how the C and Fortran times compare for you.
>>

>
> gcc -O2 27.7 sec
> gfortran -O2 31.9
>
> I am sure both of these times could be improved with some better choices
> in compile options.
>

It appears that Intel's C compiler has some pretty nifty bit-shifting
optimizations (lacking the Fortran, unfortunately).

sci.math
Guest
Posts: n/a

 08-21-2010
On Aug 21, 1:40*am, "io_x" <(E-Mail Removed)> wrote:
> "io_x" <(E-Mail Removed)> ha scritto nel messaggionews:4c6f757a\$0\$18999\$(E-Mail Removed) ws.tin.it...
>
> > this could be the .asm version...

>
> i forget to say it could be the wrong traslation or deep bugged
> so not use it if you want something sure 100%
> not i do some test on it for see the randomeness too
>
>
>
> > i'm not agree with the C version on doing all functions as macro
> > because, yes it can be more fast here, but when there are many
> > call in the code it could inflate all, and
> > could be not very easy to debug too- Hide quoted text -

>
> - Show quoted text -

My ss begins 4691

robin
Guest
Posts: n/a

 08-28-2010
"Gib Bogle" <(E-Mail Removed)> wrote in message news:i4d70n\$bll\$(E-Mail Removed)...
| robin wrote:
| > "Gib Bogle" <(E-Mail Removed)> wrote in message news:i3ssd1\$be8\$(E-Mail Removed)...
| > | orz wrote:
| > |
| > | > If I made yet another mistake on that then I'm going to throw
| > | > something.
| > | >
| > | > It produced identical results for me when I tried it in C using this
| > | > code:
| > | >
| > | > typedef unsigned int Uint32;
| > | > Uint32 sign(Uint32 value) {return value>>31;}
| > | > Uint32 index, carry, data[4691];
| > | > Uint32 mwc4691() {
| > | > index = (index < 4690) ? index + 1 : 0;
| > | > Uint32 x = data[index];
| > | > Uint32 t = (x << 13) + carry + x;
| > | > if (sign((x<<13)+carry) == sign(x))
| > | > carry = sign(x) + (x>>19);
| > | > else carry = 1 - sign(t) + (x>>19);
| > | > data[index] = t;
| > | > return t;
| > | > }
| > | >
| > | > I think the only difference between that and the algorithm he
| > | > expressed in pseudocode is that I added parentheses on the first left-
| > | > shift (he was apparently assuming higher precedence on the leftshift
| > | > operator than C uses). Maybe that's why you're getting different
| > | > results?
| > |
| > | You do realize that there's no unsigned integer type in Fortran? I don't think
| > | there's much point in trying to predict what the Fortran code does without using
| > | a Fortran compiler.
| >
| > It's perfectly possible to predict how to program the operation
| > (unsigned arithmetic) in Fortran provided that we make the
| > assumption that the hardware provides twos complement arithmetic.
|
| It's possible to predict, but more reliable to test. I can use Fortran to
| reproduce the results generated by the C code, but not using the pseudo-code
| that George provided.

The C code that George provided was for unsigned.
He also gave some alterations in C for signed.

| Unfortunately he seems to have lost interest in this
| thread. Perhaps you'd like to suggest the Fortran code to implement the method
| that George provided in C (assuming twos complement).

I already provided an unsigned version in PL/I.
Trivial changes are required for Fortran.

robin
Guest
Posts: n/a

 08-28-2010
"Richard Maine" <(E-Mail Removed)> wrote in message news:1jn12b8.7v1f0s6wd65N%(E-Mail Removed)...
| Dann Corbit <(E-Mail Removed)> wrote:
|
| > In article <i3ssd1\$be8\$(E-Mail Removed)>,
| > http://www.velocityreviews.com/forums/(E-Mail Removed) says...
|
| > > You do realize that there's no unsigned integer type in Fortran?
| ...
| > Maybe he is using Sun's Fortran 95 compiler.
|
| Though if my admittedly fallible memory serves me, the unsigned integer
| in Sun's f95 has restrictions that are likely to surprise many casual
| users. Being more specific might stretch my memory a bit farther than I
| trust, but I think I recall a lack of mixed-mode operations, for
| example. So one would have to tread pretty carefully in order to
| correctly transcribe pseudocode into working code. For example, I'm not
| at all sure that one can do such things as adding 1 to an unsigned

The entire algorithm can be implemented in unsigned aruithmetic.
It should not therefore present any problems.

robin
Guest
Posts: n/a

 08-28-2010
"glen herrmannsfeldt" <(E-Mail Removed)> wrote in message news:i4dj94\$5b4\$(E-Mail Removed)...

| To do the equivalent of an unsigned less than in twos-complement
| invert the sign bit before comparing.

What happens when the operand is zero?

| j=j+1
| if(j>4691) j=0
| x=Q(j)
| t=ishft(x,13)+c+x
| c=ishft(x,-19)
| if(ieor(t,ishft(1,-31)).lt.ieor(x,ishft(1,-31))) c=c+1

Michael Press
Guest
Posts: n/a

 12-06-2010
In article <i3ta1q\$jls\$(E-Mail Removed)>,
glen herrmannsfeldt <(E-Mail Removed)> wrote:

> In comp.lang.fortran orz <(E-Mail Removed)> wrote:
>
> (snip)
>
> > 3. Shift operators are treated as higher precedence or parentheses
> > are added around that shift.

>
> The C precedence rules for shifts are widely believed to
> be wrong, including by the developers of C, but it is considered
> too late to change. Best to always use parentheses around them.

Shifts bind more tightly than assignment,
and that is not saying much. Shifts _should_
bind as tightly as power, and that is more
tightly than multiplication and division.
As it is, shift binds less tightly than
addition. _That_ bit me once. Speaking of
bits, the bit wise binary operators bind
_less_ tightly than assignment. Yes, before
precedence chart tacked to the wall at
eye level.

> (Many Fortran rules haven't been changed for the same reason.)

--
Michael Press