Velocity Reviews > A new classification method for RNGs: Significance Level

joe
Guest
Posts: n/a

 07-11-2008
My experiments show that the random number generator
in Microsoft's VC++ 6 compiler is a statistical RNG with a
significance level > 1.0%.
Statistical testing at an SL > 1.0% (for example 1.001%) passes the test,
but at 1.0% it does not pass...

Can anybody confirm this finding?

The RNG functions of the various SW products can be
analyzed and classified better using their significance level
as shown above.
I think this important finding deserves deeper research...

For the testing method see:
http://en.wikipedia.org/wiki/Binomial_test
http://en.wikipedia.org/wiki/Binomial_distribution

joe

 07-11-2008
"David Kerber" wrote:
> (E-Mail Removed)lid says:
> [ ... ]

>
> What are you using for your sample size and null hypothesis?

Here are the details:
Sample size is 500 (i.e. calling rand() 500 times);
the rnd range is 37 (i.e. 0 to 36; yes, a roulette simulation).
The above-mentioned statistical test is done after each rand() call.
All of the above is repeated more than 30 times in a loop,
each time initializing the frequency stats anew.
srand(time(0)) is done only once at program start.

joe

 07-11-2008
"joe" wrote:
> "David Kerber" wrote:
> > (E-Mail Removed)lid says:
> > > [ ... ]

> >
> > What are you using for your sample size and null hypothesis?

H0 = the RNG passes the randomness test only at a significance level > 1%,
i.e. try 1% and 1.001%, for example, and you will
see it always fails at the <= 1% level and always passes at a > 1% significance level.

> [ ... ]

joe

 07-11-2008
"David Kerber" wrote:
> (E-Mail Removed)lid says:
> > [ ... ]

>
> I think your sample size is too small for such a conclusion to be
> statistically valid.

The minimum necessary is given by these relations
(cf. the above wiki pages and also stats books):

n*p >= 5 AND n*(1-p) >= 5

i.e. for the above example:
n >= 5 / (1/37) = 185
Above we have 500, i.e. from draw 185 to draw 500 the test can be applied.
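In code, the rule of thumb reads as follows (my own illustration; it assumes p = 1/k for k >= 2 equally likely outcomes, in which case n*p >= 5 is the binding constraint and n*(1-p) >= 5 then holds automatically):

```c
/* Smallest n satisfying the normal-approximation rule of thumb
   n*p >= 5 and n*(1-p) >= 5, for p = 1/k with k >= 2 outcomes.
   n*p >= 5 gives n >= 5*k; n*(1-p) >= 5 is then automatic. */
int min_sample_size(int k)
{
    return 5 * k;
}
```

For the roulette case k = 37 this gives the n >= 185 quoted above.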

user923005

 07-11-2008
On Jul 11, 5:39 am, "joe" <(E-Mail Removed)> wrote:
> [ ... ]

I guess that what you really want is the statistics group.
news:sci.stat.math

joe

 07-11-2008
"David Kerber" wrote:
> (E-Mail Removed)lid says:
> > [ ... ]

>
> But you said you're initializing the frequency stats before each loop
> iteration? To me, that would imply that your sample size is really only
> 36 for each of 30 or so runs. Or am I misunderstanding how you are
> accumulating the results?

The two loops I used are not that important.
I just simulate 30 days, each day with 500 rand's (i.e. spins, draws, etc.).
At the beginning of each day the frequency stats are cleared,
i.e. the tests are done on intraday stats only.
I hope this makes it clear.

BTW, there is also a parallel discussion on this in sci.math
under the subject "Detecting biased random number generators".

Jerry Coffin

 07-12-2008
In article <g58mkt\$rem\$(E-Mail Removed)>, (E-Mail Removed)lid says...

[ ... ]


Out of curiosity, what exact code are you using to get from the original
range of the generator to the 0-35 that you're using?

--
Later,
Jerry.

The universe is a figment of its own imagination.

joe

 07-12-2008
"Jerry Coffin" <(E-Mail Removed)> wrote:
> (E-Mail Removed)lid says...
>
> [ ... ]
>
> > [ ... ]

>
> Out of curiosity, what exact code are you using to get from the original
> range of the generator to the 0-35 that you're using?

Jerry, it is 0 to 36.
As recommended in many books (for example Phillip Good
"Permutation, Parametric and Bootstrap Tests of Hypotheses")
I use the following:

int genrand(int lo, int hi)
{
    int z = rand() % (hi - lo + 1) + lo;
    return z;
}
...
int r = genrand(0, 36);

joe

 07-12-2008
"Richard Heathfield" <(E-Mail Removed)> wrote:
> joe said:
>
> <snip>
>
> > As recommended in many books (for example Phillip Good
> > "Permutation, Parametric and Bootstrap Tests of Hypotheses")
> > I use the following:
> >
> > int genrand(int lo, int hi)
> > {
> > int z = rand() % (hi - lo + 1) + lo;
> > return z;
> > }
> > ...
> > int r = genrand(0, 36);

>
> Better:
>
> #include <stdlib.h>
>
> int genrand(int low, int high)
> {
>     return (high - low + 1) * (rand() / (RAND_MAX + 1.0)) + low;
> }
>
> This avoids over-dependence on the low-order bits, which are a little too
> predictable in some implementations.

Yes, this makes much sense, although it uses floating point.
But the phenomenon described in the initial post of this thread,
i.e. that the MS rand() is:
a) a statistical RNG, and that
b) it operates at a 1% SL (or equivalently a 99% CL),
still holds with the above as well.

joe

 07-13-2008
"Pete Becker" <(E-Mail Removed)> wrote:
> Richard Heathfield <(E-Mail Removed)> said:
> > joe said:
> >
> >> [ ... ]

> >
> > Better:
> >
> > #include <stdlib.h>
> > int genrand(int low, int high)
> > {
> > return (high - low + 1) * (rand() / (RAND_MAX + 1.0)) + low;
> > }
> >
> > This avoids over-dependence on the low-order bits, which are a little too
> > predictable in some implementations. Usage is the same as in your example.

>
> You're both wrong. <g>
>
> Yes, there may be a dependence on low bits, but even if there isn't,
> both of these approaches introduce a bias.
>
> Suppose, for simplicity, that you're trying to generate 37 values with
> a generator for which RAND_MAX is 38, so rand() produces 39 values,
> 0-38. If you reduce the generated value modulo 37 you'll get values
> from 0-36. In fact, the values 0-36 produced by rand() will map to
> themselves, while 37 will map to 0 and 38 will map to 1. So 0 and 1
> will come up twice as often as any other value.
>
> Changing the approach to use floating point doesn't change the fact
> that you're trying to produce 37 values from a source that gives you
> 39. No matter how you do it, you're not going to get an even
> distribution.
>
> Granted, the bias isn't large when RAND_MAX is much greater than the
> number of values you're interested in, but it's there, nonetheless. And
> when RAND_MAX is around 32767, which is quite common, it wouldn't
> particularly surprise me if that alone accounted for the 1% deviation
> from expectations.
>
> The way to avoid this is to throw out some of the values from rand() so
> that the number of values that you put into the transformation is a
> multiple of the number of values you want:
>
> int max = RAND_MAX + 1 - (RAND_MAX + 1) % range;
> int value = rand();
> while (max <= value)
>     value = rand();
> return value % range;
>
> There's probably an off-by-one error in that code, but it should give
> you the general idea.

Pete, can you confirm that the max possible value
one can then get is RAND_MAX - 1, and not RAND_MAX?
For me it doesn't matter much whether the max possible
value is RAND_MAX or 1 less than it, but I want to be sure.

But I think the above code for max is not correct.
It should IMHO be something like this:
int max = (RAND_MAX / range) * range;
where range can be at most RAND_MAX.
Then the maximum possible output lies between 0 and RAND_MAX - 1,
excluding RAND_MAX, right?

I.e. standard rand() gives values between 0 and RAND_MAX,
but using the above method gives values between 0 and RAND_MAX - 1 only.
But this seems to be correct, to avoid the problem you described above
regarding the 0.
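For what it's worth, the rejection idea can be sketched as follows (my own code, not Pete's; genrand_unbiased is a hypothetical name, and unsigned long arithmetic is used so that RAND_MAX + 1 cannot overflow). Note that the returned values still span the full 0 to range-1; it is only the raw rand() draws that are capped below the largest multiple of range:

```c
#include <stdlib.h>

/* Map rand() onto 0 .. range-1 without modulo bias by rejecting raw
   draws at or above the largest multiple of range that fits into the
   RAND_MAX + 1 equally likely raw outcomes. */
int genrand_unbiased(int range)
{
    unsigned long span = (unsigned long)RAND_MAX + 1UL;
    unsigned long limit = span - span % (unsigned long)range;
    unsigned long v;
    do {
        v = (unsigned long)rand();
    } while (v >= limit);               /* discard the biased tail */
    return (int)(v % (unsigned long)range);
}
```

With range = 37 every result 0..36 is produced by exactly limit/37 raw values, so no outcome is favored, at the cost of an occasional extra rand() call.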