Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > Re: Random 64 bit number in stdc

Reply
Thread Tools

Re: Random 64 bit number in stdc

 
 
Jorgen Grahn
Guest
Posts: n/a
 
      08-17-2013
On Thu, 2013-08-15, Phil Carmody wrote:
> Jorgen Grahn <(E-Mail Removed)> writes:
>
>> On Tue, 2013-08-13, Phil Carmody wrote:
>> ...
>> > Conventional wisdom, from decades of experience, says that the lower
>> > bits tend to be at least as crappy as the upper bits, so shifting them
>> > out is better than masking them in.

>>
>> Decades of experience, or decades-old experience?

>
> The former, and fortunately enough of a memory for the latter too.
>
>> If you're saying you have a rand() with bad randomness in the low bits
>> /today/, I'm interested to learn which implementation that is. I've
>> asked a few times, but as far as I can recall noone has claimed to
>> have seen one since the 1990s or so.

>
> Well, lets try the system I'm typing on right now:
> gcc 4.6/eglibc6 2.15/linux 3.5/x86_64
> but I'm 100% sure I could reproduce these on systems which differ
> from that configuration in every possible way.
>
>
> dieharder on 8*268435456 bits of the 1u<<0 bit, packed into u8s:
>
> #================================================= =========
> test_name |ntup| tsamples |psamples| p-value |Assessment
> #================================================= =========
> diehard_operm5| 0| 1000000| 100|0.00000000| FAILED
> diehard_rank_6x8| 0| 100000| 100|0.00000000| FAILED

[etc]

This tells me nothing, I'm afraid. My problem starts at the Linux libc
rand(3) manual, which says:

The versions of rand() and srand() in the Linux C Library use the
same random number generator as random(3) and srandom(3), so the
lower-order bits should be as random as the higher-order bits.
However, on older rand() implementations, and on current
implementations on different systems, the lower-order bits are
much less random than the higher- order bits. Do not use this
function in applications intended to be portable when good
randomness is needed. (Use random(3) instead.)

(random(3) came from BSD Unix and is part of POSIX, although I'm not
sure POSIX specifies the algorithm, size of the state and so on.)

So the quoted text makes me wonder which these unnamed "current
implementations on different systems" are. For example, lots of Unix
clones with SYSV-based standard libraries have faded away into
irrelevance (for most people) in the last few decades.

> People I know claim to see weak default PRNGs all the time,


Depends on what they mean by weak. I'm not expecting rand() to be
suitable for cryptography, for example. I'm only after the "low-order
bits are less random" claim.

> I have no
> idea where your 1990s claim comes from. You seriously need to reevaluate
> your presumptions in this direction.


The "but as far as I can recall noone has claimed to have seen one" is
based on several earlier discussions like this one in comp.lang.c,
comp.lang.c++ and comp.lang.c++.moderated, since 2003 or so.

Looking through my archives, I see that
- Keith Thompson reported that the Solaris legacy compiler /usr/ucb/cc
links with a bad rand() [presumably the classic Unix one] but that
he was unsure if you could even compile ANSI C with it.
- James Kanze wrote (in 2004) that "quite some time ago" he used to
see bad implementations "but the trend seemed to be to replacing
them".
- "Nobody" mentioned (in this thread) Visual Studio 10 where low bits
have a shorter period, but IIUC it's not nearly as bad as the
classic rand().

That's the only concrete examples I've seen.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
 
Reply With Quote
 
 
 
 
James Kuyper
Guest
Posts: n/a
 
      08-17-2013
On 08/17/2013 06:38 AM, Jorgen Grahn wrote:
....
> (random(3) came from BSD Unix and is part of POSIX, although I'm not
> sure POSIX specifies the algorithm, size of the state and so on.)


"The random() function shall use a non-linear additive feedback
random-number generator employing a default state array size of 31 long
integers to return successive pseudo-random numbers in the range from 0
to 231-1. The period of this random-number generator is approximately 16
x (231-1). The size of the state array determines the period of the
random-number generator. Increasing the state array size shall increase
the period.
With 256 bytes of state information, the period of the random-number
generator shall be greater than 269.

Like rand(), random() shall produce by default a sequence of numbers
that can be duplicated by calling srandom() with 1 as the seed.

The srandom() function shall initialize the current state array using
the value of seed.

The initstate() and setstate() functions handle restarting and changing
random-number generators. The initstate() function allows a state array,
pointed to by the state argument, to be initialized for future use. The
size argument, which specifies the size in bytes of the state array,
shall be used by initstate() to decide what type of random-number
generator to use; the larger the state array, the more random the
numbers. Values for the amount of state information are 8, 32, 64, 128,
and 256 bytes. Other values greater than 8 bytes are rounded down to the
nearest one of these values. If initstate() is called with 8<=size<32,
then random() shall use a simple linear congruential random number
generator. The seed argument specifies a starting point for the
random-number sequence and provides for restarting at the same point.
The initstate() function shall return a pointer to the previous state
information array.

If initstate() has not been called, then random() shall behave as though
initstate() had been called with seed=1 and size=128."
--
James Kuyper
 
Reply With Quote
 
 
 
 
James Kuyper
Guest
Posts: n/a
 
      08-19-2013
On 08/17/2013 07:51 AM, James Kuyper wrote:
....
> integers to return successive pseudo-random numbers in the range from 0
> to 231-1. The period of this random-number generator is approximately 16
> x (231-1). The size of the state array determines the period of the
> random-number generator. Increasing the state array size shall increase
> the period.
> With 256 bytes of state information, the period of the random-number
> generator shall be greater than 269. ...


That was a cut-and-paste, and I didn't notice some formatting that
didn't paste properly 231 => 2^31, 269=>2^69
 
Reply With Quote
 
Nobody
Guest
Posts: n/a
 
      08-19-2013
On Sat, 17 Aug 2013 10:38:18 +0000, Jorgen Grahn wrote:

> - "Nobody" mentioned (in this thread) Visual Studio 10 where low bits
> have a shorter period, but IIUC it's not nearly as bad as the
> classic rand().


That depends upon which version you consider "the classic rand()".

If you're referring to RANDU, then almost nothing's that bad. But that
hasn't been used for decades.

Many early C implementations (and some current implementations) used a
32-bit LCG. Many (most?) returned bits 16-30, although at least one
returned bits 0-15.

rand() has a return type of "int" and is defined to return non-negative
values, hence the 15-bit limit on 16-bit systems. On early 32-bit systems,
some implementations returned all bits and left the choice of which bits
were good enough to the programmer (who would often take the lower bits by
mistake).

All LCGs inevitably have the issue that the lower bits have shorter
periods than the higher bits, as carries propagate from low to high. If
enough of the low bits are ignored, the period of the lowest returned bit
may be high enough, but it will still only be half the period of the next
bit.

Similarly, all LCGs have the "hyperplane" property, i.e. creating an
N-tuple from N consecutive samples will result in tuples which lie on at
most (2^K)^(1/N) hyperplanes, where K is the number of state bits (not
returned bits; e.g. for a 48-bit LCG which returns 32 bits, 4-tuples will
lie in 2^12 hyperplanes, not 2^8.

This much seems to be widely understood now, at least by the compiler
vendors. rand() is often still implemented using an LCG for simplicity,
but it's unlikely to have defects which make it behave worse than the
theoretical limits for an LCG of its size.

 
Reply With Quote
 
Tim Rentsch
Guest
Posts: n/a
 
      08-21-2013
Phil Carmody <(E-Mail Removed)> writes:

> Tim Rentsch <(E-Mail Removed)> writes:
>> http://www.velocityreviews.com/forums/(E-Mail Removed) writes:

> ...
>> A better approach is to generate an appropriate value directly in
>> unsigned char units, eg
>>
>> #include <limits.h>
>>
>> #define INT_BIT /* .. number of bits in INT_MAX .. */
>> /* This can be done directly from INT_MAX using
>> the well known IMAX_BITS macro */
>>
>> int
>> uniform_nonnegative_int(){
>> unsigned mask_shift = (CHAR_BIT-1) - (INT_BIT-1) % CHAR_BIT;
>> unsigned mask = (unsigned char) -1 >> mask_shift;
>> int r = mask & (unsigned char) rand();
>> int i = (INT_BIT-1) / CHAR_BIT;
>>
>> while( i-- > 0 ) r = r << CHAR_BIT + (unsigned char) rand();

>
> Conventional wisdom, from decades of experience, says that the lower
> bits tend to be at least as crappy as the upper bits, so shifting
> them out is better than masking them in. I would also presume the
> "planes" property (relations between subseqent terms) is less likely
> to kick in too, but don't have the mathematical teeth to prove such
> a claim.


Implementations of rand() may be of arbitrarily low quality, and I
grant you that some are pretty bad. Even so, I consider it a
disservice to promulgate folklore concerning deficiencies in rand(),
even if that folklore is known to have been (or even still be) correct
for some implementations. If one doesn't mind getting low quality
random numbers, the code might just as well use rand() and treat it
as if all its outputs bits are equally good. If one does mind getting
low quality random numbers, the only sensible path is not to use
rand() at all, and use something else that does have high quality
results across all bit positions.
 
Reply With Quote
 
Phil Carmody
Guest
Posts: n/a
 
      08-23-2013
Tim Rentsch <(E-Mail Removed)> writes:
> Phil Carmody <(E-Mail Removed)> writes:
> > Tim Rentsch <(E-Mail Removed)> writes:
> >> (E-Mail Removed) writes:

> > ...
> >> A better approach is to generate an appropriate value directly in
> >> unsigned char units, eg
> >>
> >> #include <limits.h>
> >>
> >> #define INT_BIT /* .. number of bits in INT_MAX .. */
> >> /* This can be done directly from INT_MAX using
> >> the well known IMAX_BITS macro */
> >>
> >> int
> >> uniform_nonnegative_int(){
> >> unsigned mask_shift = (CHAR_BIT-1) - (INT_BIT-1) % CHAR_BIT;
> >> unsigned mask = (unsigned char) -1 >> mask_shift;
> >> int r = mask & (unsigned char) rand();
> >> int i = (INT_BIT-1) / CHAR_BIT;
> >>
> >> while( i-- > 0 ) r = r << CHAR_BIT + (unsigned char) rand();

> >
> > Conventional wisdom, from decades of experience, says that the lower
> > bits tend to be at least as crappy as the upper bits, so shifting
> > them out is better than masking them in. I would also presume the
> > "planes" property (relations between subseqent terms) is less likely
> > to kick in too, but don't have the mathematical teeth to prove such
> > a claim.

>
> Implementations of rand() may be of arbitrarily low quality, and I
> grant you that some are pretty bad. Even so, I consider it a
> disservice to promulgate folklore concerning deficiencies in rand(),
> even if that folklore is known to have been (or even still be) correct
> for some implementations.


I consider it a public service. My advice is free and worth at least
twice what you paid for it.

> If one doesn't mind getting low quality
> random numbers, the code might just as well use rand() and treat it
> as if all its outputs bits are equally good. If one does mind getting
> low quality random numbers, the only sensible path is not to use
> rand() at all, and use something else that does have high quality
> results across all bit positions.


I agree that those who care about their random should use a generator that
is suitable to their specific needs, but should also understand what those
needs are, and why the generator is suitable. I treat "use of the Mersenne
Twister" as a 99% reliable indicator that the programmer doesn't have a
clue about random numbers, for example. (OK, it's about 10/10 reliable,
but someone did inform me that he encountered one project that did actually
use it correctly once.)

Phil
--
If "law-abiding citizens have nothing to fear" from privacy-invading
technologies and policies, then law-abiding governments should have
nothing to fear from whistleblowers.
 
Reply With Quote
 
James Kuyper
Guest
Posts: n/a
 
      08-23-2013
On 08/23/2013 04:42 AM, Phil Carmody wrote:
....
> needs are, and why the generator is suitable. I treat "use of the Mersenne
> Twister" as a 99% reliable indicator that the programmer doesn't have a
> clue about random numbers, for example.


If the programmer doesn't have a clue about random numbers, where did he
hear about Mersenne Twister? A person completely clueless about random
numbers wouldn't even know about rand(), much less srand(), because that
counts as at least one clue.
--
James Kuyper
 
Reply With Quote
 
Keith Thompson
Guest
Posts: n/a
 
      08-23-2013
Phil Carmody <(E-Mail Removed)> writes:
[...]
> I agree that those who care about their random should use a generator that
> is suitable to their specific needs, but should also understand what those
> needs are, and why the generator is suitable. I treat "use of the Mersenne
> Twister" as a 99% reliable indicator that the programmer doesn't have a
> clue about random numbers, for example. (OK, it's about 10/10 reliable,
> but someone did inform me that he encountered one project that did actually
> use it correctly once.)


Can you briefly summarize (or provide a citation for) the problems with
the Mersenne Twister? The Wikipedia article has a small "Disadvantages"
section, but I don't see anything there that suggests it's as
problematic as you say.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
Reply With Quote
 
James Kuyper
Guest
Posts: n/a
 
      08-23-2013
On 08/23/2013 10:55 AM, Phil Carmody wrote:
> James Kuyper <(E-Mail Removed)> writes:
>
>> On 08/23/2013 04:42 AM, Phil Carmody wrote:
>> ...
>>> needs are, and why the generator is suitable. I treat "use of the Mersenne
>>> Twister" as a 99% reliable indicator that the programmer doesn't have a
>>> clue about random numbers, for example.

>>
>> If the programmer doesn't have a clue about random numbers, where did he
>> hear about Mersenne Twister? A person completely clueless about random
>> numbers wouldn't even know about rand(), much less srand(), because that
>> counts as at least one clue.

>
> So you've never heard the phrase "A little knowledge is a dangerous thing"?


Yes, of course, but "clueless" is zero knowledge, not "a little knowledge".
 
Reply With Quote
 
Öö Tiib
Guest
Posts: n/a
 
      08-23-2013
On Friday, 23 August 2013 18:32:16 UTC+3, Keith Thompson wrote:
> Phil Carmody <(E-Mail Removed)> writes:
> [...]
> > I agree that those who care about their random should use a generator that
> > is suitable to their specific needs, but should also understand what those
> > needs are, and why the generator is suitable. I treat "use of the Mersenne
> > Twister" as a 99% reliable indicator that the programmer doesn't have a
> > clue about random numbers, for example. (OK, it's about 10/10 reliable,
> > but someone did inform me that he encountered one project that did actually
> > use it correctly once.)

>
> Can you briefly summarize (or provide a citation for) the problems with
> the Mersenne Twister? The Wikipedia article has a small "Disadvantages"
> section, but I don't see anything there that suggests it's as
> problematic as you say.


It is often (I would agree with Phil that almost idiomatically) seeded
with wrong things like 'time(0)' that Wikipedia's article mentions
several times: "The Mersenne Twister is sensitive to poor initialization
and can take a long time to recover from a zero-excess initial state."
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Re: Random 64 bit number in stdc Ben Bacarisse C Programming 2 08-07-2013 11:49 AM
32-bit Number * 32-bit Number = 64-bit result Jeff.M Javascript 6 05-04-2009 09:21 PM
is this prog stdc? Heinrich Pumpernickel C Programming 7 09-10-2007 05:31 PM
Re: stdc++ fstream file permission Avner C++ 0 10-15-2005 05:34 AM
stdc++ fstream file permission Avner C++ 1 10-15-2005 03:29 AM



Advertisments