# 32/64 bit cc differences

Ben Bacarisse
 01-13-2014
glen herrmannsfeldt <(E-Mail Removed)> writes:

> Dr Nick <(E-Mail Removed)> wrote:
>> Eric Sosman <(E-Mail Removed)> writes:

> (snip, I wrote)
>>>> the easy way is to take that modulo one more than the largest
>>>> value you want.

>>> Many authors recommend against this, because the low-order
>>> bits of maximum-period linear congruential generators have short
>>> periods. But the "Minimal Standard" generator is not such a
>>> generator: It's a pure congruential generator with prime modulus,
>>> and its low-order bits are "as random as" the others.

>> Another reason to do that is that it can lead to a bias in the numbers.

>> Consider a generator that produces 0-9 inclusive, with equal
>> probability. If you take the results mod 3 you get 3 instances of 1,
>> three of 2, and four of 0. It's a small bias, but a real one.

> And, of course, using floating point there isn't any bias...

When the floats are used to make an integer selection (i.e. you replace
int_rand() % max with floor(float_rand() * max)) the bias remains.

Ben.
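
A minimal sketch of the point above, assuming a hypothetical generator that produces each of 0..9 with equal probability (the example from the quoted post). The function names are made up for illustration. Counting every possible generator value shows that the float route gives the same uneven 4/3/3 split as the remainder:

```c
#define GEN_VALUES 10   /* hypothetical generator: 0..9, equally likely */
#define BUCKETS     3

/* Count bucket hits when each generator value is reduced with %. */
void count_by_modulo(int counts[BUCKETS])
{
    int b, v;
    for (b = 0; b < BUCKETS; b++) counts[b] = 0;
    for (v = 0; v < GEN_VALUES; v++) counts[v % BUCKETS]++;
}

/* Same count, but going through a float in [0,1) first.  The cast
   truncates, which matches floor() for non-negative values. */
void count_by_float(int counts[BUCKETS])
{
    int b, v;
    for (b = 0; b < BUCKETS; b++) counts[b] = 0;
    for (v = 0; v < GEN_VALUES; v++) {
        double u = v / (double)GEN_VALUES;
        counts[(int)(u * BUCKETS)]++;
    }
}
```

Ten equally likely inputs cannot be split evenly over three outcomes, whichever mapping is used; the two mappings only differ in which inputs share a bucket.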

Keith Thompson
 01-13-2014
Ben Bacarisse <(E-Mail Removed)> writes:
> jacob navia <(E-Mail Removed)> writes:
>> Le 12/01/2014 22:09, Keith Thompson a écrit :
>>> #include <limits.h>
>>> #if INT_MAX < 2147483647
>>> #error This code requires at least 32-bit int
>>> #endif

>> A system with sizeof(int) of 16 bits will have problems with the above
>> constant "2147483647" since it is an integer constant that overflows

> No. What matters is the types intmax_t and uintmax_t and they can't be
> anything like as small as 16 bits. See 6.10.1p4.

And C90 had a similar rule, with preprocessor expressions evaluated in
type long or unsigned long. 2147483647 isn't a problem for any
conforming compiler. (And if you're using a non-conforming compiler,

Keith Thompson (The_Other_Keith)
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
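
For reference, a sketch of the guard discussed above in a complete translation unit. The helper int_value_bits() is hypothetical and only there to show that the check and the runtime view agree; the #if itself is the part taken from the thread.

```c
#include <limits.h>

/* Refuse to compile where int is narrower than 32 bits.  The
   preprocessor evaluates this in intmax_t/uintmax_t (C99) or
   long/unsigned long (C90), so the constant cannot overflow here. */
#if INT_MAX < 2147483647
#error This code requires at least 32-bit int
#endif

/* Hypothetical helper: number of value bits in int, from INT_MAX. */
int int_value_bits(void)
{
    int bits = 0;
    unsigned int max = INT_MAX;
    while (max != 0) {
        bits++;
        max >>= 1;
    }
    return bits;
}
```

On an implementation with 16-bit int the build stops at the #error line, which is the early detection being asked for.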

JohnF
 01-13-2014
Keith Thompson <(E-Mail Removed)> wrote:
> JohnF <(E-Mail Removed)> writes:
>> Keith Thompson <(E-Mail Removed)> wrote:
>>> JohnF <(E-Mail Removed)> writes:
>>> [...]
>>>> The code's supposed to be portable, and not care as long as int>=32.
>>>
>>> Then it's only mostly portable; the standard allows int to be as
>>> narrow as 16 bits.

>>
>> Yeah, but that's pretty much deprecated/archaic, at least for
>> general purpose computers. I usually just try to follow K&R 2nd ed
>> for "portable" syntax, whereas "portable semantics" gets trickier,
>> and I usually just try to figure "anything that can go wrong will".

> Most modern *hosted* implementations make int 32 bits, but there's
> nothing deprecated or archaic about 16-bit int (at least as far as
> the C standard is concerned).
>
> POSIX requires at least 32 bits, so if your program already depends on
> POSIX features, you can safely make that assumption. Otherwise, you can
> certainly assume 32-bit or wider int if you want to, but I personally
> would take care to make that assumption explicit, so if someone tries to
> compile my code with a fully conforming implementation that happens to
> have 16-bit int the problem will be detected early.
>
> #include <limits.h>
> #if INT_MAX < 2147483647
> #error This code requires at least 32-bit int
> #endif

I wasn't really trying to make any big point. Nowadays I just think,
like you say wrt posix, it's reasonable not to worry about <32-bit
architectures, at least for general purpose programs not intended
to be embedded anywhere, etc. Portability across platforms has so
many pitfalls that you can't reasonably worry about every conceivable
one, but have to "choose your battles".
For example, one thing I dislike about mswin (in addition to what
you mention below) is that stdout isn't default binary mode, but
outputs two chars, crlf, for your one \n. Code that cares, and which
is intended to run on both win and unix, gets messy dealing with that.

>>>> The one place it wanted strictly >32 I used long long (despite
>>>> obnoxious -Wall warnings about it). Anyway, I found the problem,
>>>> explained in subsequent followup, kind of along the lines you're
>>>> suggesting, but a rounding problem.
>>>
>>> I'd probably use int64_t and friends. But what warnings do you get when
>>> you use long long? You can likely get rid of any such warnings by
>>> telling your compiler to conform to C99 or later.

>>
>> That might be preferable to LL. All three compilers
>> 64-bit: cc --version cc (Debian 4.3.2-1.1) 4.3.2
>> 32-bit: cc --version cc (NetBSD nb2 20110806) 4.5.3
>> cc --version cc (GCC) 4.7.1
>> issue similar -pedantic -Wall warnings. Explicitly, from 4.7.1,
>> fm.c: In function 'rseeder':
>> fm.c:865:6: warning: ISO C90 does not support 'long long' [-Wlong-long]
>> fm.c:866:11: warning: use of C99 long long integer constant [-Wlong-long]
>> fm.c:877:20: warning: ISO C90 does not support 'long long' [-Wlong-long]
>> fm.c:878:11: warning: ISO C90 does not support 'long long' [-Wlong-long]
>> fm.c:880:25: warning: use of C99 long long integer constant [-Wlong-long]
>> fm.c:880:30: warning: use of C99 long long integer constant [-Wlong-long]
>> fm.c:880:37: warning: use of C99 long long integer constant [-Wlong-long]
>> fm.c:892:3: warning: ISO C90 does not support the 'll' gnu_printf
>> length modifier [-Wformat]
>> But that whole function ought to be re-algorithmized anyway,
>> so my concern is pretty minimal.

> Note how the warning is phrased: "ISO C90 does not support 'long long'".
> The long long type has been a standard C feature since the 1999 standard
> (and a common extension before that). Failure to support long long is
> not merely deprecated, it's completely non-standard. If you're willing
> to assume that int is at least 32 bits, you should be even more willing
> to assume that long long exists.
>
> And <stdint.h> also did not exist in C90; both it and long long were
> introduced by C99.
>
> Just invoke your compiler with options to tell it to use a more modern
> version of the language.
>
> gcc in particular uses "-std=gnu89" by default, which is C89/C90 with
> GNU extensions. IMHO this is unfortunate, and it's time for gcc to
> support C99 by default. But it probably doesn't make much sense to rely
> on gcc's default anyway.
>
> If you need your code to be portable to Microsoft's compiler, you might
> have a problem; I don't remember whether it supports long long, but I
> know it doesn't support C99 or C11.

Thanks for suggestions, but note above remark,
But that whole function [that uses long long] ought to be
re-algorithmized anyway, so my concern is pretty minimal.
And, as per a previous followup, the whole "obnoxious warnings"
remark was intended to be humorous, and I'd actually prefer
continuing to see the warnings, just to remind me that I should
get around to fixing the algorithm (it's the one that seeds
the rng with a hash-like number derived from your key -- I should
choose a better hash).
John Forkosh
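
A rough sketch of the int64_t route suggested in the quoted text, assuming the compiler is run in C99 mode or later (with gcc, for instance, -std=c99). The function names are hypothetical and are not taken from fm.c.

```c
#include <inttypes.h>   /* PRId64; also provides the <stdint.h> types */
#include <stdio.h>

/* Multiply two 32-bit values without overflow: the cast widens
   the arithmetic to 64 bits before the multiplication happens. */
int64_t wide_product(int32_t a, int32_t b)
{
    return (int64_t)a * b;
}

/* Format a 64-bit value portably.  PRId64 expands to the correct
   length modifier for int64_t on the current implementation.
   Returns the character count, as snprintf does. */
int format_wide(char *buf, size_t size, int64_t value)
{
    return snprintf(buf, size, "%" PRId64, value);
}
```

With PRId64 the source never spells out "ll", so the -Wformat complaint about the length modifier does not come up in C99 mode.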

JohnF
 01-13-2014
Ike Naar <(E-Mail Removed)> wrote:
> On 2014-01-12, JohnF <(E-Mail Removed)> wrote:
>> int iran1 ( int ilo, int ihi ) { /* you want int rn from ilo to ihi */
>> long ran1(/*some args go here*/), /*original rng from Numerical Recipes*/
>> iran = ran1(/*args*/), /* integer result from rng */
>> range = ihi-ilo+1, /* ihi-ilo+1 */
>> IM = 2147483647, /* ran1()'s actual range is 1...IM */

>
> Isn't ran1()'s actual range [1..2147483646] ?
>
>> imax = IM - (IM%range); /* force iran's max to a multiple of range */
>> while ( iran >= imax ) iran=ran1(/*args*/); /*discard out-of-range iran*/
>> return ( ilo + (iran%range) ); } /* back with random ilo <= i <= ihi */

Yes, actual max is IM-1, which code accommodates with >= in while().
I'll debug the comments later. But the whole bias problem solved by
all this is minuscule when range<<IM, which is pretty much always
the case. You can just comment out that while() and forget the whole
thing.
John Forkosh

John Forkosh
 01-13-2014
Ben Bacarisse <(E-Mail Removed)> wrote:
> JohnF <(E-Mail Removed)> writes:
>> Ben Bacarisse <(E-Mail Removed)> wrote:

>>> Unfortunately, for cryptographic work, you should have very strong
>>> guarantees about the way it behaves, but since this PRNG is designed for
>>> numerical work, you have probably tacitly assumed it is good enough.
>>>
>>> To get some more confidence, test the integer PRNG using any one of the
>>> standard random test suites. It won't give you cryptographic levels of
>>> confidence, but it will ensure that you can use all the bits with equal
>>> confidence.

>>
>> The section, starting on page 278 of Numerical Recipes in C, 2nd ed,
>> discusses (and provides code for) several rng's, including some tests.

>
> They may be numerical tests based on the floating point value. It will
> make almost no difference to a numerical test if the bottom bit of the
> int (just before the final divide) cycles 0,1,1,0,1,1,0,... (for
> example) but it will make a big difference if you make binary choices by
> using ran1(...) & 1. Eric S suggests that this sort of thing does not
> happen with the PRNG you use, but I'd not seen that post when I wrote.
>
>> Based on all that, you're right, I tacitly assumed ran1() okay.

>
> Assuming that the floats are well distributed, is not quite the same as
> assuming that the ints have all the right properties so a test or two
> would not go aims.

Actually, I think your "amiss" went amiss.
More to the point, I'm now using that iran1() function in preceding
followup, which (if you're not easily finding it) is,
"...the solution I've now coded was based on Eric's preceding
discussion.
It's pseudocoded below from the real code in forkosh.com/fm.zip,"
int iran1 ( int ilo, int ihi ) { /*you want an int rn from ilo to ihi*/
long ran1(/*some args go here*/), /*original rng from Numerical Recipes*/
iran = ran1(/*args*/), /* integer result from rng */
range = ihi-ilo+1, /* the range you want is ihi-ilo+1 */
IM = 2147483647, /* ran1()'s actual range is 1...IM */
imax = IM - (IM%range); /* force ran1's max to a multiple of range*/
while ( iran >= imax ) iran=ran1(/*args*/); /*reject out-of-range iran*/
return ( ilo + (iran%range) ); } /* back with random ilo <= i <= ihi */

So it's using mod arithmetic rather than &. But for the one instance
where a binary choice is needed, I do call iran1(0,1), meaning it
eventually does an iran%2, which is pretty much identical to iran&1.
Of course, I could instead do iran1(0,999)/500 to get 0 or 1.
That would be easy. Trying to come up with a valid test suite
would be harder than I care to contemplate. And if it reveals an
unwanted regularity in those ints, now what?...I have to go get
a whole different rng and start all over with it. Big pain.
But I will change that iran1(0,1). Thanks,
John Forkosh
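
One way the "divide instead of mod" idea could look for the binary choice, as a sketch only. It assumes the raw integer from ran1() lies in [1..2147483646], as noted earlier in the thread; RAN1_MAX and coin_from_raw() are hypothetical names.

```c
#include <assert.h>

#define RAN1_MAX 2147483646L   /* assumed largest raw value, per the thread */

/* Map a raw value in [1..RAN1_MAX] to 0 or 1 by magnitude rather
   than by its lowest bit.  RAN1_MAX is even, so each outcome is
   produced by exactly RAN1_MAX/2 raw values. */
int coin_from_raw(long iran)
{
    long half = RAN1_MAX / 2;          /* 1073741823 */
    assert(iran >= 1 && iran <= RAN1_MAX);
    return (int)((iran - 1) / half);   /* 0 for the lower half, 1 for the upper */
}
```

This depends on the top of the value rather than the bottom bit, so any short-period behavior in the low-order bits would not show up in the result.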

JohnF
 01-13-2014
J. Clarke <(E-Mail Removed)> wrote:
> (E-Mail Removed) says...
>> <snip>
>> that might give rise to this kind of behavior. Presumably,
>> there's some subtle bug that I'm failing to see in the code,
>> and which the output isn't helping me to zero in on. Thanks,

>
> I'm no expert but one thing I learned <mumble> years ago was to make
> sure that the problem you're chasing really is the problem you _think_
> you're chasing. You've got three different versions of the compiler
> with two of them giving one behavior and the third, oldest one giving a
> different behavior, which you are attributing to 64 bit vs 32-bit. It
> could also be the result of some change made to the more recent releases
> of the compiler and I would want to rule that out rather than assuming
> that it's a 32- vs 64- bit issue.

Problem found and fixed, as per earlier followups.
Turned out to be slightly different float behavior.
But you could be right that it wasn't a 64-bit issue,
per se. And I'd tried cc -m32-bit, as per previous
followups, but compiler barfed at that switch (not
sure why, man cc wasn't on that box). So I couldn't
try to get a finer-grained understanding of problem.
John Forkosh

Ike Naar
 01-13-2014
On 2014-01-13, JohnF <(E-Mail Removed)> wrote:
> Ike Naar <(E-Mail Removed)> wrote:
>> On 2014-01-12, JohnF <(E-Mail Removed)> wrote:
>>> int iran1 ( int ilo, int ihi ) { /* you want int rn from ilo to ihi */
>>> long ran1(/*some args go here*/), /*original rng from Numerical Recipes*/
>>> iran = ran1(/*args*/), /* integer result from rng */
>>> range = ihi-ilo+1, /* ihi-ilo+1 */
>>> IM = 2147483647, /* ran1()'s actual range is 1...IM */

>>
>> Isn't ran1()'s actual range [1..2147483646] ?
>>
>>> imax = IM - (IM%range); /* force iran's max to a multiple of range */
>>> while ( iran >= imax ) iran=ran1(/*args*/); /*discard out-of-range iran*/
>>> return ( ilo + (iran%range) ); } /* back with random ilo <= i <= ihi */

> Yes, actual max is IM-1, which code accommodates with >= in while().
> I'll debug the comments later. But the whole bias problem solved by
> all this is minuscule when range<<IM, which is pretty much always
> the case. You can just comment out that while() and forget the whole
> thing.

There's still a bias:
The result from ran1() is in the range [1..2147483646]
Take, for example, [ilo..ihi] = [0..1],
then range = 2 and imax = 2147483646
After discarding out-of-range values of iran, we
end up with iran in the range [1..imax-1] = [1..2147483645].

There are 1073741822 numbers in that range that are 0 (mod 2),
the lowest number being 2, the highest number being 2147483644.
There are 1073741823 numbers in that range that are 1 (mod 2),
the lowest number being 1, the highest number being 2147483645.
So the outcome 0 has a smaller probability than the outcome 1.
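
A sketch of how the off-by-one above might be avoided: shift the raw value to start at 0 and compute the rejection limit from the number of distinct raw values (IM-1), not from IM. The generator below is a plain "Minimal Standard" stand-in, not the ran1() from Numerical Recipes, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MS_MODULUS 2147483647L          /* 2^31 - 1, prime */

static int32_t ms_state = 1;            /* stand-in seed, in [1..MS_MODULUS-1] */

/* Stand-in for the integer stage of ran1(): the "Minimal Standard"
   generator, returning values in [1..MS_MODULUS-1]. */
long ms_next(void)
{
    ms_state = (int32_t)(((int64_t)ms_state * 16807) % MS_MODULUS);
    return ms_state;
}

/* Integer in [ilo..ihi], assuming ihi >= ilo and a range no
   larger than the number of distinct raw values. */
int iran_unbiased(int ilo, int ihi)
{
    int64_t count = MS_MODULUS - 1;            /* distinct raw values */
    int64_t range = (int64_t)ihi - ilo + 1;
    int64_t limit, v;

    assert(range >= 1 && range <= count);
    limit = count - (count % range);           /* largest multiple of range <= count */
    do {
        v = ms_next() - 1;                     /* shift [1..count] to [0..count-1] */
    } while (v >= limit);                      /* reject the incomplete last block */
    return (int)(ilo + v % range);
}
```

For [0..1] this gives range = 2 and limit = 2147483646, so nothing is rejected and the accepted values [0..2147483645] contain equally many even and odd numbers.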

JohnF
 01-13-2014
Ike Naar <(E-Mail Removed)> wrote:
> On 2014-01-13, JohnF <(E-Mail Removed)> wrote:
>> Ike Naar <(E-Mail Removed)> wrote:
>>> On 2014-01-12, JohnF <(E-Mail Removed)> wrote:
>>>> int iran1 ( int ilo, int ihi ) { /* you want int rn from ilo to ihi */
>>>> long ran1(/*some args go here*/), /*original rng from Numerical Recipes*/
>>>> iran = ran1(/*args*/), /* integer result from rng */
>>>> range = ihi-ilo+1, /* ihi-ilo+1 */
>>>> IM = 2147483647, /* ran1()'s actual range is 1...IM */
>>>
>>> Isn't ran1()'s actual range [1..2147483646] ?
>>>
>>>> imax = IM - (IM%range); /* force iran's max to a multiple of range */
>>>> while ( iran >= imax ) iran=ran1(/*args*/); /*discard out-of-range iran*/
>>>> return ( ilo + (iran%range) ); } /* back with random ilo <= i <= ihi */

>> Yes, actual max is IM-1, which code accommodates with >= in while().
>> I'll debug the comments later. But the whole bias problem solved by
>> all this is minuscule when range<<IM, which is pretty much always
>> the case. You can just comment out that while() and forget the whole
>> thing.

> There's still a bias:
> The result from ran1() is in the range [1..2147483646]
> Take, for example, [ilo..ihi] = [0..1],
> then range = 2 and imax = 2147483646
> After discarding out-of-range values of iran, we
> end up with iran in the range [1..imax-1] = [1..2147483645].
> There are 1073741822 numbers in that range that are 0 (mod 2),
> the lowest number being 2, the highest number being 2147483644.
> There are 1073741823 numbers in that range that are 1 (mod 2),
> the lowest number being 1, the highest number being 2147483645.
> So the outcome 0 has a smaller probability than the outcome 1.

Ah, yes. Shh, don't breathe a word to anybody,
but right now, as we speak, I'm submitting my patent
application for my new algorithm that takes an
integer odd number of items, and separates them
into two equal-sized piles.
Can you say "internet billionaire"?
John Forkosh

Keith Thompson
 01-13-2014
JohnF <(E-Mail Removed)> writes:
> For example, one thing I dislike about mswin (in addition to what
> you mention below) is that stdout isn't default binary mode, but
> outputs two chars, crlf, for your one \n. Code that cares, and which
> is intended to run on both win and unix, gets messy dealing with that.

stdout is a text stream in *all* C implementations.

The difference is in the way Windows and, say, UNIX represent
text files. In UNIX, the end of a line is indicated by a single
linefeed ('\n') character; in Windows, it's marked by a carriage
return followed by a linefeed ('\r', '\n').

For text streams, C translates a single newline character to the local
system's end-of-line representation on output, and vice versa on input.

The point of this is to make it *easier* to write portable code that
deals with text files. For example, you can write a single line to
stdout like this:

printf("Hello, world\n");

rather than:

if (running_on_windows) {
    printf("Hello, world\r\n"); /* unnecessary */
}
else {
    printf("Hello, world\n");
}

Things do become a bit more difficult if you have to deal with "foreign"
text files, but that's pretty much unavoidable.

And if you want to read and write binary files, just use a binary
stream; stdout isn't intended to deal with binary files.

Keith Thompson (The_Other_Keith)
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
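
To illustrate the binary-stream side of this, a small sketch using tmpfile(), which the standard specifies as opening its file in "wb+" mode. The function name is hypothetical.

```c
#include <stdio.h>

/* Write one line to a temporary binary stream and report how many
   bytes it occupies, or -1 if the temporary file cannot be made. */
long binary_line_size(void)
{
    FILE *fp = tmpfile();          /* opened in "wb+" mode by definition */
    long size;
    if (fp == NULL) return -1;
    fputs("Hello, world\n", fp);
    size = ftell(fp);              /* on a binary stream this is a byte count */
    fclose(fp);
    return size;
}
```

Through a binary stream the thirteen characters go out unchanged on every platform. The same fputs() on a Windows text stream would leave fourteen bytes in the file, because the '\n' is written as a carriage return plus linefeed.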

JohnF
 01-14-2014
Keith Thompson <(E-Mail Removed)> wrote:
> JohnF <(E-Mail Removed)> writes:
>> For example, one thing I dislike about mswin (in addition to what
>> you mention below) is that stdout isn't default binary mode, but
>> outputs two chars, crlf, for your one \n. Code that cares, and which
>> is intended to run on both win and unix, gets messy dealing with that.

>
> stdout is a text stream in all C implementations.
> For text streams, C translates a single newline character to the local
> system's end-of-line representation on output, and vice versa on input.
> And if you want to read and write binary files, just use a binary
> stream; stdout isn't intended to deal with binary files.

Thanks for the info. Here's the problem that I've encountered.
Lots of my programs are cgi's that emit binary files, typically
gifs, used in html as, e.g.,
<img src="/cgi-bin/myprog.cgi?instructions and/or data for image">
In this case, myprog >>has to<<, as I understand it, emit to stdout.
Is that right? If so, I need to put stdout in "binary mode"
(that's what windows calls it, the typical win C command being
something like setmode(fileno(stdout),O_BINARY)).
Got a fix for, or insight into, dealing with that without
messy #ifdef stuff? Thanks,
John Forkosh
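
One common way to keep the mess contained is to confine the #ifdef to a single helper and call it once at startup. This is a sketch, not a definitive fix: it assumes the Microsoft C runtime names _setmode, _fileno and _O_BINARY on Windows, and stdout_set_binary() is a hypothetical name. Standard C also has freopen(NULL, "wb", stdout), but which mode changes that permits is implementation-defined, so it is not relied on here.

```c
#include <stdio.h>

#if defined(_WIN32)
#include <fcntl.h>   /* _O_BINARY */
#include <io.h>      /* _setmode */
#endif

/* Put stdout into binary mode where the platform distinguishes
   text from binary streams.  Returns 0 on success, -1 on failure.
   Call it once, before anything is written to stdout. */
int stdout_set_binary(void)
{
#if defined(_WIN32)
    return _setmode(_fileno(stdout), _O_BINARY) == -1 ? -1 : 0;
#else
    return 0;   /* POSIX systems: text and binary streams behave the same */
#endif
}
```

The rest of the program can then use fwrite() on stdout for the image data with no further platform checks.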