Velocity Reviews > Re: Lotto 7/45

# Re: Lotto 7/45

Ben Bacarisse
Guest
Posts: n/a

 08-07-2008

[I've added comp.lang.c and set followups there since it is clear that
C people to check this over.]

Richard Heathfield <(E-Mail Removed)> writes:

> Ben Bacarisse said:
>
> <snip>
>
>> You warn the OP not to combine the first and second loops, but why not
>> combine the second and third? Having swapped the first x numbers, you
>> can stop -- that array prefix will not change.

>
> Good spot.
>
>> The OP does not even
>> like calling rand() 7 times, so it seems churlish to call it another
>> 38 times!

>
> Perhaps I misunderstood the OP, but I thought he was doing this:
>
> a) set count to 0
> b) pick a random number in the range 1 to 45
> c) if we haven't seen this number before
> d) store it and increment the counter
> e) end if
> f) unless count is now 7, go round again from (b).
>
> This has the potential to call rand() a great many more times than
> 7.

Yes, but the OP complained about making even 7 calls, that's all.
(Actually the expected number of calls with the above algorithm is
only a shade over 7.5, but that's not why I said 7).

>> There is also a danger in the almost canonical:
>>
>> int r = (n - i) * (rand() / (RAND_MAX + 1.0)) + i;
>> int t = arr[r];
>>
>> This is not portable in the comp.lang.c sense but is fine in most
>> practical situations. The problem is that if int is 64 bits, the
>> calculation can yield an r that is out of bounds.

>
> How? Remember that n is the number of balls in the lottery, so
> pragmatically speaking it will be a not-very-large number. The expression
> rand() / (RAND_MAX + 1.0) will give us a value in the range 0 to 1. So
> we've got a not-very-big number less a certainly-no-bigger number,
> multiplied by a value in the range 0 to 1 (making the number probably
> smaller and certainly no bigger), and then adding a not-very-big number. I
> am at a loss to see how this can give us an invalid value for r, and I
> await enlightenment.

The code relies on the floating point division never actually being 1
but on a system with a 64 bit int, RAND_MAX could be as large as a
typical implementation's LLONG_MAX. On my Intel x86 gcc system, this
program

#include <stdio.h>
#include <limits.h>

#define RAND_MAX LLONG_MAX

long long int rand(void) { return RAND_MAX; }

int main(void)
{
int r = 45 * (rand() / (RAND_MAX + 1.0));
printf("%d\n", r);
}

prints 45. It prints 45 for rand() values from RAND_MAX all the way
down to RAND_MAX - 0x1ff [1]. I don't think this is a gcc bug, though
I am not enough of a FP expert to know if this behaviour is
conforming.

Lowly double simply runs out of precision with very big integers, and
the division yields 1 exactly. Change the 1.0 to 1.0L and the problem
goes away. (I used 45, because that is the 'n' used in the OP's
lotto, but the problem has nothing to do with that number.)

For double precision floating point, you don't need integers anything
like this big. If plain int has INT_MAX == RAND_MAX ==
0x20000000000000 then, with the same double precision implementation,
I get 45 as output again but this time, of course, only in the one
case where rand() returns RAND_MAX.

Thinking it over, C90 is not safe. All this seems allowed on any C
implementation unless there is some nice guarantee about floating
point division that gcc is ignoring (and that I've missed).

>> You can also get
>> burnt if you use a system with poor double precision arithmetic,
>> though I am not 100% sure how bad the C standard allows it to get.

>
> Ten digits of precision ought to be enough for anybody.

But with 10 digits even lower RAND_MAX values give problems. 10
digits is very close to the precision that can show this behaviour
with plain 32 bit signed ints (though I can't be sure).

[1] Not many. In fact few enough that

int r;
do r = n * (rand() / (RAND_MAX + 1.0)); while (r == n);

will be safe and fast.

--
Ben.

Old Wolf
Guest
Posts: n/a

 08-08-2008
On Aug 7, 2:52 pm, Ben Bacarisse <(E-Mail Removed)> wrote:
> #include <stdio.h>
> #include <limits.h>
>
> #define RAND_MAX LLONG_MAX
>
> long long int rand(void) { return RAND_MAX; }
>
> int main(void)
> {
> int r = 45 * (rand() / (RAND_MAX + 1.0));
> printf("%d\n", r);
> }
>
> prints 45.

Isn't it undefined behaviour to define your own
functions with external linkage and with the same
name as standard library functions (even if the
associated header is not included) ?

Ben Bacarisse
Guest
Posts: n/a

 08-08-2008
Old Wolf <(E-Mail Removed)> writes:

> On Aug 7, 2:52 pm, Ben Bacarisse <(E-Mail Removed)> wrote:
>> #include <stdio.h>
>> #include <limits.h>
>>
>> #define RAND_MAX LLONG_MAX
>>
>> long long int rand(void) { return RAND_MAX; }
>>
>> int main(void)
>> {
>> int r = 45 * (rand() / (RAND_MAX + 1.0));
>> printf("%d\n", r);
>> }
>>
>> prints 45.

>
> Isn't it undefined behaviour to define your own
> functions with external linkage and with the same
> name as standard library functions (even if the
> associated header is not included) ?

Yes. The post was about what rand() might do so I allowed myself to
keep the name. Imagine the #define and definition of rand() in a
comment and #include <stdlib.h> instead if it helps.

The only interesting point being that I think the frequently seen
expression:

int r = max * (rand() / (RAND_MAX + 1.0));

can leave r == max if int is big enough or the floating point
arithmetic imprecise enough.

--
Ben.