[I've added comp.lang.c and set followups there since it is clearly a
good idea for C people to check this over.]

Richard Heathfield <(E-Mail Removed)> writes:

> Ben Bacarisse said:
>
> <snip>
>
>> You warn the OP not to combine the first and second loops, but why not
>> combine the second and third? Having swapped the first x numbers, you
>> can stop -- that array prefix will not change.
>
> Good spot.
>
>> The OP does not even
>> like calling rand() 7 times, so it seems churlish to call it another
>> 38 times!
>
> Perhaps I misunderstood the OP, but I thought he was doing this:
>
> a) set count to 0
> b) pick a random number in the range 1 to 45
> c) if we haven't seen this number before
> d) store it and increment the counter
> e) end if
> f) unless count is now 7, go round again from (b).
>
> This has the potential to call rand() a great many more times than
> 7.
Yes, but the OP complained about making even 7 calls, that's all.
(Actually the expected number of calls with the above algorithm is
only a shade over 7.5, but that's not why I said 7.)

>> There is also a danger in the almost canonical:
>>
>>   int r = (n - i) * (rand() / (RAND_MAX + 1.0)) + i;
>>   int t = arr[r];
>>
>> This is not portable in the comp.lang.c sense but is fine in most
>> practical situations. The problem is that if int is 64 bits, the
>> calculation can yield an r that is out of bounds.
>
> How? Remember that n is the number of balls in the lottery, so
> pragmatically speaking it will be a not-very-large number. The expression
> rand() / (RAND_MAX + 1.0) will give us a value in the range 0 to 1. So
> we've got a not-very-big number less a certainly-no-bigger number,
> multiplied by a value in the range 0 to 1 (making the number probably
> smaller and certainly no bigger), and then adding a not-very-big number. I
> am at a loss to see how this can give us an invalid value for r, and I
> await enlightenment.
The code relies on the floating point division never actually being 1,
but on a system with a 64 bit int, RAND_MAX could be as large as a
typical implementation's LLONG_MAX. On my Intel x86 gcc system, this
program

  #include <stdio.h>
  #include <limits.h>

  #define RAND_MAX LLONG_MAX
  long long int rand(void) { return RAND_MAX; }

  int main(void)
  {
      int r = 45 * (rand() / (RAND_MAX + 1.0));
      printf("%d\n", r);
  }

prints 45. It prints 45 for rand() values from RAND_MAX all the way
down to RAND_MAX - 0x1ff [1]. I don't think this is a gcc bug, though
I am not enough of an FP expert to know if this behaviour is
conforming.

Lowly double simply runs out of precision with very big integers, and
the division yields 1 exactly. Change the 1.0 to 1.0L and the problem
goes away. (I used 45 because that is the 'n' used in the OP's
lotto, but the problem has nothing to do with that number.)

For double precision floating point, you don't need integers anything
like this big. If plain int has INT_MAX == RAND_MAX == 0x20000000000000
(that is, 2^53) then, with the same double precision implementation,
I get 45 as output again, but this time, of course, only in the one
case where rand() returns RAND_MAX.

Thinking it over, even C90 is not safe. All this seems to be allowed
on any C implementation, unless there is some nice guarantee about
floating point division that gcc is ignoring (and that I've missed).

>> You can also get
>> burnt if you use a system with poor double precision arithmetic,
>> though I am not 100% sure how bad the C standard allows it to get.
>
> Ten digits of precision ought to be enough for anybody.
But with 10 digits, even lower RAND_MAX values give problems. Ten
digits is very close to the precision that can show this behaviour
with plain 32 bit signed ints (though I can't be sure).

[1] Not many. In fact, few enough that

  int r;
  do r = n * (rand() / (RAND_MAX + 1.0)); while (r == n);

will be safe and fast.

--

Ben.