Velocity Reviews > More on random numbers... (Proposed correction on the FAQ)

# More on random numbers... (Proposed correction on the FAQ)

Army1987
Guest
Posts: n/a

 08-04-2007
If you just need to do something with probability 1/N, you could use

if(rand() < (RAND_MAX+1u) / N)

This has an obvious bias problem which is easily fixed (not
perfectly, but an improvement).

There are RAND_MAX + 1 values RAND_MAX can return. So we want to
pick (RAND_MAX + 1.0) / N of them. This is probably not an
integer. So we might want to round it to nearest integer. If we
are willing to use floating point, it turns out that if x is a
positive real number, there are ceil(x) non-negative integers such
as n < x. (For example, if x is 3 there are 3 such values: 0, 1,
and 2. If x is 3.2 there are 4 such values: 0, 1, 2 and 3.) Now
the nearest integer to x is ceil(x - 0.5). [1] So we might try to
do
if (rand() < (RAND_MAX + 1.0) / N - 0.5)
or, for an arbitrary fraction p, not necessarily 1 / N:
if (rand() < p * (RAND_MAX + 1.0) - 0.5)
Now, if we don't want to use floating point, getting back to the
case 1 / N, we can use
if (rand() < (RAND_MAX + 1u + N / 2) / N)

[1] The problem is when p * (RAND_MAX + 1.0) is exactly half an
odd integer. When p is 1 / N this isn't a great concern, because,
where RAND_MAX + 1.0 is a power of two, this is impossible, unless
N is 2 * RAND_MAX, which can be handled in a more obvious way as
if (rand() == 0 && rand() > RAND_MAX / 2).

--
Army1987 (Replace "NOSPAM" with "email")
"Never attribute to malice that which can be adequately explained
by stupidity." -- R. J. Hanlon (?)

Army1987
Guest
Posts: n/a

 08-04-2007
On Sat, 04 Aug 2007 11:57:46 +0200, Army1987 wrote:

> In the FAQ I read:
> If you just need to do something with probability 1/N, you could use
>
> if(rand() < (RAND_MAX+1u) / N)
>
> This has an obvious bias problem which is easily fixed (not
> perfectly, but an improvement).
>
> There are RAND_MAX + 1 values RAND_MAX can return. So we want to

Ok. I meant ...values rand() can return.

--
Army1987 (Replace "NOSPAM" with "email")
"Never attribute to malice that which can be adequately explained
by stupidity." -- R. J. Hanlon (?)

Eric Sosman
Guest
Posts: n/a

 08-04-2007
Army1987 wrote:
> In the FAQ I read:
> If you just need to do something with probability 1/N, you could use
>
> if(rand() < (RAND_MAX+1u) / N)

In fairness to the FAQ, it goes on to explain that this
technique is only suitable for N much smaller than RAND_MAX.

> This has an obvious bias problem which is easily fixed (not
> perfectly, but an improvement).
> [...]
> if (rand() < (RAND_MAX + 1u + N / 2) / N)

That is, rounding the quotient instead of truncating it.
Yes, that could give a slightly more accurate result. How
much more accurate? By less than one part in (RAND_MAX+1u),
which could be as much as 0.0000305+ for the smallest legal
RAND_MAX.

But rounding can still leave you with an error of up to
half that amount! If N is large enough that the truncation
error of 0.0031% is significant, the rounding error of 0.0015%
is probably *also* significant. Or to put it another way, if
N is large enough to make rounding attractive, it is also
large enough to make rounding ineffective; the improvement is
only of interest when it's not good enough. For N that large,
I think you should be using a rejection technique along the
lines of the one illustrated in the FAQ.

--
Eric Sosman
http://www.velocityreviews.com/forums/(E-Mail Removed)lid

Walter Roberson
Guest
Posts: n/a

 08-04-2007
In article <(E-Mail Removed)>,
Army1987 <(E-Mail Removed)> wrote:

>There are RAND_MAX + 1 values RAND_MAX can return.

RAND_MAX is the maximum value that can be returned, but there is
no certainty that every value from 0 to RAND_MAX is returnable.
In particular, 0 was historically not returnable in some implementations.

--
There are some ideas so wrong that only a very intelligent person
could believe in them. -- George Orwell

Army1987
Guest
Posts: n/a

 08-05-2007
On Sat, 04 Aug 2007 18:36:56 +0000, Walter Roberson wrote:
> RAND_MAX is the maximum value that can be returned, but there is
> no certainty that every value from 0 to RAND_MAX is returnable.
> In particular, 0 was historically not returnable in some implementations.

Is there any way this can be known at compile-time or preprocessing
time? I'd consider such an implementation to be broken [1], and I
think it had better have rand() return _internal_rand() - 1 and
#define RAND_MAX (_INTERNAL_RAND_MAX - 1).

[1] An objection that could be made is that the Standard says that
the range is 0 to RAND_MAX, but it doesn't say how bad they
can be, so nothing stops an implementation never returning 0 to
be nonconforming. But then, you could argue that not even an
implementation never returning 1 is nonconforming, not even one
never returning 2, and so on, until you could claim that
#define RAND_MAX 32767
int rand(void) { return RAND_MAX; }
void srand(unsigned int __seed) { return; }
is conforming.
--
Army1987 (Replace "NOSPAM" with "email")
"Never attribute to malice that which can be adequately explained
by stupidity." -- R. J. Hanlon (?)