Velocity Reviews > double cast to int reliable?

# double cast to int reliable?

Seebs
Guest
Posts: n/a

 06-03-2010
On 2010-06-03, Nick Keighley <(E-Mail Removed)> wrote:
> On 2 June, 22:06, Sjouke Burry <(E-Mail Removed)>
> wrote:
>> Yep. Rounding while converting to double will for most integers
>> mean that the double is slightly smaller than the int.
>> Converting it back to int will not give you the original.

> really? Can you name an implementation where this is so? Is it a valid
> implementation of C?

The obvious case would be a machine where both int and double are 64-bit,
at which point, it's pretty obvious that for the vast majority of positive
integers, the conversion to double will at the very least change the
value, and I think I've seen it round down, so...

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / (E-Mail Removed)
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!

Keith Thompson
Guest
Posts: n/a

 06-03-2010
Seebs <(E-Mail Removed)> writes:
> On 2010-06-03, Nick Keighley <(E-Mail Removed)> wrote:
>> On 2 June, 22:06, Sjouke Burry <(E-Mail Removed)>
>> wrote:
>>> Yep. Rounding while converting to double will for most integers
>>> mean that the double is slightly smaller than the int.
>>> Converting it back to int will not give you the original.

>
>> really? Can you name an implementation where this is so? Is it a valid
>> implementation of C?

>
> The obvious case would be a machine where both int and double are 64-bit,
> at which point, it's pretty obvious that for the vast majority of positive
> integers, the conversion to double will at the very least change the
> value, and I think I've seen it round down, so...

Round down or round to zero? If the latter, then it's not the case
that "most" integers yield a slightly smaller double when converted
(unless "smaller" means closer to zero). But yes, this is just
nitpicking.

The point is that the standard requires the conversion of an integer
to a floating-point type to yield an exact result when that result
can be represented (C99 6.3.1.4), and the floating-point model
imposed by C99 5.2.4.2.2 implies that a fairly wide range of integer
values must be exactly representable. That range might not cover
the full range of any integer type (even long double might not be
able to represent CHAR_MAX if CHAR_BIT is big enough).

In particular, converting the value 1 from int to double and back
to int is guaranteed to yield 1; if it doesn't, your implementation
is non-conforming.

There's a common idea that floating-point values can never be
anything more than approximations, and that no floating-point
operation is guaranteed to yield an exact result, but the reality
of it isn't that simple. It might be safer to *assume* that all
such operations are approximate but there are things you can get
away with if you know what you're doing. The trouble is that, even
if you know what you're doing, it can be very easy to accidentally
get outside the range in which the guarantees apply; you can use
double to represent exact integers, but there's no warning when you
exceed the range where that works.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Ersek, Laszlo
Guest
Posts: n/a

 06-03-2010
On Thu, 3 Jun 2010, Keith Thompson wrote:

> The trouble is that, even if you know what you're doing, it can be very
> easy to accidentally get outside the range in which the guarantees
> apply; you can use double to represent exact integers, but there's no
> warning when you exceed the range where that works.

For any unsigned type that has no more bits than 612,787,565,149,966; that
is, any conceivable unsigned type, the following is a sufficient condition
to store any value of said type in a "long double":

((long long unsigned)sizeof(utype) * CHAR_BIT * 30103 + 99999) / 100000
<= LDBL_DIG

For uint32_t, the left side evaluates to 10, and both DBL_DIG and LDBL_DIG
must be at least 10 on any conformant platform.

After the conversion to the chosen floating point type, e.g. long double,
one must track the possible ranges in every floating point expression
involved, and make sure that no evaluation can exceed "limit", which
can be initialized like this:

char lim_str[LDBL_DIG + 1] = "";
long double limit;

(void)sscanf(memset(lim_str, '9', LDBL_DIG), "%Lf", &limit);

(Of course not exceeding this bound may not be sufficient for converting
back to "utype", but since "(utype)-1" itself was convertible, this final
condition is only a simple comparison away.)

--o--

The number of full decimal digits needed to represent the C value
"(utype)-1" is given by the math expression

ceil(log_10(2 ** numbits - 1))

"numbits" being the number of value bits in "utype". It is safe to assume
(or rather, we have to assume) that all bits are value bits. Continuing
with math formulas, and exploiting log_10 being strictly monotonic and
ceil being monotonic,

ceil(log_10(2 ** numbits - 1))
<= ceil(log_10(2 ** numbits ))
== ceil(numbits * log_10(2))
<= ceil(numbits * (30103 / 100000))
== ceil(numbits * 30103 / 100000)

which equals the value of the math expression

floor( (numbits * 30103 + (100000 - 1)) / 100000 )

Therefore, this integer value is not less than the number of full decimal
digits needed. As "numbits" increases, this value becomes greater than the
exact number of decimal places required. The speed of divergence is
determined by the accuracy of 30103 / 100000 approximating log_10(2), but
I'm too lazy to try to calculate that relationship.

BTW, 30103 and 100000 are coprime (30103 is a prime in its own right),
thus the smallest positive "numbits" where "numbits * 30103" is an
integral multiple of 100000 is 100000, which would still make for quite a
big integer type. Hence we can assume that the remainder of the modular
division "numbits * 30103 / 100000" is always nonzero, and the last
ceiling math expression could be rewritten as

floor(numbits * 30103 / 100000) + 1

This simplifies the initial C expression to

(long long unsigned)sizeof(utype) * CHAR_BIT * 30103 / 100000 < LDBL_DIG

Unfortunately, the entire approach falls on its face with uint64_t and an
extended precision (1 + 15 + 64 = 80 bits) "long double", even though the
significand has the required number of bits available. (As said above, the
condition is only sufficient, not necessary.)

The problem is that the method above works with entire base 10 digits. The
decimal representation of UINT64_MAX needs 20 places (19 full places and a
"fractional place", rounded up to 20), but the 64 bit significand only
provides for 19 whole decimal places, and the comparison is done in whole
decimal places. What's worse, an extended precision "long double" can only
allow for an LDBL_DIG of 18 (as my platform defines it), presumably
because (and I'm peeking at C99 5.2.4.2.2) "long double" must
"accommodate" not only integers with LDBL_DIG decimal places, but also any
decimal fraction with LDBL_DIG digits. The exponent of the "long double"
stores the position of the *binary* point, not that of the *decimal*
point, and this probably sacrifices a further decimal digit.

(I gave you some material to shred, please be gentle while shredding.)

Cheers,
lacos

Keith Thompson
Guest
Posts: n/a

 06-03-2010
"Ersek, Laszlo" <(E-Mail Removed)> writes:
> On Thu, 3 Jun 2010, Keith Thompson wrote:
>
>> The trouble is that, even if you know what you're doing, it can be very
>> easy to accidentally get outside the range in which the guarantees
>> apply; you can use double to represent exact integers, but there's no
>> warning when you exceed the range where that works.

>
> For any unsigned type that has no more bits than 612,787,565,149,966; that
> is, any conceivable unsigned type, the following is a sufficient condition
> to store any value of said type in a "long double":
>
> ((long long unsigned)sizeof(utype) * CHAR_BIT * 30103 + 99999) / 100000
> <= LDBL_DIG

612,787,565,149,966 can be represented in 50 bits.
unsigned long long is at least 64 bits.

Inconceivable? "I do not think that word means what you think it means."

[snip]

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Ersek, Laszlo
Guest
Posts: n/a

 06-03-2010
On Thu, 3 Jun 2010, Keith Thompson wrote:

> "Ersek, Laszlo" <(E-Mail Removed)> writes:
>> On Thu, 3 Jun 2010, Keith Thompson wrote:
>>
>>> The trouble is that, even if you know what you're doing, it can be very
>>> easy to accidentally get outside the range in which the guarantees
>>> apply; you can use double to represent exact integers, but there's no
>>> warning when you exceed the range where that works.

>>
>> For any unsigned type that has no more bits than 612,787,565,149,966; that
>> is, any conceivable unsigned type, the following is a sufficient condition
>> to store any value of said type in a "long double":
>>
>> ((long long unsigned)sizeof(utype) * CHAR_BIT * 30103 + 99999) / 100000
>> <= LDBL_DIG

>
> 612,787,565,149,966 can be represented in 50 bits.
> unsigned long long is at least 64 bits.
>
> Inconceivable? "I do not think that word means what you think it means."

I believe I wasn't formulating my point carefully enough. Verbatim quote,

>> For any unsigned type that has no more *bits* than 612,787,565,149,966;

The range of such an unsigned type would be

[0 .. 2 ** 612,787,565,149,966 - 1].

The limit is not arbitrary, it is (for the smallest allowed ULLONG_MAX):

(ULLONG_MAX - 99999) / 30103

expressed in C. "unsigned long long" doesn't need to cover the range of
the type in question, it must be able to represent the *number of bits* in
it.

Cheers,
lacos

Ben Bacarisse
Guest
Posts: n/a

 06-03-2010
Keith Thompson <(E-Mail Removed)> writes:

> "Ersek, Laszlo" <(E-Mail Removed)> writes:
>> On Thu, 3 Jun 2010, Keith Thompson wrote:
>>
>>> The trouble is that, even if you know what you're doing, it can be very
>>> easy to accidentally get outside the range in which the guarantees
>>> apply; you can use double to represent exact integers, but there's no
>>> warning when you exceed the range where that works.

>>
>> For any unsigned type that has no more bits than 612,787,565,149,966; that
>> is, any conceivable unsigned type, the following is a sufficient condition
>> to store any value of said type in a "long double":
>>
>> ((long long unsigned)sizeof(utype) * CHAR_BIT * 30103 + 99999) / 100000
>> <= LDBL_DIG

>
> 612,787,565,149,966 can be represented in 50 bits.
> unsigned long long is at least 64 bits.
>
> Inconceivable? "I do not think that word means what you think it
> means."

I'm pretty sure it's a word order confusion. I think he intended "any
unsigned type that has no more than 612,787,565,149,966 bits". That's
the maximum number of bits that won't cause the quoted expression to
fail. I.e. for more than that number of bits, long long unsigned is not
guaranteed to be able to represent the result.

Some people might still conceive of such types, but the term is not
nearly so outlandish in that context.

--
Ben.

Ersek, Laszlo
Guest
Posts: n/a

 06-03-2010
On Thu, 3 Jun 2010, Ben Bacarisse wrote:

> Keith Thompson <(E-Mail Removed)> writes:
>
>> "Ersek, Laszlo" <(E-Mail Removed)> writes:
>>> On Thu, 3 Jun 2010, Keith Thompson wrote:
>>>
>>>> The trouble is that, even if you know what you're doing, it can be very
>>>> easy to accidentally get outside the range in which the guarantees
>>>> apply; you can use double to represent exact integers, but there's no
>>>> warning when you exceed the range where that works.
>>>
>>> For any unsigned type that has no more bits than 612,787,565,149,966; that
>>> is, any conceivable unsigned type, the following is a sufficient condition
>>> to store any value of said type in a "long double":
>>>
>>> ((long long unsigned)sizeof(utype) * CHAR_BIT * 30103 + 99999) / 100000
>>> <= LDBL_DIG

>>
>> 612,787,565,149,966 can be represented in 50 bits.
>> unsigned long long is at least 64 bits.
>>
>> Inconceivable? "I do not think that word means what you think it
>> means."

>
> I'm pretty sure it's a word order confusion. I think he intended "any
> unsigned type that has no more than 612,787,565,149,966 bits".

Yes, thank you. I guess 18 hours of sleep accumulated over four nights is
not too much.

(I don't need decaf, it's my DSPS [0] that doesn't cooperate with the
"strictly scheduled" training of this week. It's 01:04 AM in local time,
again.)

Cheers,
lacos

[0] http://en.wikipedia.org/wiki/Delayed...phase_syndrome

Seebs
Guest
Posts: n/a

 06-04-2010
On 2010-06-03, Keith Thompson <(E-Mail Removed)> wrote:
>> The obvious case would be a machine where both int and double are 64-bit,
>> at which point, it's pretty obvious that for the vast majority of positive
>> integers, the conversion to double will at the very least change the
>> value, and I think I've seen it round down, so...

> Round down or round to zero? If the latter, then it's not the case
> that "most" integers yield a slightly smaller double when converted
> (unless "smaller" means closer to zero). But yes, this is just
> nitpicking.

Which is why I put "positive" in there.

> The point is that the standard requires the conversion of an integer
> to a floating-point type to yield an exact result when that result
> can be represented (C99 6.3.1.4), and the floating-point model
> imposed by C99 5.2.4.2.2 implies that a fairly wide range of integer
> values must be exactly representable. That range might not cover
> the full range of any integer type (even long double might not be
> able to represent CHAR_MAX if CHAR_BIT is big enough).

Right.

But the obvious case would be 64-bit int and 64-bit double. Look at it
this way. Assume a typical mantissa/exponent system. Assume that there
are 58 bits of mantissa. Every number up to 58 bits can be represented
exactly, half of the numbers in the 59-bit range, 1/4 of the numbers in
the 60-bit range... And it turns out that this means that, of the 63-bit
range of int, only a small fraction of values (rough order of 1/16?) can
be represented exactly in a double.

Now, as it happens, 99% of the numbers I've ever used in a C program are
in that range.

> The trouble is that, even
> if you know what you're doing, it can be very easy to accidentally
> get outside the range in which the guarantees apply; you can use
> double to represent exact integers, but there's no warning when you
> exceed the range where that works.

Yes.

For plain float, on the systems I've tried, the boundary seems to be about
2^24; 2^24+1 cannot be represented exactly in a 32-bit float. I wouldn't
be surprised to find that double came out somewhere near 2^48+1 as the first
positive integer value that couldn't be represented.

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / (E-Mail Removed)
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!

Keith Thompson
Guest
Posts: n/a

 06-04-2010
"Ersek, Laszlo" <(E-Mail Removed)> writes:
> On Thu, 3 Jun 2010, Keith Thompson wrote:
>
>> "Ersek, Laszlo" <(E-Mail Removed)> writes:
>>> On Thu, 3 Jun 2010, Keith Thompson wrote:
>>>
>>>> The trouble is that, even if you know what you're doing, it can be very
>>>> easy to accidentally get outside the range in which the guarantees
>>>> apply; you can use double to represent exact integers, but there's no
>>>> warning when you exceed the range where that works.
>>>
>>> For any unsigned type that has no more bits than 612,787,565,149,966; that
>>> is, any conceivable unsigned type, the following is a sufficient condition
>>> to store any value of said type in a "long double":
>>>
>>> ((long long unsigned)sizeof(utype) * CHAR_BIT * 30103 + 99999) / 100000
>>> <= LDBL_DIG

>>
>> 612,787,565,149,966 can be represented in 50 bits.
>> unsigned long long is at least 64 bits.
>>
>> Inconceivable? "I do not think that word means what you think it means."

>
> I believe I wasn't formulating my point carefully enough. Verbatim
> quote, with emphasis added:
>
>>> For any unsigned type that has no more *bits* than 612,787,565,149,966;

Ok, I see what you mean. ("no more than ... bits" would have been
clearer.)

> The range of such an unsigned type would be
>
> [0 .. 2 ** 612,787,565,149,966 - 1].
>
> The limit is not arbitrary, it is (for the smallest allowed ULLONG_MAX):
>
> (ULLONG_MAX - 99999) / 30103
>
> expressed in C. "unsigned long long" doesn't need to cover the range
> of the type in question, it must be able to represent the *number of
> bits* in it.

And the formula doesn't say "yes" for smaller types and "no" for
bigger ones; it breaks down for really huge types, right?

When I have time, I'll have to go back and re-read what you wrote.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson
Guest
Posts: n/a

 06-04-2010
Seebs <(E-Mail Removed)> writes:
> On 2010-06-03, Keith Thompson <(E-Mail Removed)> wrote:
>>> The obvious case would be a machine where both int and double are 64-bit,
>>> at which point, it's pretty obvious that for the vast majority of positive
>>> integers, the conversion to double will at the very least change the
>>> value, and I think I've seen it round down, so...

>
>> Round down or round to zero? If the latter, then it's not the case
>> that "most" integers yield a slightly smaller double when converted
>> (unless "smaller" means closer to zero). But yes, this is just
>> nitpicking.

>
> Which is why I put "positive" in there.

Which I very cleverly failed to notice. *sigh*

>> The point is that the standard requires the conversion of an integer
>> to a floating-point type to yield an exact result when that result
>> can be represented (C99 6.3.1.4), and the floating-point model
>> imposed by C99 5.2.4.2.2 implies that a fairly wide range of integer
>> values must be exactly representable. That range might not cover
>> the full range of any integer type (even long double might not be
>> able to represent CHAR_MAX if CHAR_BIT is big enough).

>
> Right.
>
> But the obvious case would be 64-bit int and 64-bit double. Look at it
> this way. Assume a typical mantissa/exponent system. Assume that there
> are 58 bits of mantissa. There's 58 bits of numbers that can be represented
> exactly, you can represent half of the numbers in the 59-bit range, 1/4 of
> the numbers in the 60-bit range... And it turns out that this means that,
> of the 63-bit range of int, a very small number of values (rough order of
> 1/16?) can be represented exactly in a double.

Yes, that's the obvious case. My point, which I didn't express very
clearly, is that it's possible that *every* integer type has values
that can't be exactly represented in *any* floating-point type.
I know of no such systems in real life, but a system where everything
from char to long long and from float to long double is exactly 64
bits is certainly plausible (the Cray T90 I keep bringing up made
char 8 bits only for compatibility with other Unix-like systems; all
other arithmetic types were 64 bits).

An implementation could have integer values that don't just lose precision
but *overflow* when converted to a floating-point type. On my
system, FLT_MAX is slightly less than 2**128, so (float)UINT128_MAX
would overflow if uint128_t existed.

> Now, as it happens, 99% of the numbers I've ever used in a C program are
> in that range.

You counted? :-}

>> The trouble is that, even
>> if you know what you're doing, it can be very easy to accidentally
>> get outside the range in which the guarantees apply; you can use
>> double to represent exact integers, but there's no warning when you
>> exceed the range where that works.

>
> Yes.
>
> For plain float, on the systems I've tried, the boundary seems to be about
> 2^24; 2^24+1 cannot be represented exactly in a 32-bit float. I wouldn't
> be surprised to find that double came out somewhere near 2^48+1 as the first
> positive integer value that couldn't be represented.

It's more likely to be 2^53+1, assuming IEEE floating-point; look at the
values of FLT_MANT_DIG and DBL_MANT_DIG.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"