In article <(E-Mail Removed)>
Fred Ma <(E-Mail Removed)> writes:

>I'm using the expression "int a = ceil( SomeDouble )". The man
>page says that ceil returns the smallest integer that is not less
>than SomeDouble, represented as a double. However, my understanding
>is that a double has nonuniform precision throughout its value range.

This is correct (well, I can imagine a weird implementation that
deliberately makes "double"s have constant precision by often
wasting a lot of space; it seems quite unlikely though).

Note that ceil() returns a double, not an int.

>Will a double always be able to exactly represent any value of
>type int?

This is implementation-dependent. If "double" is not very precise
but INT_MAX is very large, it is possible that not all "int"s can
be represented. This is one reason ceil() returns a double (though
a small one at best -- the main reason is so that ceil(1.6e35) can
still be 1.6e35, for instance).

>Could someone please point me to an explanation of how this is ensured,
>given that the details of a type realization varies with the platform?

I am not sure what you mean by "this", especially with the PS:

>P.S. I am not worried about overflowing the int value
>range, just about the guaranteed precise representation
>of int by double.

... but let me suppose you are thinking of a case that actually occurs
if we substitute "float" for "double" on most of today's implementations.

Here, we get "interesting" effects near 8388608.0 and 16777216.0.
Values from 8388608.0 up to 16777216.0 step by ones: 8388608.0 is
followed immediately by 8388609.0, for instance, and 16777215.0 is
followed immediately by 16777216.0. On the other hand, below
(float)(1<<23) or above (float)(1<<24), we step by 1/2 or 2
respectively. Using nextafterf() (if you have it) and variables set
to the right values, you might printf() some results and find:

nextafterf(8388608.0, -inf) = 8388607.5
nextafterf(16777216.0, +inf) = 16777218.0

So all ceil() has to do with values that are at least 8388608.0
(in magnitude) is return those values -- they are already integers.
It is only values *below* this area that can have fractional
parts.

Of course, when we use actual "double"s on today's real (IEEE style)
implementations, the tricky point is not 2^23 but rather 2^52. The
same principle applies, though: values that meet or exceed some
magic constant (in either positive or negative direction) are always
integral, because they have multiplied away all their fraction bits
by their corresponding power of two. Since 2^23 + 2^22 + ... + 2^0
is a sum of integers, it must itself be an integer. Only if the
final terms of the sum involve negative powers of two can it contain
fractions.

The other "this" you might be wondering about is: how do you
drop off the fractional bits? *That* one depends (for efficiency
reasons) on the CPU. The two easy ways are bit-twiddling, and
addition followed by subtraction. In both cases, we just want to
zero out any mantissa (fraction) bits that represent negative
powers of two. The bit-twiddling method does it in the direct and
obvious way: mask them out. The add-and-subtract method uses the
normalization hardware to knock them out. If normalization is slow
(e.g., done in software or with a microcode loop), the bit-twiddling
method is generally faster.

--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it
http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.