Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > double precise enough for unsigned short?

double precise enough for unsigned short?

 
 
Andy
Guest
Posts: n/a
 
      05-31-2010
Hello,

I will be storing a series of unsigned shorts (C language)
as doubles. I used gcc 3.2 to print out a bunch of
sizeof statements, and unsigned shorts are 4 bytes.
Doubles are 8 bytes, with the fractional part being 52 bits.
Everything seems to indicate that double has more than
enough precision to represent all possible values of
unsigned short. I'm coding with this assumption in mind.
Just to be sure, can anyone confirm this? My simplistic
reason for assuming this is that 32 bits of the fractional
part of the double can be made nonfractional by shifting
by 32. That's enough nonfractional bits to represent all
32 bits of an unsigned short. To indicate this 32-bit
shift, the exponent part of the double must be 6 bits. This
easily fits into the 11 bits for the exponent of a double.

Despite this justification, I just want to be sure that I'm
not missing something conceptually about the IEEE
number representation. I recall vaguely that the numbers
represented by double are not uniformly distributed, but
the above reasoning seems to indicate that they are good
enough for unsigned shorts.

Thanks.
 
 
 
 
 
SG
Guest
Posts: n/a
 
      05-31-2010
On 31 May, 23:12, Andy wrote:
>
> I will be storing a series of unsigned shorts (C language)
> as doubles. I used gcc 3.2 to print out a bunch of
> sizeof statements, and unsigned shorts are 4 bytes.


Interesting. What platform is this?

> Doubles are 8 bytes, with the fractional part being 52 bits.
> Everything seems to indicate that double has more than
> enough precision to represent all possible values of
> unsigned short. I'm coding with this assumption in mind.
> Just to be sure, can anyone confirm this?


There is (as far as I know) no upper bound on what values can be
represented with unsigned shorts. Typically shorts are 16 bits wide
but could be wider.

If I remember correctly, the C standard requires the macro DBL_DIG
(see float.h) to evaluate to an int of at least 10. So you have at
least 10 significant digits. For IEEE 754 doubles (52 bit mantissas)
you have even more precision. But 10 digits is enough to store 16 bit
integers losslessly. And it might be just enough to store 32 bit
integers losslessly (not 100% sure about that). But IEEE 754 doubles
certainly will do so.

> Despite this justification, I just want to be sure that I'm
> not missing something conceptually about the IEEE
> number representation.


No, I don't think that you missed something -- except maybe that short
is not restricted in size. In cases where short is 64 bits (or more)
wide, an IEEE-754 double won't be able to losslessly hold all short
values.

Cheers!
SG
 
 
 
 
 
bart.c
Guest
Posts: n/a
 
      05-31-2010

"Andy" <(E-Mail Removed)> wrote in message
news:hu18ns$9ho$(E-Mail Removed)...
> Hello,
>
> I will be storing a series of unsigned shorts (C language)
> as doubles. I used gcc 3.2 to print out a bunch of
> sizeof statements, and unsigned shorts are 4 bytes.
> Doubles are 8 bytes, with the fractional part being 52 bits.
> Everything seems to indicate that double has more than
> enough precision to represent all possible values of
> unsigned short. I'm coding with this assumption in mind.
> Just to be sure, can anyone confirm this? My simplistic
> reason for assuming this is that 32 bits of the fractional
> part of the double can be made nonfractional by shifting
> by 32. That's enough nonfractional bits to represent all
> 32 bits of an unsigned short. To indicate this 32-bit
> shift, the exponent part of the double must be 6 bits. This
> easily fits into the 11 bits for the exponent of a double.
>
> Despite this justification, I just want to be sure that I'm
> not missing something conceptually about the IEEE
> number representation. I recall vaguely that the numbers
> represented by double are not uniformly distributed, but
> the above reasoning seems to indicate that they are good
> enough for unsigned shorts.


You might just try testing every possible value:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    unsigned short i, j;
    double x;

    i = 0;
    do {
        x = i;
        j = x;
        if (i != j)
            printf("Can't represent %u as double\n", (unsigned)i);
        ++i;
    } while (i != 0);
    return 0;
}

(Turn off optimisation)

--
Bartc

 
 
Peter Nilsson
Guest
Posts: n/a
 
      06-01-2010
Andy <(E-Mail Removed)> wrote:
> I will be storing a series of unsigned shorts (C language)
> as [IEEE] doubles.


Why?

--
Peter
 
 
bart.c
Guest
Posts: n/a
 
      06-01-2010

"Richard Heathfield" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> In <hmWMn.34334$8g7.14401@hurricane>, bart.c wrote:


>> You might just try testing every possible value:
>> unsigned short i,j;
>> double x;
>>
>> i=0;
>> do {
>> x=i;
>> j=x;
>> if (i!=j)
>> printf("Can't represent %u as double\n",i);
>> ++i;
>> } while (i!=0);
>> }

>
> If there is genuine cause for concern, unsigned short is sufficiently
> large to make your program rather tedious to run.


16-bit shorts are instant. 32-bit ones took a few minutes on my computer,
but the test is only done once.

With 64-bit shorts, you already know they can't all be represented.

> It would make more
> sense to binary-search. Start at USHRT_MAX, and play the max/min
> game.


You mean, to find the point in the range where it stops working? More
sensible then to just test 0 and USHRT_MAX then.

You need to test every possible value if there is any doubt about being
able to represent each one (for example, if the conversion to and from
double is more elaborate than simple assignment).

>> (Turn off optimisation)

>
> By all means turn off machine optimisation if you think it'll make a
> difference,


It's just to stop the compiler possibly using a register floating point
value (which on my machine has more precision than double).

> but there's no need to turn off design optimisation.


--
Bartc

 
 
SG
Guest
Posts: n/a
 
      06-01-2010
On 1 Jun., 11:33, bart.c wrote:
>
> You need to test every possible value if there are any doubts being able to
> represent each one (for example if the conversion to and from double is more
> elaborate than simple assignment).


I think it's safe to assume that the difference between consecutive
representable numbers is a monotonically increasing function of
magnitude. In that case you only need to check USHRT_MAX and
USHRT_MAX - 1.

But something of the form

DBL_EPSILON * USHRT_MAX < some_threshold

for some value of some_threshold (between 0.4 and 1.0) is probably a
better way to test this.

> >> (Turn off optimisation)

>
> > By all means turn off machine optimisation if you think it'll make a
> > difference,

>
> It's just to stop the compiler possibly using a register floating point
> value (which on my machine has more precision than double).


Aren't there rules that prohibit a compiler from using such
intermediate high-precision representations beyond the evaluation of a
full expression? I'm not sure. But I think I read something like that
somewhere ... If so, it should suffice to store the double value in a
variable in one expression and read/use the variable's value in
another expression a la

double x = ...;
...x... // new expression

(?)


Cheers!
SG
 
 
Eric Sosman
Guest
Posts: n/a
 
      06-01-2010
On 6/1/2010 4:35 AM, Richard Heathfield wrote:
> In<hmWMn.34334$8g7.14401@hurricane>, bart.c wrote:
>
> <snip>
>>
>> You might just try testing every possible value:
>>
>> #include<stdio.h>
>> #include<limits.h>
>>
>> int main(void){
>> unsigned short i,j;
>> double x;
>>
>> i=0;
>> do {
>> x=i;
>> j=x;
>> if (i!=j)
>> printf("Can't represent %u as double\n",i);
>> ++i;
>> } while (i!=0);
>> }

>
> If there is genuine cause for concern, unsigned short is sufficiently
> large to make your program rather tedious to run. It would make more
> sense to binary-search. Start at USHRT_MAX, and play the max/min
> game.


Not sure how you'd apply binary search to this problem, since
there's no obvious "order:" each test tells you that a given value
does or does not garble when converted to double and back, and gives
no obvious clue about what happens to larger and smaller values.
Did you have some other order relation in mind?

Anyhow, I think the only value one need test is USHRT_MAX itself,
unless FLT_RADIX is something really, really weird. For the common
FLT_RADIX values of 2 and 16 (and even for the uncommon value 10),
USHRT_MAX will need at least as many non-zero fraction digits as
any other unsigned short value.

--
Eric Sosman
http://www.velocityreviews.com/forums/(E-Mail Removed)lid
 
 
bart.c
Guest
Posts: n/a
 
      06-01-2010

"Richard Heathfield" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> bart.c wrote:
>>
>> "Richard Heathfield" <(E-Mail Removed)> wrote in message
>> news:(E-Mail Removed)...

>
> <snip>
>
>>> If there is genuine cause for concern, unsigned short is sufficiently
>>> large to make your program rather tedious to run.

>>
>> 16-bit shorts are instant. 32-bit ones took a few minutes on my computer,
>> but the test is only done once.
>>
>> With 64-bits shorts, you already know they can't be represented.

>
> Why? If an implementation has 64-bit short ints, maybe it also has 256-bit
> doubles.


I think a 64-bit short/64-bit double implementation would be far more
common. But given 64-bit short/256-bit double, then probably a different
approach is needed, or just more confidence that all 64-bit values are in
fact representable.

--
Bartc

 
 
Keith Thompson
Guest
Posts: n/a
 
      06-01-2010
Richard Heathfield <(E-Mail Removed)> writes:
> Eric Sosman wrote:
>> On 6/1/2010 4:35 AM, Richard Heathfield wrote:
>>> In<hmWMn.34334$8g7.14401@hurricane>, bart.c wrote:
>>>
>>> <snip>
>>>>
>>>> You might just try testing every possible value:
>>>>

[snip]
>>>
>>> If there is genuine cause for concern, unsigned short is sufficiently
>>> large to make your program rather tedious to run. It would make more
>>> sense to binary-search. Start at USHRT_MAX, and play the max/min
>>> game.

>>
>> Not sure how you'd apply binary search to this problem, since
>> there's no obvious "order:" each test tells you that a given value
>> does or does not garble when converted to double and back, and gives
>> no obvious clue about what happens to larger and smaller values.
>> Did you have some other order relation in mind?

>
> Oh, it was just a faulty (albeit probably fairly accurate!) assumption -
> that double will be able to represent integers precisely, up to a
> certain point, but not (reliably) beyond that point. If there is such a
> point for unsigned short ints, it would be nice to know what it is.


But since the non-representable values aren't consecutive, a simple
binary search might not find the smallest one.

It's probably safe to assume that if N-1 and N are both exactly
representable, then all values from 0 to N are exactly representable.
Given that assumption, you could do a binary search to find the
smallest non-representable value.

> (Having said that, I suspect that the range of contiguous integer values
> starting from 0 and exactly representable by double is likely to exceed
> USHRT_MAX on any "normal" implementation, so there is probably no such
> point.)


Well, I've used a system (Cray T90) where both short and double
were 64 bits. But short may have had padding bits; I never had a
chance to check.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
 
Malcolm McLean
Guest
Posts: n/a
 
      06-01-2010
On Jun 1, 12:12 am, Andy <(E-Mail Removed)> wrote:
>
> I will be storing a series of unsigned shorts (C language)
> as doubles.
>

Double is usually capable of representing every possible short, long
or int. However some very small compilers have very low-precision
floating point representation (this is not compliant), and some very
large systems have shorts with 64 bits and doubles with 64 bits (this
is compliant, but normally you would only use the lower two bytes of a
short).
 