Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > Converting Floats to Strings yields erratic results

Reply
Thread Tools

Converting Floats to Strings yields erratic results

 
 
BartC
Guest
Posts: n/a
 
      06-09-2011
"BartC" <(E-Mail Removed)> wrote in message news:isoqbd$i94$(E-Mail Removed)...

> Code to convert binary floating point to decimal is very tricky to write.
> I've just had a go, and while the following sort of works, it's probably
> full of numerical problems too.


My code introduces errors in the lower bits of the number. I'm trying to
compare the results to printf(), but if I write:

printf("%f\n",1.0e40);

I get an impossibly clean result of 1 followed by loads of zeros, no matter
what numbers I stick inside %f. Clearly 1e40 can't be stored exactly (can
it? I thought it would need over 120 bits of precision), so how can I tell
printf to show it as it is?

(If I print 1e50, then I've noticed there is a '1' amongst the zeros; that
can't be right either..)

--
bartc

 
Reply With Quote
 
 
 
 
James Kuyper
Guest
Posts: n/a
 
      06-09-2011
On 06/09/2011 06:03 AM, BartC wrote:
> "BartC" <(E-Mail Removed)> wrote in message news:isoqbd$i94$(E-Mail Removed)...
>
>> Code to convert binary floating point to decimal is very tricky to write.
>> I've just had a go, and while the following sort of works, it's probably
>> full of numerical problems too.

>
> My code introduces errors in the lower bits of the number. I'm trying to
> compare the results to printf(), but if I write:
>
> printf("%f\n",1.0e40);
>
> I get an impossibly clean result of 1 followed by loads of zeros, no matter
> what numbers I stick inside %f. Clearly 1e40 can't be stored exactly (can
> it? I thought it would need over 120 bits of precision), so how can I tell
> printf to show it as it is?
>
> (If I print 1e50, then I've noticed there is a '1' amongst the zeros; that
> can't be right either..)


1e40 is 2^40*5^40. 2^40 is just absorbed into the exponent. 5^40
requires only 93 bits to be represented exactly, not 120. It's therefore
possible to store 1e40 exactly in a 128-bit type. I've heard of
implementations with 128-bit long double, but not 128-bit double.

With respect to %f, the standard says "The value is rounded to
the appropriate number of digits." (7.19.6.1p. I think that this is a
little unclear, but it could be used to justify producing all zeros
after the last significant digit of the output string. That's what I've
usually seen real implementations do.
--
James Kuyper
 
Reply With Quote
 
 
 
 
Noob
Guest
Posts: n/a
 
      06-09-2011
BartC wrote:

> Clearly 1e40 can't be stored exactly (can it?
> I thought it would need over 120 bits of precision)


1e40 = 10^40
= 5^40 * 2^40
= 0x1D6329F1C35CA4BFABB9F561 * 2^40

5^40 requires 93 bits to store precisely.

Thus, a double-precision IEEE 754 floating-point number
cannot store 1e40 exactly, as the mantissa is expressed
over (only) 52 bits.

http://en.wikipedia.org/wiki/Double_...g-point_format

0x1.D6329F1C35CA4 * 2^132 < 1e40 < 0x1.D6329F1C35CA5 * 2^132

i.e. respectively
9999999999999999094860208812374492184576
10000000000000000000000000000000000000000
10000000000000000303786028427003666890752

The exponent (132) is stored as 132+1023 = 1155 = 0x483

And, in fact, the following program prints 483d6329f1c35ca3
(Note the rounding error).

#include <stdio.h>
#include <string.h>
int main(void)
{
int i;
double d = 1.0;
for (i=0; i < 40; ++i) d *= 10.0;
unsigned char buf[8];
memcpy(buf, &d, sizeof buf);
for (i=0; i < 8; ++i) printf("%02x", buf[7-i]);
putchar('\n');
return 0;
}

cf. also
What Every Computer Scientist Should Know About Floating-Point Arithmetic
http://download.oracle.com/docs/cd/E..._goldberg.html

Regards.
 
Reply With Quote
 
BartC
Guest
Posts: n/a
 
      06-09-2011
"James Kuyper" <(E-Mail Removed)> wrote in message
news:isqcvb$9i7$(E-Mail Removed)...
> On 06/09/2011 06:03 AM, BartC wrote:
>> "BartC" <(E-Mail Removed)> wrote in message
>> news:isoqbd$i94$(E-Mail Removed)...


>> My code introduces errors in the lower bits of the number. I'm trying to
>> compare the results to printf(), but if I write:
>>
>> printf("%f\n",1.0e40);
>>
>> I get an impossibly clean result of 1 followed by loads of zeros, no
>> matter
>> what numbers I stick inside %f. Clearly 1e40 can't be stored exactly (can
>> it? I thought it would need over 120 bits of precision), so how can I
>> tell
>> printf to show it as it is?
>>
>> (If I print 1e50, then I've noticed there is a '1' amongst the zeros;
>> that
>> can't be right either..)

>
> 1e40 is 2^40*5^40. 2^40 is just absorbed into the exponent. 5^40
> requires only 93 bits to be represented exactly, not 120. It's therefore
> possible to store 1e40 exactly in a 128-bit type. I've heard of
> implementations with 128-bit long double, but not 128-bit double.
>
> With respect to %f, the standard says "The value is rounded to
> the appropriate number of digits." (7.19.6.1p. I think that this is a
> little unclear, but it could be used to justify producing all zeros
> after the last significant digit of the output string. That's what I've
> usually seen real implementations do.


OK. The problem is that with a double set to 1e40, printf gives:

10000000000000000000000000000000000000000.000000

But using a bigint library to calculate the actual value stored in the bit
pattern, it gives:

10000000000000000303786028427003666890752

(although I might have expected a result just below 1e40), while the simple
code I posted earlier gives this result:

10000000000000004440892098500626161694526

So is not as bad as I thought! However my point was I couldn't figure how to
get printf to give me the true value (in decimal; I can't do anything with
the hex).

--
Bartc

 
Reply With Quote
 
Keith Thompson
Guest
Posts: n/a
 
      06-09-2011
"BartC" <(E-Mail Removed)> writes:
> "BartC" <(E-Mail Removed)> wrote in message news:isoqbd$i94$(E-Mail Removed)...
>> Code to convert binary floating point to decimal is very tricky to write.
>> I've just had a go, and while the following sort of works, it's probably
>> full of numerical problems too.

>
> My code introduces errors in the lower bits of the number. I'm trying to
> compare the results to printf(), but if I write:
>
> printf("%f\n",1.0e40);
>
> I get an impossibly clean result of 1 followed by loads of zeros, no matter
> what numbers I stick inside %f. Clearly 1e40 can't be stored exactly (can
> it? I thought it would need over 120 bits of precision), so how can I tell
> printf to show it as it is?
>
> (If I print 1e50, then I've noticed there is a '1' amongst the zeros; that
> can't be right either..)


What implementation are you using?

This program:

#include <stdio.h>
int main(void)
{
printf("%f\n", 1e40);
printf("%f\n", 1e41);
printf("%f\n", 1e42);
printf("%f\n", 1e43);
printf("%f\n", 1e44);
printf("%f\n", 1e45);
printf("%f\n", 1e46);
printf("%f\n", 1e47);
printf("%f\n", 1e4;
printf("%f\n", 1e49);
printf("%f\n", 1e50);
return 0;
}

gives me this output:

10000000000000000303786028427003666890752.000000
100000000000000000620008645040778319495168.000000
1000000000000000044885712678075916785549312.000000
10000000000000000139372116959414099130712064.00000 0
100000000000000008821361405306422640701865984.0000 00
999999999999999929757289024535551219930759168.0000 00
9999999999999999931398190359470212947659194368.000 000
100000000000000004384584304507619735463404765184.0 00000
1000000000000000043845843045076197354634047651840. 000000
9999999999999999464902769475481793196872414789632. 000000
10000000000000000762976984109188700329496497094656 0.000000

both on Linux/x86 with gcc and glibc, and on Solaris/SPARC.

(The trailing ".000000" makes sense; for numbers that large, the
only representable values are exact integers.)

If your implementation supports it, you can see the exact values
using "%a" or "%A":

0x1.d6329f1c35ca5p+132
0x1.25dfa371a19e7p+136
0x1.6f578c4e0a061p+139
0x1.cb2d6f618c879p+142
0x1.1efc659cf7d4cp+146
0x1.66bb7f0435c9ep+149
0x1.c06a5ec5433c6p+152
0x1.18427b3b4a05cp+156
0x1.5e531a0a1c873p+159
0x1.b5e7e08ca3a8fp+162
0x1.11b0ec57e649ap+166

--
Keith Thompson (The_Other_Keith) http://www.velocityreviews.com/forums/(E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
Reply With Quote
 
BartC
Guest
Posts: n/a
 
      06-09-2011


"Keith Thompson" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> "BartC" <(E-Mail Removed)> writes:


>> printf("%f\n",1.0e40);
>>
>> I get an impossibly clean result of 1 followed by loads of zeros, no
>> matter
>> what numbers I stick inside %f.


> What implementation are you using?
>
> This program:
>
> #include <stdio.h>
> int main(void)
> {
> printf("%f\n", 1e40);
> printf("%f\n", 1e41);
> printf("%f\n", 1e42);
> printf("%f\n", 1e43);
> printf("%f\n", 1e44);
> printf("%f\n", 1e45);
> printf("%f\n", 1e46);
> printf("%f\n", 1e47);
> printf("%f\n", 1e4;
> printf("%f\n", 1e49);
> printf("%f\n", 1e50);
> return 0;
> }
>
> gives me this output:
>
> 10000000000000000303786028427003666890752.000000


(That matches the result in my other post; a good sign..)

> 100000000000000000620008645040778319495168.000000
> 1000000000000000044885712678075916785549312.000000
> 10000000000000000139372116959414099130712064.00000 0
> 100000000000000008821361405306422640701865984.0000 00
> 999999999999999929757289024535551219930759168.0000 00
> 9999999999999999931398190359470212947659194368.000 000
> 100000000000000004384584304507619735463404765184.0 00000
> 1000000000000000043845843045076197354634047651840. 000000
> 9999999999999999464902769475481793196872414789632. 000000
> 10000000000000000762976984109188700329496497094656 0.000000
>
> both on Linux/x86 with gcc and glibc, and on Solaris/SPARC.


....

lcc-win32 and gcc (all on x86-32) both give me:

10000000000000000000000000000000000000000.000000
100000000000000000000000000000000000000000.000000
1000000000000000000000000000000000000000000.000000
10000000000000000000000000000000000000000000.00000 0
100000000000000010000000000000000000000000000.0000 00
999999999999999930000000000000000000000000000.0000 00
.....

Using DMC, I do get the 303 in the middle of 1e40 for example, but the rest
is still zeros:

10000000000000000303000000000000000000000.000000

Odd. Perhaps some compiler switch is needed?

> (The trailing ".000000" makes sense; for numbers that large, the
> only representable values are exact integers.)


I think it's because you start with an integer (of 52/53 bits for example),
then end up multiplying by a power of two, at least for these bigger
numbers.

> If your implementation supports it, you can see the exact values
> using "%a" or "%A":
>
> 0x1.d6329f1c35ca5p+132


Trouble is I don't 'read' hex; that's why I used a separate program to
extract the bits and build the number.

--
Bartc

 
Reply With Quote
 
Keith Thompson
Guest
Posts: n/a
 
      06-09-2011
"BartC" <(E-Mail Removed)> writes:
> "Keith Thompson" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
>> "BartC" <(E-Mail Removed)> writes:

>
>>> printf("%f\n",1.0e40);
>>>
>>> I get an impossibly clean result of 1 followed by loads of zeros, no
>>> matter
>>> what numbers I stick inside %f.

>
>> What implementation are you using?
>>
>> This program:
>>
>> #include <stdio.h>
>> int main(void)
>> {
>> printf("%f\n", 1e40);

[...]
>> return 0;
>> }
>>
>> gives me this output:
>>
>> 10000000000000000303786028427003666890752.000000

>
> (That matches the result in my other post; a good sign..)


Oh? You said you got "an impossibly clean result of 1 followed by loads
of zeros"; that doesn't look "impossibly clean" to me.

> lcc-win32 and gcc (all on x86-32) both give me:
>
> 10000000000000000000000000000000000000000.000000
> 100000000000000000000000000000000000000000.000000
> 1000000000000000000000000000000000000000000.000000
> 10000000000000000000000000000000000000000000.00000 0
> 100000000000000010000000000000000000000000000.0000 00
> 999999999999999930000000000000000000000000000.0000 00
> ....


gcc probably isn't relevant; printf is provided by the runtime library,
not by the compiler. For lcc-win32, apparently both are form the same
source, but gcc will use whatever runtime library is on the system.

gcc with glibc on Linux produces the results I posted. gcc on Cygwin
(which I think uses something called "newlib") produces subtly different
results for some values; the Cygwin results are identical for 1e40,
1e41, and 1e42, but have more trailing zeros before the decimal point
for the other values.

glibc seems to be showing the exact stored value, or something
close to it. The results you're getting seem to be the result of
something doing some rounding somewhere.

> Using DMC, I do get the 303 in the middle of 1e40 for example, but the rest
> is still zeros:
>
> 10000000000000000303000000000000000000000.000000
>
> Odd. Perhaps some compiler switch is needed?


Unless a compiler switch affects the behavior of the runtime library
(which is possible), you're probably stuck with the behavior you're
getting unless you change systems.

>> (The trailing ".000000" makes sense; for numbers that large, the
>> only representable values are exact integers.)

>
> I think it's because you start with an integer (of 52/53 bits for example),
> then end up multiplying by a power of two, at least for these bigger
> numbers.


Right.

>> If your implementation supports it, you can see the exact values
>> using "%a" or "%A":
>>
>> 0x1.d6329f1c35ca5p+132

>
> Trouble is I don't 'read' hex; that's why I used a separate program to
> extract the bits and build the number.


....

Ok, here's some new information. Using Perl's Math::BigInt, I've
converted the results (with the trailing ".000000" ignored) to
hexadecimal. The results I got with glibc are:

0x1d6329f1c35ca500000000000000000000
0x125dfa371a19e7000000000000000000000
0xb7abc627050308000000000000000000000
0x72cb5bd86321e40000000000000000000000
0x47bf19673df53000000000000000000000000
0x2cd76fe086b93c000000000000000000000000
0x1c06a5ec5433c60000000000000000000000000
0x118427b3b4a05c00000000000000000000000000
0xaf298d050e439800000000000000000000000000
0x6d79f82328ea3c000000000000000000000000000
0x446c3b15f992680000000000000000000000000000

showing about 53 significant bits followed by zeros, which makes sense.

The results I got on Cygwin are:

0x1d6329f1c35ca500000000000000000000
0x125dfa371a19e7000000000000000000000
0xb7abc627050308000000000000000000000
0x72cb5bd86321e3fffffffffffffffffffffc
0x47bf19673df53000000000000000000000010
0x2cd76fe086b93c000000000000000000000020
0x1c06a5ec5433c5ffffffffffffffffffffffe90
0x118427b3b4a05c00000000000000000000008800
0xaf298d050e439800000000000000000000055000
0x6d79f82328ea3c0000000000000000000000335c0
0x446c3b15f992680000000000000000000001bb5200

which implies some spurious rounding.

Your output of
100000000000000010000000000000000000000000000.0000 00
is
0x47bf19673df5303cef26e3f42126110000000
which has 118 significant bits.

I'm not suggesting that the implementations you're using are
non-conforming; I don't think printf is required to print more
significant digits than are really there. The standard says "The
value is rounded to the appropriate number of digits."; there may
be some disagreement among implementers about what this means.

My guess is that
100000000000000010000000000000000000000000000.0000 00
results from computing a rounded result, and then picking up an
insignificant digit in the conversion to a string. I'd probably
call that a bug, but not in the sense that it's non-conforming.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
Reply With Quote
 
BartC
Guest
Posts: n/a
 
      06-09-2011
"Keith Thompson" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> "BartC" <(E-Mail Removed)> writes:
>> "Keith Thompson" <(E-Mail Removed)> wrote in message


>>> printf("%f\n", 1e40);


>>> gives me this output:
>>>
>>> 10000000000000000303786028427003666890752.000000

>>
>> (That matches the result in my other post; a good sign..)

>
> Oh? You said you got "an impossibly clean result of 1 followed by loads
> of zeros"; that doesn't look "impossibly clean" to me.


I meant my reply to James; printf() gives me the clean result, but using
special code to calculate the stored value with bigints gave me the proper
result above.

>> lcc-win32 and gcc (all on x86-32) both give me:
>>
>> 10000000000000000000000000000000000000000.000000


> gcc with glibc on Linux produces the results I posted. gcc on Cygwin


> glibc seems to be showing the exact stored value, or something
> close to it. The results you're getting seem to be the result of
> something doing some rounding somewhere.
>
>> Using DMC, I do get the 303 in the middle of 1e40 for example, but the
>> rest
>> is still zeros:
>>
>> 10000000000000000303000000000000000000000.000000
>>
>> Odd. Perhaps some compiler switch is needed?

>
> Unless a compiler switch affects the behavior of the runtime library
> (which is possible), you're probably stuck with the behavior you're
> getting unless you change systems.


It seems even odder now (if that makes sense...); some implementations will
print the exact result, some will do some internal rounding. And if a
certain precision is specified, it will be ignored if it's beyond some
limit. And it's all a matter of luck.

> Ok, here's some new information. Using Perl's Math::BigInt, I've
> converted the results (with the trailing ".000000" ignored) to
> hexadecimal. The results I got with glibc are:
>
> 0x1d6329f1c35ca500000000000000000000


> showing about 53 significant bits followed by zeros, which makes sense.
>
> The results I got on Cygwin are:
>
> 0x1d6329f1c35ca500000000000000000000


OK, the actual stored value.

> 0x72cb5bd86321e3fffffffffffffffffffffc


> which implies some spurious rounding.
>
> Your output of
> 100000000000000010000000000000000000000000000.0000 00
> is
> 0x47bf19673df5303cef26e3f42126110000000
> which has 118 significant bits.
>
> I'm not suggesting that the implementations you're using are
> non-conforming; I don't think printf is required to print more
> significant digits than are really there. The standard says "The
> value is rounded to the appropriate number of digits."; there may
> be some disagreement among implementers about what this means.
>
> My guess is that
> 100000000000000010000000000000000000000000000.0000 00
> results from computing a rounded result, and then picking up an
> insignificant digit in the conversion to a string. I'd probably
> call that a bug, but not in the sense that it's non-conforming.


It doesn't sound right to me. It should be possible to turn off rounding in
order to see exactly what's happening in a calculation,, and one way is by
specifying an extra-large precision. But some libraries apparently don't do
that.

--
Bartc

 
Reply With Quote
 
Noob
Guest
Posts: n/a
 
      06-10-2011
[ Re-posting through E-S, as aioe seems heavily filtered ]
OT: How many see this post, and not the one through aioe?

BartC wrote:

> Clearly 1e40 can't be stored exactly (can it?
> I thought it would need over 120 bits of precision)


1e40 = 10^40
= 5^40 * 2^40
= 0x1D6329F1C35CA4BFABB9F561 * 2^40

5^40 requires 93 bits to store precisely.

Thus, a double-precision IEEE 754 floating-point number
cannot store 1e40 exactly, as the mantissa is expressed
over (only) 52 bits.

http://en.wikipedia.org/wiki/Double_...g-point_format

0x1.D6329F1C35CA4 * 2^132 < 1e40 < 0x1.D6329F1C35CA5 * 2^132

i.e. respectively
9999999999999999094860208812374492184576
10000000000000000000000000000000000000000
10000000000000000303786028427003666890752

The exponent (132) is stored as 132+1023 = 1155 = 0x483

And, in fact, the following program prints 483d6329f1c35ca3
(Note the rounding error).

#include <stdio.h>
#include <string.h>
int main(void)
{
int i;
double d = 1.0;
for (i=0; i < 40; ++i) d *= 10.0;
unsigned char buf[8];
memcpy(buf, &d, sizeof buf);
for (i=0; i < 8; ++i) printf("%02x", buf[7-i]);
putchar('\n');
return 0;
}

cf. also
What Every Computer Scientist Should Know About Floating-Point Arithmetic
http://download.oracle.com/docs/cd/E..._goldberg.html

Regards.
 
Reply With Quote
 
Seebs
Guest
Posts: n/a
 
      06-10-2011
On 2011-06-10, Noob <root@127.0.0.1> wrote:
> [ Re-posting through E-S, as aioe seems heavily filtered ]
> OT: How many see this post, and not the one through aioe?


Me. I filter aioe because such a large number of the recurring trolls of
this group seem to use it for their socks.

> cf. also
> What Every Computer Scientist Should Know About Floating-Point Arithmetic


I think I see that recommended, on average, about twice a week. I think
I read it once, but either a very long time ago or not actually at all.
Gonna go do that now.

-s
--
Copyright 2011, all wrongs reversed. Peter Seebach / (E-Mail Removed)
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
I am not speaking for my employer, although they do rent some of my opinions.
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
RE: hash() yields different results for different platforms Kerry, Richard Python 2 07-13-2006 01:16 PM
hash() yields different results for different platforms Qiangning Hong Python 12 07-12-2006 02:08 PM
Problem with method that starts process, yields pid then yields return code x1 Ruby 11 12-06-2005 01:30 AM
Converting floats to Strings and back dcMan Java 7 11-23-2005 10:58 AM
Floats to chars and chars to floats Kosio C Programming 44 09-23-2005 09:49 AM



Advertisments