Velocity Reviews


mathog 04-17-2013 07:49 PM

Who can explain this bug?
 
Platform: gcc 4.6.3 on AMD Athlon(tm) II X2 245 Processor on
Ubuntu 12.04

The bug I thought was due to a particularly hard to find memory access
problem (see other thread I started today) seems to be even odder than
that. It was tracked down to a particular conditional, which seems to
optimize in such a way that the result is not always correct. (But
possibly only when embedded in the rest of the program.)

The skeletal elements of the problem code are:

double function(double *ymax){

    double yheight;
    double baseline;
    double tmp=0.0;
    double fs;
    double yll, boff; /* these have values */
    /* bbox is a struct that includes yMin and yMax */

    /* set fs */

    /* start of loop */
        /* set yll, boff, bbox */
        yheight = bbox.yMax - bbox.yMin;
        printf("baseline %lf tmp %20.17lf %16llX ymax %20.17lf %16llX \n",
               baseline, tmp, *(uint64_t *)&tmp,
               (ymax ? *ymax : 666), (ymax ? *(uint64_t *)ymax : 666));


        if(ymax){
            tmp = fs * (((double)bbox.yMax)/yheight);
            // the next line would "fix" the code when compiled in:
            // printf("DEBUG ymax %lf tmp %lf\n", *ymax, tmp);fflush(stdout);
            if(*ymax <= tmp){ // <--------THIS IS THE PROBLEM
                *ymax = tmp;
                baseline = yll - boff;
            }
        }
    /* end of loop */
    return(baseline);
}

When this is compiled with -O2 or -O3, AND when it is part of the larger
program, the indicated test fails for the equals case; the output from
the printf statements is:

baseline 0.000000 tmp 0.00000000000000000 0 ymax 0.00000000000000000 0
baseline 12.680004 tmp 6.85406955898171422 401B6A9135E157A9 ymax 6.85406955898171422 401B6A9135E157A9
baseline 12.680004 tmp 6.85406955898171422 401B6A9135E157A9 ymax 6.85406955898171422 401B6A9135E157A9
baseline 12.680004 tmp 6.85406955898171422 401B6A9135E157A9 ymax 6.85406955898171422 401B6A9135E157A9

which is wrong: the baseline value should have been changing, because
yll and boff were different on each loop iteration.

Compile it with -O0 (or with -O2 or -O3 when it is not part of the larger
program, or with the commented-out printf uncommented) and
instead it emits:

baseline 0.000000 tmp 0.00000000000000000 0 ymax 0.00000000000000000 0
baseline 12.680004 tmp 6.85406955898171422 401B6A9135E157A9 ymax 6.85406955898171422 401B6A9135E157A9
baseline 13.480004 tmp 6.85406955898171422 401B6A9135E157A9 ymax 6.85406955898171422 401B6A9135E157A9
baseline 17.480006 tmp 6.85406955898171422 401B6A9135E157A9 ymax 6.85406955898171422 401B6A9135E157A9

which is the correct result.

So, bizarrely, there is a problem with the conditional "<=" not giving
the right answer when the values of *ymax and tmp are equal and the
code is compiled in certain ways. And they are equal, down to the last
bit, as shown by the print statements!

This modification does not resolve the issue (when compiled in such a
way that it would be an issue):
if(*ymax <= (tmp * 1.0000000000000001)){

but this one does:
if(*ymax <= (tmp * 1.00000000001)){

So, what sort of "optimization" is it that gives the wrong result for
the "<=" condition when *ymax and tmp are equal????

Thanks,

David Mathog



mathog 04-18-2013 08:41 PM

Re: Who can explain this bug?
 
Gordon Burditt wrote:
>> So, what sort of "optimization" is it that gives the wrong result for
>> the "<=" condition when *ymax and tmp are equal????

>
> An optimization that keeps excess precision in a calculation.
>
> A typical floating point unit uses registers that are 80 bits wide,
> and only narrows it down to 64 bits when you store in a double.
> The code may not bother storing the result, then re-loading it
> before doing a comparison.


That would do it. In the example tmp is calculated and probably lives
in a register but the ymax value is stored. (If after the calculation
tmp is not stored to ymax, it is never used again, so there is no reason
it would not just be in the register.) So tmp may differ in the extra
16 bits out of the 80 bits that were not stored into ymax.

> See the GCC option -ffloat-store option.


Right. But I want this to work always - code behavior generally should
not depend on obscure compiler flags being set one way or the other.
That flag cannot be the one causing troubles here in any case, because
it is disabled for -O0 and -O3. Actually comparing O0 and O3 it was
not immediately evident which of the many changed flags might have
caused this.

>
> Comparing for floating-point equality is treading on dangerous
> ground. Where calculations are involved, there's almost always an
> excuse for at least one bit of roundoff. Also, very few non-integer
> decimal numbers can be exactly represented in finite-precision
> binary floating point.


The odd thing here is that, effectively, what is happening is:

b = 10000;
a = 1.0/9.0; /* a calculation with no exact representation */
if(b <= a) b= a; /* comparison is true */
a = 1.0/9.0; /* the SAME calculation */
if(b <= a) b= a; /* comparison is false */

Seems like a compiler bug to me that a value _known to have been stored_
in memory at 64 bits would be compared directly to a recent calculation
at 80 bits. Possibly faster to do it that way, but, as here, it can
give the wrong answer.
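
Here is a minimal sketch of that situation (illustrative code only, not
the real program). Built for an x87 target, for example with
gcc -m32 -mfpmath=387 -O2, the comparison below can fail even though both
sides come from the same division; built for SSE math (the x86-64
default) it normally holds:

#include <stdio.h>

static volatile double stored;  /* stands in for *ymax: forced to live in
                                   memory as a genuine 64-bit double */

int main(void)
{
    volatile double num = 1.0, den = 10.0; /* volatile defeats constant folding */

    stored = num / den;          /* rounded to 53 bits when written to memory */
    if (stored <= num / den)     /* right side may be an 80-bit register value */
        printf("<= holds, as expected\n");
    else
        printf("<= fails even though both sides came from the same division\n");
    return 0;
}

The volatile qualifiers are only there to keep the compiler from folding
or caching values that would otherwise hide the effect.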

Thanks,

David Mathog


James Kuyper 04-18-2013 09:31 PM

Re: Who can explain this bug?
 
On 04/18/2013 04:41 PM, mathog wrote:
> Gordon Burditt wrote:
>>> So, what sort of "optimization" is it that gives the wrong result for
>>> the "<=" condition when *ymax and tmp are equal????

>>
>> An optimization that keeps excess precision in a calculation.
>>
>> A typical floating point unit uses registers that are 80 bits wide,
>> and only narrows it down to 64 bits when you store in a double.
>> The code may not bother storing the result, then re-loading it
>> before doing a comparison.

>
> That would do it. In the example tmp is calculated and probably lives
> in a register but the ymax value is stored. (If after the calculation
> tmp is not stored to ymax, it is never used again, so there is no reason
> it would not just be in the register.) So tmp may differ in the extra
> 16 bits out of the 80 bits that were not stored into ymax.
>
>> See the GCC option -ffloat-store option.

>
> Right. But I want this to work always - code behavior generally should
> not depend on obscure compiler flags being set one way or the other.


You're out of luck - the standard doesn't mandate the existence of any
feature that would deal with this. There's no portable way to do this,
just a variety of non-portable ways.

If you can restrict the use of your code to fully conforming
implementations of C99 or later, you can examine the value of
FLT_EVAL_METHOD (defined in <float.h>) to determine whether or not a
given compiler carries excess precision when evaluating expressions of
type float or double. However, if that value is -1, the evaluation
method is indeterminable, and if it's any other negative value, the
meaning is implementation-defined. If FLT_EVAL_METHOD >= 0, then float_t
and double_t (defined in <math.h>) are guaranteed to be wide enough that
intermediate expressions are evaluated in the same type, but the
standard imposes no requirements on float_t or double_t when
FLT_EVAL_METHOD < 0.
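
Here is a small sketch of that machinery (an illustration assuming a C99
compiler, not code from this thread). double_t is at least as wide as the
evaluation format whenever FLT_EVAL_METHOD >= 0, so you can hold an
intermediate without hidden narrowing and then narrow it explicitly:

#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
#if defined(FLT_EVAL_METHOD) && FLT_EVAL_METHOD >= 0
    volatile double num = 1.0, den = 10.0; /* volatile defeats constant folding */
    double_t wide = num / den;    /* deliberately keeps any excess precision */
    double narrow = (double)wide; /* explicit narrowing to 53-bit double */

    printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
    printf("narrowing to double %s the value\n",
           ((double_t)narrow == wide) ? "did not change" : "changed");
#else
    printf("evaluation method is indeterminable or pre-C99 here\n");
#endif
    return 0;
}

With FLT_EVAL_METHOD == 0 the narrowing changes nothing; with
FLT_EVAL_METHOD == 2 (x87-style long double evaluation) it usually does.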

> That flag cannot be the one causing troubles here in any case, because
> it is disabled for -O0 and -O3. Actually comparing O0 and O3 it was
> not immediately evident which of the many changed flags might have
> caused this.


I couldn't find any documentation of a connection between -ffloat-store
and -O0 or -O3. It also seems odd that something would be disabled for
those two levels, but not for -O1 or -O2. Can you explain that comment a
little more?

....
> The odd thing here is that, effectively, what is happening is:
>
> b = 10000;
> a = 1.0/9.0; /* a calculation with no exact representation */
> if(b <= a) b= a; /* comparison is true */
> a = 1.0/9.0; /* the SAME calculation */
> if(b <= a) b= a; /* comparison is false */


It might seem to you that both comparisons are identical, but they can
look very different to an optimizer. One side in one of the comparisons
might be a value that the optimizer has decided to keep in a
high-precision register; the corresponding side in the other comparison
might be a value that it has decided to retrieve from a lower-precision
memory location.

> Seems like a compiler bug to me that a value _known to have been stored_
> in memory at 64 bits would be compared directly to a recent calculation
> at 80 bits. Possibly faster to do it that way, but, as here, it can
> give the wrong answer.


That's why -ffloat-store was invented, so you could express your
preference for "more consistent" vs. "faster and with greater but
inconsistent precision". That's precisely why the project I'm working on
has mandated -ffloat-store when compiling using gcc (SGI's MIPS-PRO
compiler used to be our main compiler, and we used to use several
others, but for several years now we've been using gcc exclusively - it
makes portability testing a bit difficult).


Nobody 04-18-2013 09:58 PM

Re: Who can explain this bug?
 
On Thu, 18 Apr 2013 13:41:14 -0700, mathog wrote:

>> See the GCC option -ffloat-store option.

>
> Right. But I want this to work always - code behavior generally should
> not depend on obscure compiler flags being set one way or the other.


Tough; floating-point calculations are effectively non-deterministic
without certain additional compiler flags.


Tim Rentsch 04-18-2013 10:38 PM

Re: Who can explain this bug?
 
mathog <dmathog@gmail.com> writes:

> Platform: gcc 4.6.3 on AMD Athlon(tm) II X2 245 Processor on
> Ubuntu 12.04
>
> The bug I thought was due to a particularly hard to find memory
> access problem (see other thread I started today) seems to be
> even odder than that. It was tracked down to a particular
> conditional, which seems to optimize in such a way that the
> result is not always correct. (But possibly only when embedded
> in the rest of the program.)
>
> The skeletal elements of the problem code [white space added] are:
>
> double function( double *ymax ){
>     double yheight;
>     double baseline;
>     double tmp = 0.0;
>     double fs;
>     double yll, boff; /* these have values */
>     /* bbox is a struct that includes yMin and yMax */
>
>     /* set fs */
>
>     /* start of loop */
>         /* set yll, boff, bbox */
>         yheight = bbox.yMax - bbox.yMin;
>         printf( "baseline %lf tmp %20.17lf %16llX ymax %20.17lf %16llX \n",
>                 baseline, tmp, *(uint64_t *)&tmp,
>                 (ymax ? *ymax : 666), (ymax ? *(uint64_t *)ymax : 666) );
>
>         if( ymax ){
>             tmp = fs * ( ((double) bbox.yMax) / yheight );
>             // the next line would "fix" the code when compiled in:
>             // printf( "DEBUG ymax %lf tmp %lf\n", *ymax, tmp); fflush(stdout);
>             if( *ymax <= tmp ){ // <--------THIS IS THE PROBLEM
>                 *ymax = tmp;
>                 baseline = yll - boff;
>             }
>         }
>     /* end of loop */
>     return(baseline);
> }
>
> [program behaves differently compiled with no optimization
> and -O2 or -O3]
>
> So, bizarrely, there is a problem with the conditional "<=" not
> giving the right answer when the values of *ymax and tmp are
> equal when the code is compiled in certain ways.
> [snip elaboration]


The problem is gcc, which is non-conforming in how it handles
floating point. Changing the program with

volatile double tmp = 0.0;

or, if you prefer a more delicate touch, with

if( *ymax <= *(volatile double *)&tmp ){ ...

should remove the offending non-conforming behavior.
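
A minimal sketch of what that looks like in context (force_double() is a
hypothetical helper name, not anything from the original code; on a
conforming implementation the assignment alone would already do the
rounding, and the volatile is only there to defeat the non-conforming
behavior):

#include <stdio.h>

/* Hypothetical helper: force a store to, and a reload from, a volatile
   double, so the caller gets a value rounded to a genuine 64-bit double. */
static double force_double(double x)
{
    volatile double v = x;
    return v;
}

static volatile double stored;  /* stands in for *ymax: a value in memory */

int main(void)
{
    volatile double num = 1.0, den = 10.0; /* volatile defeats constant folding */

    stored = num / den;
    /* Without force_double() the right-hand side may carry x87 excess
       precision and the comparison can fail; with it, both sides are
       rounded the same way and the <= holds. */
    if (stored <= force_double(num / den))
        printf("<= holds\n");
    else
        printf("<= fails\n");
    return 0;
}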

Tim Rentsch 04-18-2013 10:52 PM

Re: Who can explain this bug?
 
gordonb.f3kx8@burditt.org (Gordon Burditt) writes:

>> So, what sort of "optimization" is it that gives the wrong result for
>> the "<=" condition when *ymax and tmp are equal????

>
> An optimization that keeps excess precision in a calculation.
>
> A typical floating point unit uses registers that are 80 bits wide,
> and only narrows it down to 64 bits when you store in a double.
> The code may not bother storing the result, then re-loading it
> before doing a comparison. [snip]


Only in non-conforming implementations. What happens in a
conforming implementation must be as if the value were actually
stored and re-fetched, and the semantics of assignment includes
removing any extra range and precision (footnote, 6.3.1.8 p2).
What gcc does is not a legitimate optimization but non-conforming
behavior.

Tim Rentsch 04-18-2013 10:59 PM

Re: Who can explain this bug?
 
James Kuyper <jameskuyper@verizon.net> writes:

> On 04/18/2013 04:41 PM, mathog wrote:
>> Gordon Burditt wrote:
>>>> So, what sort of "optimization" is it that gives the wrong result for
>>>> the "<=" condition when *ymax and tmp are equal????
>>>
>>> An optimization that keeps excess precision in a calculation.
>>>
>>> A typical floating point unit uses registers that are 80 bits wide,
>>> and only narrows it down to 64 bits when you store in a double.
>>> The code may not bother storing the result, then re-loading it
>>> before doing a comparison.

>>
>> That would do it. In the example tmp is calculated and probably lives
>> in a register but the ymax value is stored. (If after the calculation
>> tmp is not stored to ymax, it is never used again, so there is no reason
>> it would not just be in the register.) So tmp may differ in the extra
>> 16 bits out of the 80 bits that were not stored into ymax.
>>
>>> See the GCC option -ffloat-store option.

>>
>> Right. But I want this to work always - code behavior generally should
>> not depend on obscure compiler flags being set one way or the other.

>
> You're out of luck - the standard doesn't mandate the existence of any
> feature that would deal with this. There's no portable way to do this,
> just a variety of non-portable ways. [snip elaboration]


This explanation is all messed up. There is a portable way of
doing what he wants to do, for _conforming_ implementations, and
he is doing it: storing the result of a floating point
calculation into a variable (ie, by assignment) is obliged to
remove any extra range and precision, which would eliminate the
unexpected behavior. That gcc does not behave this way is a bug
in gcc, not behavior that the Standard admits in conforming
implementations.

Tim Rentsch 04-18-2013 11:05 PM

Re: Who can explain this bug?
 
Nobody <nobody@nowhere.com> writes:

> On Thu, 18 Apr 2013 13:41:14 -0700, mathog wrote:
>
>>> See the GCC option -ffloat-store option.

>>
>> Right. But I want this to work always - code behavior generally should
>> not depend on obscure compiler flags being set one way or the other.

>
> Tough; floating-point calculations are effectively non-deterministic
> without certain additional compiler flags.


Floating-point operations can be effectively non-deterministic.
But in C, assigning and accessing floating-point variables is not
as non-deterministic as you seem to think it is. The culprit
here is gcc, which cheerfully generates non-conforming code for
floating point expressions involving assignment and subsequent
access.

glen herrmannsfeldt 04-18-2013 11:22 PM

Re: Who can explain this bug?
 
mathog <dmathog@gmail.com> wrote:
> Gordon Burditt wrote:
>>> So, what sort of "optimization" is it that gives the wrong result for
>>> the "<=" condition when *ymax and tmp are equal????


It is commonly said, for good reasons, not to depend on equality
in floating point comparisons. Not even if you just assigned the
same value to a variable.

>> An optimization that keeps excess precision in a calculation.


>> A typical floating point unit uses registers that are 80 bits wide,
>> and only narrows it down to 64 bits when you store in a double.
>> The code may not bother storing the result, then re-loading it
>> before doing a comparison.


> That would do it. In the example tmp is calculated and probably lives
> in a register but the ymax value is stored. (If after the calculation
> tmp is not stored to ymax, it is never used again, so there is no reason
> it would not just be in the register.) So tmp may differ in the extra
> 16 bits out of the 80 bits that were not stored into ymax.


Actually, I believe it is 11 bits; you also get 4 more exponent bits.
The latter is convenient in that you don't have to worry about overflow
in intermediate calculations when the final result will fit. (And there
is no hidden 1, unlike in the other formats.)

>> See the GCC option -ffloat-store option.


> Right. But I want this to work always - code behavior generally should
> not depend on obscure compiler flags being set one way or the other.
> That flag cannot be the one causing troubles here in any case, because
> it is disabled for -O0 and -O3. Actually comparing O0 and O3 it was
> not immediately evident which of the many changed flags might have
> caused this.


The x87 floating point processor has eight registers, each 80 bits
wide. The original design was for a virtual stack, such that when it
overflowed an interrupt routine would write some registers out to
memory, and on underflow read them back. Programs could be written
without worrying about stack size. After the hardware (8087) was built,
it turned out that it was not possible to write such a routine.
As I remember it, you couldn't restore all the status bits back
to the appropriate values.

There are control bits that affect the precision of some operations,
but not all of them. Some systems (runtime start-up code) set those
bits, others don't.
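
For what it is worth, here is a sketch of setting those bits yourself,
assuming glibc on x86 (the <fpu_control.h> macros are a glibc extension,
not standard C). This only narrows the significand; the exponent range
stays extended, and start-up or library code may put the control word
back:

#include <fpu_control.h>

/* Drop the x87 precision-control field to 53-bit (double) mode. */
static void x87_set_double_precision(void)
{
    fpu_control_t cw;
    _FPU_GETCW(cw);                           /* read current control word */
    cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE; /* 64-bit -> 53-bit significand */
    _FPU_SETCW(cw);                           /* write it back */
}

int main(void)
{
    x87_set_double_precision();
    /* ... significands of most x87 operations now round to 53 bits ... */
    return 0;
}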

>> Comparing for floating-point equality is treading on dangerous
>> ground. Where calculations are involved, there's almost always an
>> excuse for at least one bit of roundoff. Also, very few non-integer
>> decimal numbers can be exactly represented in finite-precision
>> binary floating point.

>
> The odd thing here is that, effectively, what is happening is:


> b = 10000;
> a = 1.0/9.0; /* a calculation with no exact representation */
> if(b <= a) b= a; /* comparison is true */
> a = 1.0/9.0; /* the SAME calculation */
> if(b <= a) b= a; /* comparison is false */


If you really have a compiler where 10000 <= 1.0/9.0 then
it is a bug.

As far as I know, it is usual for -O0 to keep intermediates in
register during evaluation of an expression, but store to the actual
variable in the end. With -O3, most try to keep values in registers
between statements, until they run out of registers.

In any case, the complications of writing a register allocator
for arbitrarily complicated expressions and eight registers means
that often you will be surprised with the results.

> Seems like a compiler bug to me that a value _known to have been stored_
> in memory at 64 bits would be compared directly to a recent calculation
> at 80 bits. Possibly faster to do it that way, but, as here, it can
> give the wrong answer.


If you don't depend on equality, or even on being close, for floating
point, then it isn't a problem. For even more fun, note that there
were (and maybe still are) Cray computers where:

if(a*b!=b*a) printf("Ooops!\n");

might surprise you. That is, the floating point multiplier gave
different answers depending on the order of the operands.

The normal problem with the inconsistent extra precision is that it
causes some calculations designed to avoid problems with precision
loss to fail.

Another common problem is that constant expressions evaluated at
compile time give different results from the same expression (with
variables) evaluated at run time.
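
A tiny illustration of that (a sketch, not from any real program,
assuming an x87-style target without strict excess-precision handling;
with SSE math the two quotients normally compare equal):

#include <stdio.h>

int main(void)
{
    volatile double num = 1.0, den = 10.0; /* forces a run-time division */
    double folded = 1.0 / 10.0;  /* folded by the compiler at double precision */

    /* The run-time quotient may still carry x87 excess precision here. */
    printf("compile-time and run-time quotients %s\n",
           (num / den == folded) ? "compare equal" : "differ");
    return 0;
}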

-- glen

glen herrmannsfeldt 04-18-2013 11:27 PM

Re: Who can explain this bug?
 
Tim Rentsch <txr@alumni.caltech.edu> wrote:
> gordonb.f3kx8@burditt.org (Gordon Burditt) writes:


>>> So, what sort of "optimization" is it that gives the wrong result for
>>> the "<=" condition when *ymax and tmp are equal????


>> An optimization that keeps excess precision in a calculation.


>> A typical floating point unit uses registers that are 80 bits wide,
>> and only narrows it down to 64 bits when you store in a double.
>> The code may not bother storing the result, then re-loading it
>> before doing a comparison. [snip]


> Only in non-conforming implementations. What happens in a
> conforming implementation must be as if the value were actually
> stored and re-fetched, and the semantics of assignment includes
> removing any extra range and precision (footnote, 6.3.1.8 p2).
> What gcc does is not a legitimate optimization but non-conforming
> behavior.


I believe that is correct. Fortran is more lenient in what it allows
for floating point expression evaluation. But storing every intermediate
value can slow things down a lot!

Well, they should go to cache instead of all the way to main memory,
but it is still slow.

Does gcc always store intermediate values with -O0?

-- glen

