Velocity Reviews > C++ > Two implementations of simple math equation yield different results

# Two implementations of simple math equation yield different results

Avi
Guest
Posts: n/a

 05-13-2008
I need to implement the following calculation:
f = (a*b) + (c*d)
where a,b,c,d are given double values and f is a double variable for
the result

I found out that using two different implementations gives two
different results:
1) In the first implementation I compute the entire equation in memory
2) In the second implementation I store intermediate to variables, and
then load them to compute the final result.
The results differ by a tiny amount (1E-15). I suspect that the reason
for the difference is from round off errors of the compiler.
Nonetheless, I need to know the reason for the difference as I need to
have an absolute reference for optimization work.

The two ways are as follows:

1)
Calculating two double values and then summing them to a single double
value
double f1 = (a*b) + (c*d)

2)
Calculating two double values and storing them to two double variables
Summing the two double variables to a single double value
double e1 = (a*b);
double e2 = (c*d);
double f2 = e1 + e2;

The difference is calculated as follows:
double diff = f2-f1;

I expect the difference to be exactly 0 but instead, I get a tiny
value.
Could someone explain the reason for differences in the result?

I’m compiling with gcc

Thanks
Avi

James Kanze
Guest
Posts: n/a

 05-13-2008
On May 13, 7:16 am, Avi <(E-Mail Removed)> wrote:
> I need to implement the following calculation:
> f = (a*b) + (c*d)
> where a,b,c,d are given double values and f is a double
> variable for the result

> I found out that using two different implementations gives two
> different results:
> 1) In the first implementation I compute the entire equation in memory

You mean in CPU and registers. Variables are in memory.

> 2) In the second implementation I store intermediate to
> variables, and then load them to compute the final result.

> The results differ by a tiny amount (1E-15). I suspect that
> the reason for the difference is from round off errors of the
> compiler.

The C++ standard allows a compiler to use extended precision for
intermediate results, only "forcing" the result to the final
precision when it is assigned to a variable.

> Nonetheless, I need to know the reason for the difference as I
> need to have an absolute reference for optimization work.

> The two ways are as follows:

> 1)
> Calculating two double values and then summing them to a
> single double value
> double f1 = (a*b) + (c*d)

All calculations may be done in extended precision.

> 2)
> Calculating two double values and storing them to two double variables
> Summing the two double variables to a single double value
> double e1 = (a*b);

This forces the result to double.

> double e2 = (c*d);

As does this as well.

> double f2 = e1 + e2;

> The difference is calculated as follows:
> double diff = f2-f1;

> I expect the difference to be exactly 0 but instead, I get a
> tiny value. Could someone explain the reason for differences
> in the result?

It's very implementation dependent.

> I'm compiling with gcc

More to the point, you're compiling on a machine which uses
extended precision in its floating point unit, and its floating
point registers. (An Intel, perhaps.)

The reason for this freedom is speed. Forcing the results of
each operation to be rounded to double can be done, but would
slow things down considerably (at least on Intel). And for the
most part, increased precision is not considered a defect.

--
James Kanze (GABI Software) email:(E-Mail Removed)
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Greg Herlihy
Guest
Posts: n/a

 05-13-2008
On May 13, 2:42*am, James Kanze <(E-Mail Removed)> wrote:
> On May 13, 7:16 am, Avi <(E-Mail Removed)> wrote:
> > double f2 = e1 + e2;
> > The difference is calculated as follows:
> > double diff = f2-f1;
> > I expect the difference to be exactly 0 but instead, I get a
> > tiny value. *Could someone explain the reason for differences
> > in the result?

>
> It's very implementation dependent.
>
> > I'm compiling with gcc

>
> More to the point, you're compiling on a machine which uses
> extended precision in its floating point unit, and its floating
> point registers. *(An Intel, perhaps.)
>
> The reason for this freedom is speed. *Forcing the results of
> each operation to be rounded to double can be done, but would
> slow things down considerably (at least on Intel). *And for the
> most part, increased precision is not considered a defect.

But the extended floating point precision is a defect if - as in this
case - consistent numeric results are required. Fortunately, all Intel
x86 chips since the Pentium 3 support double precision (64-bit)
floating point in their SSE units. So, the solution would be to have
the compiler generate SSE floating point instructions ("-msse2 -
mfpmath=sse") instead of 387 instructions. According to the gcc
documentation, this strategy has other benefits as well:

"The resulting code should be considerably faster in the majority of
cases and avoid the numerical instability problems of 387 code, but
may break some existing code that expects temporaries to be 80-bit."

Greg

Avi
Guest
Posts: n/a

 05-13-2008
One way to identify and ignore small differences is to set a small
threshold, say:
double epsilon = 1E-6
and compare the difference against the threshold as follows:

if (abs(diff) < epsilon)
{
// difference is considered as noise
}
{
// difference is significant
}

I’m looking for a predefined threshold that the differences can be
measured against.
For example, the difference between 1 and the smallest value greater
than 1 that is representable for the double data type is 2.22045e-16.
I get it using: numeric_limits<double>::epsilon()

The differences that I’m seeing between the two implementations (in
the order of 1-E15) are bigger than the
numeric_limits<double>::epsilon() value (2.22045e-16).

I understand that the accumulated round off errors can be bigger than
the numeric_limits<double>::epsilon() value.

Is there a systematic threshold that can be used to define when the
differences are noise and when they are significant?
Or should I just pick a number which is small enough in my mind (say
epsilon = 1 E-6)?

Thanks,
Avi

Avi
Guest
Posts: n/a

 05-13-2008

One way to identify and ignore small differences is to set a small
threshold, say:
double epsilon = 1E-6
and compare the difference against the threshold as follows:

if (abs(diff) < epsilon)
{
// difference is considered as noise
}
{
// difference is significant
}

I’m looking for a predefined threshold that the differences can be
measured against.
For example, the difference between 1 and the smallest value greater
than 1 that is representable for the double data type is 2.22045e-16.
I get it using: numeric_limits<double>::epsilon()

The differences that I’m seeing between the two implementations (in
the order of 1-E15) are bigger than the
numeric_limits<double>::epsilon() value (2.22045e-16).

I understand that the accumulated round off errors can be bigger than
the numeric_limits<double>::epsilon() value.

Is there a systematic threshold that can be used to define when the
differences are noise and when they are significant?
Or should I just pick a number which is small enough in my mind (say
epsilon = 1 E-6)?

Thanks,
Avi

Avi
Guest
Posts: n/a

 05-13-2008

and helped me set a mechanism to distinguish between round off noise
and significant differences.

Avi

James Kanze
Guest
Posts: n/a

 05-14-2008
On May 13, 12:15 pm, Greg Herlihy <(E-Mail Removed)> wrote:
> On May 13, 2:42 am, James Kanze <(E-Mail Removed)> wrote:

> > On May 13, 7:16 am, Avi <(E-Mail Removed)> wrote:
> > > double f2 = e1 + e2;
> > > The difference is calculated as follows:
> > > double diff = f2-f1;
> > > I expect the difference to be exactly 0 but instead, I get a
> > > tiny value. Could someone explain the reason for differences
> > > in the result?

> > It's very implementation dependent.

> > > I'm compiling with gcc

> > More to the point, you're compiling on a machine which uses
> > extended precision in its floating point unit, and its floating
> > point registers. (An Intel, perhaps.)

> > The reason for this freedom is speed. Forcing the results of
> > each operation to be rounded to double can be done, but would
> > slow things down considerably (at least on Intel). And for the
> > most part, increased precision is not considered a defect.

> But the extended floating point precision is a defect if - as
> in this case - consistent numeric results are required.

By defect, I presume you mean with regards to what is needed,
not according to the standard. I'm not that good in numerics to
argue one way or the other. Intuitively, I would agree with
specialists.

--
James Kanze (GABI Software) email:(E-Mail Removed)
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34