Velocity Reviews > C++ > complexity of trigonometric functions

# complexity of trigonometric functions

Alf P. Steinbach /Usenet

 09-06-2010
* James Kanze, on 06.09.2010 13:33:
> On Sep 3, 5:39 pm, red floyd<(E-Mail Removed)> wrote:
>> On Sep 2, 10:21 pm, red floyd<(E-Mail Removed)> wrote:

>
> [...]
>> According to my 8087 manual (Intel 8086 manual, July 1981,
>> appendix S), the 8087 instruction set did not include FSIN or
>> FCOS.

>
> The 8087 definitely had some support for trig, since I used it
> back then. IIRC, it was in the form of an FSINCOS or FCOSSIN
> function, which calculated both sin and cos (over a very limited
> range).
>
> My 8087 manual was lost in a move, many, many years ago; if
> you've got one and can check it, I'd be curious to know.

I don't have an 8087 manual any longer, but my little Turbo Assembler Quick
Reference Guide says FSINCOS is "387 and i486 only".

However, it seems that the 8087 did know about PI (in that it supported FLDPI),
which is more than even the C++ standard library does today! <g>
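For what it's worth, a minimal C++ sketch of the usual workaround for that missing constant (the name `pi` is mine, not anything standard):

```cpp
#include <cassert>
#include <cmath>

// The standard library ships no pi constant, so one commonly
// derives it from the math library:
const double pi = std::acos(-1.0);
// (Much later, C++20 finally added std::numbers::pi in <numbers>.)
```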

I think back in 1978/79 it was pretty advanced just to have floating-point
support in a chipset for personal computers.

Cheers & hth.,

- Alf

--
blog at <url: http://alfps.wordpress.com>

SG

 09-06-2010
On 2 Sep., 10:22, Vladimir Jovic wrote:
>
> I have a piece of code that I am trying to optimize. It is nothing
> special - just some calculations, including trigonometric functions
> (bunch of multiplications with sin and cos here and there). My code
> duplicates in some places, but that is ok.

In case you want to compute a sequence of sine values a la

f[k] = sin(a+b*k)

you can make use of a simple linear recurrence:

f[0] = sin(a)
f[1] = sin(a+b)
f[k] = 2 cos(b) f[k-1] - f[k-2]

where 2*cos(b) is obviously a constant. And since cos(x)=sin(pi/2+x)
you can use this to compute cosine values as well, of course. Only two
FLOPs per value is a very low number of operations. The only drawback
is rounding errors. Every once in a while one should start anew or
somehow "normalize" the values so that the amplitude doesn't decrease
or increase due to rounding errors.
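A minimal C++ sketch of this recurrence (function and parameter names are mine, and the re-seed interval of 1024 is an arbitrary choice, not anything principled):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Fill a table with f[k] = sin(a + b*k) using the recurrence
//   f[k] = 2*cos(b)*f[k-1] - f[k-2]
// (two FLOPs per sample), re-seeding from std::sin every `reseed`
// samples so rounding errors don't accumulate without bound.
std::vector<double> sine_table(double a, double b, std::size_t n,
                               std::size_t reseed = 1024)
{
    std::vector<double> f(n);
    const double c = 2.0 * std::cos(b);     // the precomputed constant 2*cos(b)
    for (std::size_t k = 0; k < n; ++k) {
        if (k % reseed < 2)
            f[k] = std::sin(a + b * k);     // exact (re-)seed values
        else
            f[k] = c * f[k - 1] - f[k - 2]; // one MUL, one subtraction
    }
    return f;
}
```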

Cheers!
SG

Vladimir Jovic

 09-06-2010
SG wrote:
> On 2 Sep., 10:22, Vladimir Jovic wrote:
>> I have a piece of code that I am trying to optimize. It is nothing
>> special - just some calculations, including trigonometric functions
>> (bunch of multiplications with sin and cos here and there). My code
>> duplicates in some places, but that is ok.

>
> In case you want to compute a sequence of sine values a la
>
> f[k] = sin(a+b*k)
>
> you can make use of a simple linear recurrence:
>
> f[0] = sin(a)
> f[1] = sin(a+b)
> f[k] = 2 cos(b) f[k-1] - f[k-2]
>
> where 2*cos(b) is obviously a constant. And since cos(x)=sin(pi/2+x)
> you can use this to compute cosine values as well, of course. Only two
> FLOPs per value is a very low number of operations. The only drawback
> is rounding errors. Every once in a while one should start anew or
> somehow "normalize" the values so that the amplitude doesn't decrease
> or increase due to rounding errors.

That is exactly what I am doing. Thanks for the tip.
Btw, will 1000 iterations (k=1000) produce a big error?

SG

 09-06-2010
On 6 Sep., 14:33, Vladimir Jovic wrote:
> SG wrote:
>
> > In case you want to compute a sequence of sine values a la
> >
> >   f[k] = sin(a+b*k)
> >
> > you can make use of a simple linear recurrence:
> >
> >   f[0] = sin(a)
> >   f[1] = sin(a+b)
> >   f[k] = 2 cos(b) f[k-1] - f[k-2]
> >
> > where 2*cos(b) is obviously a constant. And since cos(x)=sin(pi/2+x)
> > you can use this to compute cosine values as well, of course. Only two
> > FLOPs per value is a very low number of operations. The only drawback
> > is rounding errors. Every once in a while one should start anew or
> > somehow "normalize" the values so that the amplitude doesn't decrease
> > or increase due to rounding errors.
>
> That is exactly what I am doing. Thanks for the tip.
> Btw, will 1000 iterations (k=1000) produce a big error?

I'm not sure. The rounding error in the precomputed constant for
2*cos(b) obviously affects the frequency to some degree. And a
rounding error 'e' in one of the samples produces an infinite response
(a sinusoid with amplitude e/sin(b)). Assuming the rounding errors are
not correlated and ignoring the frequency error, the expected absolute
error for the k-th sample should be proportional to sqrt(k)/sin(b).

If the frequency dependency (1/sin(b)) is bugging you and you need
both sin AND cos, an alternative which is almost as fast would be to
use the following complex-valued 1st-order linear recurrence:

c[k] = exp(i*(a+b*k))
     = exp(i*b) * c[k-1]

where -1=i*i, real(c[k])=cos(a+b*k) and imag(c[k])=sin(a+b*k). With
this recurrence you need 4 multiplications and two additions for a
pair of sin/cos values. That's twice as many multiplications compared
to above. You still have the sqrt(k)-scaling but the frequency
dependency is gone now (yay!).
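A sketch of this recurrence in C++ with the complex product written out by hand as the 4 MULs and 2 ADDs (all names are mine):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Fill a table of (cos(a+b*k), sin(a+b*k)) pairs via
//   c[k] = exp(i*b) * c[k-1],  c[0] = exp(i*a),
// carrying c[k] as two doubles (cr, ci).
std::vector<std::pair<double, double>>
cos_sin_table(double a, double b, std::size_t n)
{
    std::vector<std::pair<double, double>> out(n);
    const double wr = std::cos(b), wi = std::sin(b); // exp(i*b), precomputed
    double cr = std::cos(a), ci = std::sin(a);       // c[0] = exp(i*a)
    for (std::size_t k = 0; k < n; ++k) {
        out[k] = {cr, ci};
        // complex multiply by exp(i*b): 4 MULs, 2 ADDs
        const double re = cr * wr - ci * wi;
        const double im = cr * wi + ci * wr;
        cr = re;
        ci = im;
        // To fight amplitude drift one could occasionally renormalize,
        // e.g. divide (cr, ci) by std::hypot(cr, ci).
    }
    return out;
}
```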

Since you're concerned about speed, let me share my recent experiences
with GCC's std::complex<double> implementation: Without a compiler
option like -ffast-math operator* maps to a function called __muldc3
(according to the profiler) which is terribly slow compared to 4 MULs
and 2 ADDs for some reason. I also tested operator* for std::complex<>
using a recent Microsoft compiler. According to my measurements
operator* was only half as fast compared to 4 MULs and 2 ADDs on an
old Pentium 4 HT. I guess there's some special case handling involved
with respect to NaN or Inf that is slowing down the multiplication.
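To illustrate the comparison, here is the hand-rolled 4-MUL/2-ADD product one might measure against std::complex's operator* (this only shows the two agree for ordinary finite inputs; the speed difference itself has to be checked with a profiler, as above):

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Textbook complex product: 4 MULs and 2 ADDs, no NaN/Inf special-casing.
// GCC's operator* may instead call a libgcc helper (__muldc3) that adds
// such checks unless something like -ffast-math is in effect.
std::complex<double> mul_manual(std::complex<double> x,
                                std::complex<double> y)
{
    return {x.real() * y.real() - x.imag() * y.imag(),
            x.real() * y.imag() + x.imag() * y.real()};
}
```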

Cheers!
SG

Vladimir Jovic

 09-07-2010
SG wrote:
> On 6 Sep., 14:33, Vladimir Jovic wrote:
>> SG wrote:
>>
>>> In case you want to compute a sequence of sine values a la
>>> f[k] = sin(a+b*k)
>>> you can make use of a simple linear recurrence:
>>> f[0] = sin(a)
>>> f[1] = sin(a+b)
>>> f[k] = 2 cos(b) f[k-1] - f[k-2]
>>> where 2*cos(b) is obviously a constant. And since cos(x)=sin(pi/2+x)
>>> you can use this to compute cosine values as well, of course. Only two
>>> FLOPs per value is a very low number of operations. The only drawback
>>> is rounding errors. Every once in a while one should start anew or
>>> somehow "normalize" the values so that the amplitude doesn't decrease
>>> or increase due to rounding errors.

>> That is exactly what I am doing. Thanks for the tip.
>> btw is 1000 iterations (k=1000) going to produce big error?

>
> I'm not sure. The rounding error in the precomputed constant for
> 2*cos(b) obviously affects the frequency to some degree. And a
> rounding error 'e' in one of the samples produces an infinite response
> (a sinusoid with amplitude e/sin(b)). Assuming the rounding errors are
> not correlated and ignoring the frequency error, the expected absolute
> error for the k-th sample should be proportional to sqrt(k)/sin(b).
>

That means for small angles the error is higher. Then I cannot use it,
since my angles are around zero (going +-15 degrees in small steps).

> If the frequency dependency (1/sin(b)) is bugging you and you need
> both, sin AND cos an alternative which is almost as fast would be to
> use the following complex-valued 1st order linear recurrence:
>
> c[k] = exp(i*(a+b*k))
> = exp(i* b ) * c[k-1]
>
> where -1=i*i, real(c[k])=cos(a+b*k) and imag(c[k])=sin(a+b*k). With
> this recurrence you need 4 multiplications and two additions for a
> pair of sin/cos values. That's twice as many multiplications compared
> to above. You still have the sqrt(k)-scaling but the frequency
> dependency is gone now (yay!).
>

So far, I have implemented this using the straightforward method (using the
standard library's sin/cos), and to be honest I hadn't thought of using
complex numbers.

> Since you're concerned about speed, let me share my recent experiences
> with GCC's std::complex<double> implementation: Without a compiler
> option like -ffast-math operator* maps to a function called __muldc3
> (according to the profiler) which is terribly slow compared to 4 MULs
> and 2 ADDs for some reason. I also tested operator* for std::complex<>
> using a recent Microsoft compiler. According to my measurements
> operator* was only half as fast compared to 4 MULs and 2 ADDs on an
> old Pentium 4 HT. I guess there's some special case handling involved
> with respect to NaN or Inf that is slowing down the multiplication.
>

Thanks, I will play a bit with the profiler, and see which method is the
best (for my purpose).


 09-08-2010
SG <(E-Mail Removed)> writes:
> Without a compiler option like -ffast-math operator* maps to a
> function called __muldc3 (according to the profiler) which is terribly
> slow compared to 4 MULs and 2 ADDs for some reason. I also tested
> operator* for std::complex<> using a recent Microsoft
> compiler. According to my measurements operator* was only half as fast
> compared to 4 MULs and 2 ADDs on an old Pentium 4 HT. I guess there's
> some special case handling involved with respect to NaN or Inf that is
> slowing down the multiplication.

Perhaps that's this (from the gcc manual):

`-fcx-limited-range'
When enabled, this option states that a range reduction step is not
needed when performing complex division. Also, there is no
checking whether the result of a complex multiplication or
division is `NaN + I*NaN', with an attempt to rescue the situation
in that case. The default is `-fno-cx-limited-range', but is
enabled by `-ffast-math'.

This option controls the default setting of the ISO C99
`CX_LIMITED_RANGE' pragma. Nevertheless, the option applies to
all languages.

`-fcx-fortran-rules'
Complex multiplication and division follow Fortran rules. Range
reduction is done as part of complex division, but there is no
checking whether the result of a complex multiplication or
division is `NaN + I*NaN', with an attempt to rescue the situation
in that case.

The default is `-fno-cx-fortran-rules'.

-miles

--
"Whatever you do will be insignificant, but it is very important that
you do it." Mahatma Gandhi