complexity of trigonometric functions

A

Alf P. Steinbach /Usenet

* James Kanze, on 06.09.2010 13:33:
[...]
According to my 8087 manual (Intel 8086 manual, July 1981,
appendix S), the 8087 instruction set did not include FSIN or
FCOS.

The 8087 definitely had some support for trig, since I used it
back then. IIRC, it was in the form of an FSINCOS or FCOSSIN
function, which calculated both sin and cos (over a very limited
range).

My 8087 manual was lost in a move, many, many years ago; if
you've got one and can check it, I'd be curious to know.

I don't have an 8087 manual any longer, but my little Turbo Assembler Quick
Reference Guide says FSINCOS is "387 and i486 only".

However, it seems that the 8087 did know about PI (that it supported FLDPI),
which is more than even the C++ standard library does today! <g>

I think back in 1978/79 it was pretty advanced just to have floating point
support on chipset for personal computers.


Cheers & hth.,

- Alf
 
S

SG

I have a piece of code that I am trying to optimize. It is nothing
special - just some calculations, including trigonometric functions
(bunch of multiplications with sin and cos here and there). My code
duplicates in some places, but that is ok.

In case you want to compute a sequence of sine values a la

f[k] = sin(a+b*k)

you can make use of a simple linear recurrence:

f[0] = sin(a)
f[1] = sin(a+b)
f[k] = 2 cos(b) f[k-1] - f[k-2]

where 2*cos(b) is obviously a constant. And since cos(x)=sin(pi/2+x)
you can use this to compute cosine values as well, of course. Only two
FLOPs per value is a very low number of operations. The only drawback
is rounding errors. Every one in a while one should start new or
somehow "normalize" the values so that the amplitude doesn't decrease
or increase due to rounding errors.

Cheers!
SG
 
V

Vladimir Jovic

SG said:
I have a piece of code that I am trying to optimize. It is nothing
special - just some calculations, including trigonometric functions
(bunch of multiplications with sin and cos here and there). My code
duplicates in some places, but that is ok.

In case you want to compute a sequence of sine values a la

f[k] = sin(a+b*k)

you can make use of a simple linear recurrence:

f[0] = sin(a)
f[1] = sin(a+b)
f[k] = 2 cos(b) f[k-1] - f[k-2]

where 2*cos(b) is obviously a constant. And since cos(x)=sin(pi/2+x)
you can use this to compute cosine values as well, of course. Only two
FLOPs per value is a very low number of operations. The only drawback
is rounding errors. Every one in a while one should start new or
somehow "normalize" the values so that the amplitude doesn't decrease
or increase due to rounding errors.

That is exactly what I am doing. Thanks for the tip.
btw is 1000 iterations (k=1000) going to produce big error?
 
S

SG

SG said:
In case you want to compute a sequence of sine values a la
  f[k] = sin(a+b*k)
you can make use of a simple linear recurrence:
  f[0] = sin(a)
  f[1] = sin(a+b)
  f[k] = 2 cos(b) f[k-1] - f[k-2]
where 2*cos(b) is obviously a constant. And since cos(x)=sin(pi/2+x)
you can use this to compute cosine values as well, of course. Only two
FLOPs per value is a very low number of operations. The only drawback
is rounding errors. Every one in a while one should start new or
somehow "normalize" the values so that the amplitude doesn't decrease
or increase due to rounding errors.

That is exactly what I am doing. Thanks for the tip.
btw is 1000 iterations (k=1000) going to produce big error?

I'm not sure. The rounding error in the precomputed constant for
2*cos(b) obviously affects the frequency to some degree. And a
rounding error 'e' in one of the samples produces an infinite response
(a sinusoid with amplitude e/sin(b)). Assuming the rounding errors are
not correlated and ignoring the frequency error, the expected absolute
error for the k-th sample should be proportional to sqrt(k)/sin(b).

If the frequency dependency (1/sin(b)) is bugging you and you need
both, sin AND cos an alternative which is almost as fast would be to
use the following complex-valued 1st order linear recurrence:

c[k] = exp(i*(a+b*k))
= exp(i* b ) * c[k-1]

where -1=i*i, real(c[k])=cos(a+b*k) and imag(c[k])=sin(a+b*k). With
this recurrence you need 4 multiplications and two additions for a
pair of sin/cos values. That's twice as many multiplications compared
to above. You still have the sqrt(k)-scaling but the frequency
dependency is gone now (yay!).

Since you're concerned about speed, let me share my recent experiences
with GCC's std::complex<double> implementation: Without a compiler
option like -ffast-math operator* maps to a function called __muldc3
(according to the profiler) which is terribly slow compared to 4 MULs
and 2 ADDs for some reason. I also tested operator* for std::complex<>
using a recent Microsoft compiler. According to my measurements
operator* was only half as fast compared to 4 MULs and 2 ADDs on an
old Pentium 4 HT. I guess there's some special case handling involved
with respect to NaN or Inf that is slowing down the multiplication.

Cheers!
SG
 
V

Vladimir Jovic

SG said:
SG said:
In case you want to compute a sequence of sine values a la
f[k] = sin(a+b*k)
you can make use of a simple linear recurrence:
f[0] = sin(a)
f[1] = sin(a+b)
f[k] = 2 cos(b) f[k-1] - f[k-2]
where 2*cos(b) is obviously a constant. And since cos(x)=sin(pi/2+x)
you can use this to compute cosine values as well, of course. Only two
FLOPs per value is a very low number of operations. The only drawback
is rounding errors. Every one in a while one should start new or
somehow "normalize" the values so that the amplitude doesn't decrease
or increase due to rounding errors.
That is exactly what I am doing. Thanks for the tip.
btw is 1000 iterations (k=1000) going to produce big error?

I'm not sure. The rounding error in the precomputed constant for
2*cos(b) obviously affects the frequency to some degree. And a
rounding error 'e' in one of the samples produces an infinite response
(a sinusoid with amplitude e/sin(b)). Assuming the rounding errors are
not correlated and ignoring the frequency error, the expected absolute
error for the k-th sample should be proportional to sqrt(k)/sin(b).

That means for low angles, the error is higher. Then I can not use it,
since my angles are around zero (going +-15 degrees in small steps).
If the frequency dependency (1/sin(b)) is bugging you and you need
both, sin AND cos an alternative which is almost as fast would be to
use the following complex-valued 1st order linear recurrence:

c[k] = exp(i*(a+b*k))
= exp(i* b ) * c[k-1]

where -1=i*i, real(c[k])=cos(a+b*k) and imag(c[k])=sin(a+b*k). With
this recurrence you need 4 multiplications and two additions for a
pair of sin/cos values. That's twice as many multiplications compared
to above. You still have the sqrt(k)-scaling but the frequency
dependency is gone now (yay!).

So far, I have implemented this using straight method (using standard
library's sin/cos), and to be honest I haven't thought of using complex
numbers, but I must admit
Since you're concerned about speed, let me share my recent experiences
with GCC's std::complex<double> implementation: Without a compiler
option like -ffast-math operator* maps to a function called __muldc3
(according to the profiler) which is terribly slow compared to 4 MULs
and 2 ADDs for some reason. I also tested operator* for std::complex<>
using a recent Microsoft compiler. According to my measurements
operator* was only half as fast compared to 4 MULs and 2 ADDs on an
old Pentium 4 HT. I guess there's some special case handling involved
with respect to NaN or Inf that is slowing down the multiplication.

Thanks, I will play a bit with the profiler, and see which method is the
best (for my purpose)
 
M

Miles Bader

SG said:
Without a compiler option like -ffast-math operator* maps to a
function called __muldc3 (according to the profiler) which is terribly
slow compared to 4 MULs and 2 ADDs for some reason. I also tested
operator* for std::complex<> using a recent Microsoft
compiler. According to my measurements operator* was only half as fast
compared to 4 MULs and 2 ADDs on an old Pentium 4 HT. I guess there's
some special case handling involved with respect to NaN or Inf that is
slowing down the multiplication.

Perhaps that's this (from the gcc manual):

`-fcx-limited-range'
When enabled, this option states that a range reduction step is not
needed when performing complex division. Also, there is no
checking whether the result of a complex multiplication or
division is `NaN + I*NaN', with an attempt to rescue the situation
in that case. The default is `-fno-cx-limited-range', but is
enabled by `-ffast-math'.

This option controls the default setting of the ISO C99
`CX_LIMITED_RANGE' pragma. Nevertheless, the option applies to
all languages.

`-fcx-fortran-rules'
Complex multiplication and division follow Fortran rules. Range
reduction is done as part of complex division, but there is no
checking whether the result of a complex multiplication or
division is `NaN + I*NaN', with an attempt to rescue the situation
in that case.

The default is `-fno-cx-fortran-rules'.

-miles
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,145
Messages
2,570,826
Members
47,371
Latest member
Brkaa

Latest Threads

Top