Vladimir Jovic said:
Hello,
I have a piece of code that I am trying to optimize. It is nothing
special - just some calculations, including trigonometric functions (a bunch
of multiplications with sin and cos here and there). There is some duplication
in my code, but that is OK.
The questions are:
How complex are the sin and cos functions? Are they simple lookup tables, or
are they some super-complex calculations?
If translated to multiplication/addition, how many
multiplications/additions would one sin or cos involve?
I am well aware this is implementation dependent, but it is probably done
in a very similar fashion everywhere.
On x86, and probably other major architectures, these functions are
essentially built right into the processor, and are typically fairly fast.
Most of the 'slowness' one may experience with them (in some cases) is
actually due to things the compiler is doing, rather than the operations
themselves. For example, MSVC defaults to fairly naive/inefficient
implementations of the math functions, and its "optimization" is simply to
use less naive implementations (typically excluding some special-case
handling, usually for NaN, Inf, denormals, ...).
(One can also get a major speedup by writing one's own ASM functions which
simply send the values directly to the processor and return the result,
though possibly slightly less "safe" at edge cases.)
But, at least in a general-purpose floating-point sense, it is unlikely one
will be able to write anything much faster than the built-in functions.
If using fixed-point integers, though, and willing to settle for slightly
reduced accuracy, it is possible to replace them with a lookup table, which is
typically a little faster - but this is more because conversions between
floats and integers tend to be expensive.
This usually involves either using a modified angle scheme (256 degrees in a
circle, ...), or doing a fixed-point multiply to get the angle into the
correct base, and masking it off to get the array index.
Say, a 20.12 fixed-point scheme in radians (val):
i=(val*10430)>>8; //scale to a 256 degree system (0.8 fixed-multiply,
result still 20.12; 10430 ~= 65536/(2*pi))
i=(i>>8)&4095; //convert into a 12 bit table index
ret=foo_sintab[i];
However, one usually avoids fixed point unless it is really needed, since on
most modern computers it is generally slower (as well as less accurate) than
using floats, except in a few edge cases, such as processing integer input
data into integer output data, as would take place in a video or audio codec,
where calculations such as the DCT or MDCT are often done with fixed point
and lookup tables.
The main reason for its use in codecs is that otherwise a lot of time would
be spent on conversions between floating-point and integer values.
or such...