"Malcolm said:
This was a perfectly good and interesting thread on how to implement or
optimise the acos() function, ruined by this sort of C content-free
bickering.
If you want a very fast implementation of the acos() function with good
precision, first have a look at the cephes implementation to get an idea
of how their method works. Download
http://www.netlib.org/cephes/cmath.tgz
The acos function is in asin.c, together with asin.
Now make a few improvements to make it faster:
1. The cephes implementation uses an approximation by a rational
function (one polynomial divided by another polynomial). It is better to
use a single polynomial of higher degree to avoid the division.
High-degree polynomials should _not_ be evaluated with the Horner scheme,
because each step depends on the result of the previous one and the
latency kills you; for example a polynomial of degree 7 might be
evaluated as
((a*x+b)*x^2 + (c*x+d))*x^4 + ((e*x+f)*x^2 + (g*x+h))
where the four inner terms are independent of each other. You will need
to experiment to find which scheme is fastest on your processor.
2. For x > 0.5, the cephes scheme calculates 2*asin(sqrt((1-x)/2)).
Change this as follows: If you substitute the polynomial approximation
for asin, you get an odd polynomial in sqrt((1-x)/2). That is the same
as sqrt((1-x)/2) multiplied by a polynomial in (1-x)/2. Calculate
(1-x)/2, then calculate the square root and the polynomial in parallel
to hide the latency. The code for the square root needs to be inlined;
it can be simplified because you know the range of the argument. And I
hope you know how to calculate a square root without using divisions.
3. Instead of switching between the two methods at x = 0.5, move the
switch point higher. The reason is that the method used for large
values is slower; if you restrict the slower method to a smaller range
of arguments then the average runtime will be better.