Efficiency of math.h

Dave Rudolf

Hi all,

Normally, I would trust that the ANSI libraries are written to be as
efficient as possible, but I have an application in which the majority of
the run time is calling the acos(...) method. So now I start to wonder how
that method, and others in the math.h library, are implemented.


Dave
 
Joona I Palaste

Dave Rudolf said:
Normally, I would trust that the ANSI libraries are written to be as
efficient as possible, but I have an application in which the majority of
the run time is calling the acos(...) method. So now I start to wonder how
that method, and others in the math.h library, are implemented.

This depends completely on the C implementation used and is not standardised.
As long as they fulfill the specification in the ISO C standard, they might
be implemented by carrier pigeons, for all C cares.
 
Mike Wahler

Dave Rudolf said:
Hi all,

Normally, I would trust that the ANSI libraries are written to be as
efficient as possible,

I would *hope* that, not necessarily "trust".
but I have an application in which the majority of
the run time is calling the acos(...) method.

I seriously doubt that the C runtime calls 'acos()' at all.
Perhaps the application itself does.
So now I start to wonder how
that method, and others in the math.h library, are implemented.

However a given implementor does it. The language standard
only specifies *behavior*, not implementation or performance.
Such performance also necessarily depends upon the underlying
hardware.

-Mike
 
Jens.Toerring

Dave Rudolf said:
Normally, I would trust that the ANSI libraries are written to be as
efficient as possible, but I have an application in which the majority of
the run time is calling the acos(...) method. So now I start to wonder how
that method, and others in the math.h library, are implemented.

There's nothing in the ANSI standard that requires a certain
implementation, so acos() could be slow as molasses without
any violation of the standard. From this also follows that
your question can't be answered in this group, since it's
not a question about standard C but about the quality of the
implementation you're using. So your best bet is asking in
a group devoted to your compiler/linker/standard library
combination, or, if it's a commercial product, the company
that wrote them.

<OT>
I would guess that on a machine with a floating point unit
it will be used by the acos() library function, so in this
case you probably won't get much faster with some fancy
optimizations.
</OT>
Regards, Jens
 
E. Robert Tisdale

Dave said:
Normally, I would trust that
the ANSI libraries are written to be as efficient as possible

The ANSI/ISO C standards do *not* specify performance.
The "quality" of the implementation is determined by "market forces".
It is up to you to evaluate the performance of your implementation
and shop around for a better one if you find it inadequate.
but I have an application in which the majority of the run time
is calling the acos(...) method. So now I start to wonder how
that method, and others in the math.h library, are implemented.

The arc cosine function is expensive to compute.
It's hard to beat commercial C math library developers
but you should shop around.
 
Guillaume

Normally, I would trust that the ANSI libraries are written to be as
efficient as possible [...]

Impossible to tell. Especially not for time-critical duties. You're not
even defining what you call "efficient".
I seriously doubt that the C runtime calls 'acos()' at all.
Perhaps the application itself does.

That's probably what he meant.

FYI, in C we don't speak of "methods" but rather of "functions".

Nobody but the library's author can answer that question.
Optimizing is a whole job in itself, and it starts with profiling. It
seems like you profiled some and found the culprit. Now it's up to
you to improve your code: since using acos() from the standard library
on your particular platform is too slow for your needs, you'll have
to either 1) not use it at all (rewriting your numerical algorithm may
help) or 2) if you absolutely can't do without it, find a speedier
implementation of acos for your platform (it may involve using some
assembly, or whatever else that you may see fit).

As a general (but not absolute) rule, the C standard library is
implemented more with "exactness" in mind than "speed". For the
maths functions, that means they are generally implemented so as to
give the best precision possible, possibly sacrificing speed.

You'll have to explore other paths: writing your own maths functions,
or finding a third-party library that fits your needs.
 
Peter Ammon

Dave said:
Hi all,

Normally, I would trust that the ANSI libraries are written to be as
efficient as possible,

As possible given what? Performance is important to most compiler
implementors, but it's often third down on the list, after 1)
correctness and 2) portability.

And not just correctness according to the ANSI C standard, but
correctness according to the relevant IEEE floating point standards,
with sane handling of NaN and plus or minus infinity. And also
correctness for very large numbers, very small numbers, etc.

Those checks may not be necessary in your program, but you're still
paying for them.
but I have an application in which the majority of
the run time is calling the acos(...) method. So now I start to wonder how
that method, and others in the math.h library, are implemented.

Why not get libm from GNU and find out? I wouldn't be surprised if you
could beat acos() with "good enough" precision over a particular range
(whatever you're interested in). Try using a Taylor expansion.
 
Eric Sosman

Dave said:
Hi all,

Normally, I would trust that the ANSI libraries are written to be as
efficient as possible, but I have an application in which the majority of
the run time is calling the acos(...) method. So now I start to wonder how
that method, and others in the math.h library, are implemented.

<topicality level="marginal">

Others have pointed out that there are no guarantees of
efficiency in the Standard, and that efficiencies will vary
from one implementation to the next. However (and although
there's no guarantee of this, either), it seems likely that
implementors will consider accuracy and "good behavior" to
be more important than efficiency. After all, if efficiency
were paramount, one could get away with

double acos(double x) { return 42.0; }

There's actually a little bit of sense behind this joke
implementation: It should prompt you to ask how much accuracy
you actually need in your application. For example, if you
only need a few decimal places' worth, you might gain speed
by pre-computing a table of accurate acos() values and then
doing crude interpolations in the table to get the quick-and-
dirty values you need in such great quantities.

... which also raises the question: Why does the application
need such a huge number of acos() values?

</topicality>
 
CBFalconer

Peter said:
.... snip ...

Why not get libm from GNU and find out? I wouldn't be surprised
if you could beat acos() with "good enough" precision over a
particular range (whatever you're interested in). Try using a
Taylor expansion.

Not if you want speed you don't. With some slight trigonometry
you should be able to convert cos(angle) into tan(angle). Use
this as input to an arctan(value) routine, which can in turn be
built in various ways. One uses a levelled Tschebychev
approximation over 0 to 1.
 
E. Robert Tisdale

CBFalconer said:
Peter Ammon wrote:

... snip ...


Not if you want speed you don't.
With some slight trigonometry,
you should be able to convert cos(angle) into tan(angle).
Use this as input to an arctan(value) routine,
which can in turn be built in various ways.
One uses a leveled Tschebychev approximation over 0 to 1.

Not if you want speed you don't.

This method requires reading a bunch of high precision
floating-point numbers from system memory
which can actually require more time than computing
Taylor coefficients in registers on-the-fly.

You're out of your depth here Chuck.
 
Christian Bau

Dave Rudolf said:
Hi all,

Normally, I would trust that the ANSI libraries are written to be as
efficient as possible, but I have an application in which the majority of
the run time is calling the acos(...) method. So now I start to wonder how
that method, and others in the math.h library, are implemented.

Since I have actually not used the acos () function _once_ in over
twenty years of C programming, including tons of numerical mathematics,
3D graphics etc., I am just wondering what you are doing.

You might consider posting what you are actually trying to do, and it is
very likely that there will be ways to reduce the number of calls to
acos () considerably.
 
E. Robert Tisdale

Christian said:
Since I have actually not used the acos() function _once_
in over twenty years of C programming,

So... you're a newbie eh?
including tons of numerical mathematics, 3D graphics, etc.

It appears in signal processing applications --
a Dolph-Chebychev window function for example.
 
Christian Bau

E. Robert Tisdale said:
Not if you want speed you don't.

This method requires reading a bunch of high precision
floating-point numbers from system memory
which can actually require more time than computing
Taylor coefficients in registers on-the-fly.

You're out of your depth here Chuck.

Tisdale, you are clueless. As usual.

The fastest approach would be to split the argument range into two
parts, abs (x) <= c and abs (x) >= c for some suitably chosen c,
probably somewhere around 0.7 or 0.8. For abs (x) <= c, approximate acos
x by a polynomial of the form (pi - ax - bx^3 - cx^5...). For x > c,
define f (z) = acos (1 - z^2). Approximate f (z) by a polynomial of the
form ax + bx^3 + cx^5 + dx^7. Calculate acos (x) = f (sqrt (1-x)). For x
< -c use acos (x) = pi () - acos (-x) = pi () - f (sqrt (1+x)). Take a
higher polynomial depending on how much precision you want. Finding the
coefficients is left as an exercise to the reader :)

The substitution z = sqrt (1-x) nicely catches the behavior of acos x
for abs (x) close to 1, where the first derivative is unbounded and
avoids excessive rounding errors.
 
Nick Landsberg

E. Robert Tisdale said:
So... you're a newbie eh?



It appears in signal processing applications --
a Dolph-Chebychev window function for example.

I think the point of Christian's posting was that
he would like more information from the OP why the
acos() function was called so often. If it is something
that requires high precision and MUST be called often
(as in a signal processing application), then it is
one thing.

If it does not require high precision or is used
gratuitously, there might be better ways
to accomplish the same thing. As one poster pointed
out, a rough extrapolation between values in a static
array may satisfy the requirement. Without knowing
the degree of precision needed, we just don't know.
 
Christian Bau

Nick Landsberg said:
I think the point of Christian's posting was that
he would like more information from the OP why the
acos() function was called so often. If it is something
that requires high precision and MUST be called often
(as in a signal processing application), then it is
one thing.

If it does not require high precision or is used
gratuitously, there might be better ways
to accomplish the same thing. As one poster pointed
out, a rough extrapolation between values in a static
array may satisfy the requirement. Without knowing
the degree of precision needed, we just don't know.

Actually, Tisdale just had to prove once again his stupidity: If you
read an article about the Dolph-Chebychev window function, the first
thing that you see is that Chebychev functions are defined as cos (n *
acos (x)) - and they are easily calculated using a trivial recursion
formula that doesn't need anything more complicated than adding and
multiplying.

Only an idiot would calculate them using the acos () function.
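The recursion in question is presumably the standard three-term recurrence
T_0(x) = 1, T_1(x) = x, T_{n+1}(x) = 2x*T_n(x) - T_{n-1}(x), which yields
cos(n * acos(x)) with nothing but multiplies and adds:

```c
#include <math.h>

/* Chebyshev polynomial T_n(x) = cos(n * acos(x)), evaluated by the
 * three-term recurrence -- no call to acos() anywhere. */
static double chebyshev_T(int n, double x)
{
    double prev = 1.0, cur = x, next;
    int k;
    if (n == 0)
        return 1.0;
    for (k = 1; k < n; k++) {
        next = 2.0 * x * cur - prev;
        prev = cur;
        cur = next;
    }
    return cur;
}
```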
 
E. Robert Tisdale

Christian said:
The fastest approach would be to split the argument range into two
parts, abs(x) <= c and abs(x) >= c for some suitably chosen c,
probably somewhere around 0.7 or 0.8. For abs (x) <= c, approximate acos
x by a polynomial of the form (pi - ax - bx^3 - cx^5...). For x > c,
define f(z) = acos(1 - z^2). Approximate f(z) by a polynomial of the
form ax + bx^3 + cx^5 + dx^7. Calculate acos(x) = f(sqrt(1 - x)).
For x < -c use acos(x) = pi() - acos(-x) = pi() - f(sqrt(1 + x)).
Take a higher polynomial depending on how much precision you want.
Finding the coefficients is left as an exercise to the reader :)

The substitution z = sqrt(1-x) nicely catches the behavior of acos x
for abs(x) close to 1, where the first derivative is unbounded
and avoids excessive rounding errors.

Have you ever implemented and tested this "algorithm"?
If so, please publish it here so that we can "benchmark" it
against our C library implementations of acos. I'll bet you $1.00 that
I can find one that beats your implementation.
 
E. Robert Tisdale

Christian said:
If you read an article about the Dolph-Chebychev window function,
the first thing that you see is that Chebychev functions
are defined as cos (n*acos(x)) - and they are easily calculated
using a trivial recursion formula that doesn't need anything
more complicated than adding and multiplying.

Please show us the formula.
 
E. Robert Tisdale

Nick said:
I think the point of Christian's posting was that
he would like more information from the [Dave Rudolf]
[about] why the acos() function was called so often.
If it is something that requires high precision
and MUST be called often
(as in a signal processing application),
then it is one thing.

If it does not require high precision
or [high precision] is used gratuitously,
there might be better ways to accomplish the same thing.
As one poster pointed out, a rough [interpolation]
between values in a static array may satisfy the requirement.
Without knowing the degree of precision needed,
we just don't know.

Unfortunately,
I don't know what degree of precision is required either.
We can only hope that Dave Rudolf is still following this thread
and will reply with the answer.

But I'll tell you what you can do without knowing any more
about Dave Rudolf's precision requirements.
You can implement your proposed interpolation scheme
and benchmark it against the acos provided with your C compiler.
I predict that you will find that your interpolation scheme
is *slower* even if you need just a few decimal digits precision.
Please post your implementation so that we can repeat
your benchmark experiments ourselves.

There is a reason why math functions like acos
are included in the standard library.
There is *no* single portable implementation
that will deliver performance and accuracy everywhere.
Compiler developers hire experts to customize implementations
for each target platform and, usually, they are hard to beat.
 
Nick Landsberg

E. Robert Tisdale said:
Please show us the formula.

Now cut that out! I'm a relative newbie to c.l.c in this
incarnation but I used to dwell here 15 years ago or so.

Neither of you is making a good name for themselves
by airing your differences in a public forum, and,
you are both making a case FOR the folks who want
c.l.c "expanded" to take into account other stuff
than just pure "C".

This is an algorithm issue. Move it and your
petty squabbles to the algorithms newsgroup, please.

--Grumpy--

P.S. - the above formula, expanded, becomes
cos(n) * cos(acos(x)) ---
unless I've forgotten my trig after 45 years.
Since cos(acos(X)) degenerates to X, it should
be replaceable with ...
cos(n) * x...

where have I made a mistake? (honestly asking,
not baiting.)
 
Dik T. Winter

> CBFalconer wrote: ....
>
> Not if you want speed you don't.
>
> This method requires reading a bunch of high precision
> floating-point numbers from system memory
> which can actually require more time than computing
> Taylor coefficients in registers on-the-fly.

Yup, you are clueless.
1. Calculating Taylor coefficients on the fly requires division, which
is on most processors much slower than multiplication.
2. The number of coefficients required in a Tschebychev approximation is
less than the number of Taylor coefficients needed. For instance,
to get a good sine in double precision on [0, pi/2] you can use only
6 terms of the Tschebychev expansion of (sin(x)/x - 1). (This
form is chosen to get good relative precision near x = 0.)
 
