sqrtl(), etc. for Cygwin

  • Thread starter James Dow Allen
  • Start date
G

glen herrmannsfeldt

(snip, I wrote)
Mathematically, ((r + x/r)/2) and (x/r + (r - x/r)/2) are equivalent. Is
there a numerical reason to prefer one over the other? (Obviously, if
there are no such concerns, one might as well use the simpler one.)

In base 2 floating point, either one works. The final divide just
changes the exponent. In any other base, though, the simpler one can
lose precision.

In two digit decimal, (32+34)/2 is 33, but (64+68)/2 is 65.

64+68 is 13e1 and 13e1/2 is 65.

It is used for IBM base 16 floating point and, will be needed
for the now part of the IEEE standard, decimal float.

-- glen
 
P

Phil Carmody

Philip Lantz said:
Mathematically, ((r + x/r)/2) and (x/r + (r - x/r)/2) are equivalent. Is
there a numerical reason to prefer one over the other? (Obviously, if
there are no such concerns, one might as well use the simpler one.)

Given that r and x/r are very close (on the assumption that there's a good
initial estimate, which holds in this case), the (r - x/r) expression is a
case of catastrophic loss of precision, which is normally something you
want to avoid like the plague. Gut feel would make me prefer the original.

Phil
 
G

glen herrmannsfeldt

Phil Carmody said:

(snip, then I wrote)
Given that r and x/r are very close (on the assumption that there's a good
initial estimate, which holds in this case), the (r - x/r) expression is a
case of catastrophic loss of precision, which is normally something you
want to avoid like the plague. Gut feel would make me prefer the original.

That is certainly true in a large number of algorithms, but you should
look more carefully at this one.

Yes the difference can have a large error relative to its (possibly
small) magnitude, but the absolute error is the same as the two
operands if the exponents are the same.

Note that the more cancellation of low digits, means that you are
closer to the correct square root, and so is better, not worse!

Unless the sum is kept with extra precision, in non-binary (base 2)
floating point, a low digit can be lost in the sum that can't be
restored in the divide by two.

It is the lower (relative) precision of the difference that allows for
no loss on the divide by two.

-- glen
 
P

Phil Carmody

glen herrmannsfeldt said:
(snip, then I wrote)



That is certainly true in a large number of algorithms, but you should
look more carefully at this one.

Yes the difference can have a large error relative to its (possibly
small) magnitude, but the absolute error is the same as the two
operands if the exponents are the same.

Note that the more cancellation of low digits, means that you are
closer to the correct square root, and so is better, not worse!

Agreed. Hence my "normally".
Unless the sum is kept with extra precision, in non-binary (base 2)
floating point, a low digit can be lost in the sum that can't be
restored in the divide by two.

Anyone not using binary FP deserves all ills that come to them!
It is the lower (relative) precision of the difference that allows for
no loss on the divide by two.

The thing I like about the sum is its symmetry.
With the half-the-difference method, there are two ways of approaching the
middle.
(r + (x/r-r))/2
(x/r + (r-x/r))/2
I don't see why one should chose one rather the other. And in particular
why you should have chosen the one with the longer expression.

Phil
 
G

glen herrmannsfeldt

(snip, I wrote)
(snip)
(snip)
Anyone not using binary FP deserves all ills that come to them!
The thing I like about the sum is its symmetry.
With the half-the-difference method, there are two ways of approaching the
middle.
(r + (x/r-r))/2
(x/r + (r-x/r))/2
I don't see why one should chose one rather the other. And in particular
why you should have chosen the one with the longer expression.

Once the values are in registers, it doesn't make so much difference,
but yes, I probably would do the former, to avoid the possibility of
doing the divide twice.

I first learned about this from the OS/360 routines written in S/360
Assembler. IBM S/360 uses base 16 floating point. All the techniques
to avoid precision loss were well studied at the time, but maybe
now forgotten. Now, IEEE has standardized decimal float, so all
these will be brought back out again.

-- glen
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,077
Messages
2,570,567
Members
47,203
Latest member
EmmaSwank1

Latest Threads

Top