std::norm defect?

K

Kai-Uwe Bux

SG said:
That's why I tried to give reasons for that. I'm sorry if it sounded
condescending.


That's not what I tried to communicate. I tried to present an
explanation for it. I guess I failed.

Now, rereading what you wrote, you actually succeeded :)
I merely said that the last part of it should be obvious. The last
part of the reasoning compared
real * real * sqr( imag/real )
to
imag * imag

Oops, I see. Sorry for the noise.

Taking both parts of the argument together, it provides a pretty good a
priori reason to think that norm_c should be better than norm_b. I guess
that is what you wanted to convey and what I failed to realize.


Best

Kai-Uwe Bux
 
Z

Zeppe

highegg wrote [02/03/09 20:53]:
That is exactly what I meant by taking the standard too literally.
std::abs does not, and can not, return the magnitude of x, because in
most cases the magnitude of x is not representable. You should
interpret the standard here as "return an approximation to..." because
nothing else makes sense. With that interpretation, of course, you
lose your reasoning.

yes, I was referring to consistency between functions. But I agree, this
is not mandatory anyway I guess.

I don't think so. The implementation that is turned on by ffast-math
is not violating anything.
But this is what this dispute is all about.

This is the original post in which the gcc implementers discussed the
issue: http://gcc.gnu.org/ml/gcc/2001-10/msg01593.html

You'll find out that the issue that convinced them to leave your
implementation as an option is a loss of precision in certain cases.
You can find numerous applications for sumsq. Nearest neighbour
identification, for instance. Not commonly done in complex spaces,
admittedly.
It's difficult to show real-life examples on demand, but I think most
Octave's user would agree that a factor 8.3 speed-up for a core
reduction function is worth the trouble, even if they don't use it
every other day.

I partly agree, but I value so much "robust" functions that behave
consistently the most of the times. I mean, consistently with the common
sense rather than the standard clauses.
You can only expect that if it is defined that way.

fair enough, but again: if the standard leaves me the freedom to
implement the function in two different ways, I'm not sure I'd go for
the optimised one when another one would give me a more robust result.

Best wishes,

Zeppe
 
H

highegg

highegg wrote [02/03/09 20:53]:
That is exactly what I meant by taking the standard too literally.
std::abs does not, and can not, return the magnitude of x, because in
most cases the magnitude of x is not representable. You should
interpret the standard here as "return an approximation to..." because
nothing else makes sense. With that interpretation, of course, you
lose your reasoning.

yes, I was referring to consistency between functions. But I agree, this
is not mandatory anyway I guess.
I don't think so. The implementation that is turned on by ffast-math
is not violating anything.
But this is what this dispute is all about.

This is the original post in which the gcc implementers discussed the
issue:http://gcc.gnu.org/ml/gcc/2001-10/msg01593.html

You'll find out that the issue that convinced them to leave your
implementation as an option is a loss of precision in certain cases.

I see. Good lord. Thanks a billion for that link. So there *is* a
(minuscule) numerical advantage after all. Thinking about underflow
didn't occur to me. As if I didn't know xLASSQ from LAPACK! What a
blunder.
Still, the current implementation is sub-optimal, because doing the
sqrt is really not necessary. It also makes the result less smooth.
Hmmm. But I suppose I'm free to propose a better implementation (if
only I had the time. sigh.)

I wonder what to do for Octave's sumsq, though. I don't want to lose
the factor 8.3 in performance. Doing a splitting point trick will
solve this easily in the column reduction case, but seems painful for
row reduction. hmmmmmm. It's exactly like the min/max vs. NaNs issue.

anyway, many thanks for the insight.
 
S

SG

This is the original post in which the gcc implementers discussed the
issue:http://gcc.gnu.org/ml/gcc/2001-10/msg01593.html

You'll find out that the issue that convinced them to leave your
implementation as an option is a loss of precision in certain cases.

Interesting. Still, I'd like to point out that the cases they tested
were dangerous corner cases anyways. Everyone should try hard to
avoid de-normalized FP numbers. They optimized these corner cases at
the cost of reduced accuracy and performance for 99,9999999% of all
the other cases. Looks like a bad choice to me.

Cheers!
SG
 
H

highegg

Interesting. Still, I'd like to point out that the cases they tested
were dangerous corner cases anyways.  Everyone should try hard to
avoid de-normalized FP numbers.  They optimized these corner cases at
the cost of reduced accuracy and performance for 99,9999999% of all
the other cases. Looks like a bad choice to me.

Cheers!
SG

SG,
it really depends. You see, in languages like C++ or Fortran,
encapsulated functions for seemingly simple operations typically
prefer numerical robustness, because programmers can (usually)
relatively easily plug in the straightforward formula if they're only
interested in performance, as user-coded functions carry no penalty
compared to built-in functions.

An instance is LAPACK's LASSQ, which uses a sum-and-scale approach to
calculating sum of squares, precisely to avoid underflows, or even
overflows (in some cases it is not even necessary to form the result
explicitly).
An user not interested in these issues can easily code a
straightforward loop (or array reduction in Fortran 90).

However, in interpreted languages like Octave (which I'm a co-
developer of, and which brought me here), the situation is different,
because user-coded functions typically *do* carry a performance
penalty compared to built-in functions, often a significant one (and
Octave still has no JIT :-(). That's why I intend to leave the
behavior of sumsq in the current state, using the straightforward
formula instead of std::norm. Those who prefer accuracy should use
"norm" and square the result.
The funny thing is that I actually faced the same issue before when
reimplementing "norm" for Octave 3.2 (which will avoid any underflows
and overflows etc). I just failed to see it this time.

Still, the std::norm implementation in gcc *can* be improved; the
square root can indeed be avoided, while avoiding the underflows as
well. It's just not that easy.
If I find the time, I'll send them a patch for that part.

many thanks to all who responded
 
S

SG

I see. Good lord. Thanks a billion for that link. So there *is* a
(minuscule) numerical advantage after all. Thinking about underflow
didn't occur to me.

I see that you revoked your GCC bug report. I was just about to
support you on this. :)

The value of this "numerical advantage" is at least questionable.
This implementation only improves the accuracy in a rather small
subset of corner cases that are dangerously close to "underflow world"
anyways. In all other cases the precision is worse.

IMHO, this doesn't justify an implementation of std::norm that is also
slower.

Cheers!
SG
 
S

SG

SG,
it really depends. You see, in languages like C++ or Fortran,
encapsulated functions for seemingly simple operations typically
prefer numerical robustness, [...]

An instance is LAPACK's LASSQ, which uses a sum-and-scale approach to
calculating sum of squares, precisely to avoid underflows, or even
overflows (in some cases it is not even necessary to form the result
explicitly).

This is all fine and dandy. I would agree with you that it's worth
the hassle to prevent underflows/overflows for things like std::abs.
A std::abs function with overflow/underflow prevention effectivly
doubles the cases it works on with reasonable accuracy. Instead of
only supporting exponents in the range of -511 ... 511 (rough
estimate) it'll work also in cases where numbers with exponents in the
range -1022 ... 1022 are involved (assuming IEEE 754 doubles). That's
a big difference that is worth mentioning.

But I don't see where a complicated std::norm like that helps
anybody. If your program works with numbers/vectors where the squared
norm is almost de-normalized you have a problem either way. A
complicated version of std::norm isn't going to help here much except
that it extends "its domain of not-sucking" by only a teensy bit.
That's not even worth mentioning, IMHO.

simple std::norm version:
-------------------------
PRO: better accuracy in most cases
PRO: fast
CON: bad accuracy on a tiny fraction of cases
that are dangerously close to values where
even the complicated version won't work

complicated std::norm version:
------------------------------
PRO: Underflow prevention for an insignificant
fraction of cases (not really a PRO, is it?)
CON: slow
CON: worse overall accuracy compared to simple
version for almost every case.


Am I making sense?


Cheers!
SG
 
M

Martin Eisenberg

SG said:
On 3 Mrz., 11:28, Zeppe


Interesting. Still, I'd like to point out that the cases they
tested were dangerous corner cases anyways.

In approximating functions, when one makes the right thing happen at
the esoteric extremes it's not for their own sake but because it
typically constrains deviations at not quite so far-out but still
large exponents.
Everyone should try hard to avoid de-normalized FP numbers. They
optimized these corner cases at the cost of reduced accuracy and
performance for 99,9999999% of all the other cases.

It's true that formulations with a lower incidence of denormalized
intermediates are preferable, but denormals exist for when that won't
suffice. Do you feel std library providers are entitled to ignore
them?

A way to implement norm() with good accuracy everywhere and no
complicated operations would be to apply a resumming dot-product
algorithm, such as presented in the paper at
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.1547 .


Martin
 
H

highegg

SG,
it really depends. You see, in languages like C++ or Fortran,
encapsulated functions for seemingly simple operations typically
prefer numerical robustness, [...]
An instance is LAPACK's LASSQ, which uses a sum-and-scale approach to
calculating sum of squares, precisely to avoid underflows, or even
overflows (in some cases it is not even necessary to form the result
explicitly).

This is all fine and dandy.  I would agree with you that it's worth
the hassle to prevent underflows/overflows for things like std::abs.
A std::abs function with overflow/underflow prevention effectivly
doubles the cases it works on with reasonable accuracy.  Instead of
only supporting exponents in the range of -511 ... 511 (rough
estimate) it'll work also in cases where numbers with exponents in the
range -1022 ... 1022 are involved (assuming IEEE 754 doubles).  That's
a big difference that is worth mentioning.

But I don't see where a complicated std::norm like that helps
anybody.  If your program works with numbers/vectors where the squared
norm is almost de-normalized you have a problem either way.  A
complicated version of std::norm isn't going to help here much except
that it extends "its domain of not-sucking" by only a teensy bit.
That's not even worth mentioning, IMHO.

simple std::norm version:
-------------------------
PRO: better accuracy in most cases
PRO: fast
CON: bad accuracy on a tiny fraction of cases
     that are dangerously close to values where
     even the complicated version won't work

complicated std::norm version:
------------------------------
PRO: Underflow prevention for an insignificant
     fraction of cases (not really a PRO, is it?)
CON: slow
CON: worse overall accuracy compared to simple
     version for almost every case.

Am I making sense?

Just an update: I've sent the GCC developers the following patch to
the "complex" header:

Index: libstdc++-v3/include/std/complex
===================================================================
--- libstdc++-v3/include/std/complex (revision 144596)
+++ libstdc++-v3/include/std/complex (working copy)
@@ -649,8 +649,13 @@ _GLIBCXX_BEGIN_NAMESPACE(std)
template<typename _Tp>
static inline _Tp _S_do_it(const complex<_Tp>& __z)
{
- _Tp __res = std::abs(__z);
- return __res * __res;
+ const _Tp __x = abs(__z.real());
+ const _Tp __y = abs(__z.imag());
+ const _Tp __s = std::max(__x, __y);
+ const _Tp __r = std::min(__x, __y);
+ if (__s == _Tp())
+ return __s;
+ return __s * (__s + __r * (__r / __s));
}
};

This seems to win over the present implementation in terms of both
performance and accuracy. It's still not as fast as the
straightforward formula, but I'm now convinced that it *is* a good
idea to avoid the underflow issues.

best regards
 
S

SG

Just an update: I've sent the GCC developers the following patch to
the "complex" header:

Index: libstdc++-v3/include/std/complex
===================================================================
--- libstdc++-v3/include/std/complex    (revision 144596)
+++ libstdc++-v3/include/std/complex    (working copy)
@@ -649,8 +649,13 @@ _GLIBCXX_BEGIN_NAMESPACE(std)
       template<typename _Tp>
         static inline _Tp _S_do_it(const complex<_Tp>& __z)
         {
-          _Tp __res = std::abs(__z);
-          return __res * __res;
+          const _Tp __x = abs(__z.real());
+          const _Tp __y = abs(__z.imag());
+          const _Tp __s = std::max(__x, __y);
+          const _Tp __r = std::min(__x, __y);
+          if (__s == _Tp())
+            return __s;
+          return __s * (__s + __r * (__r / __s));
         }
     };

This is me just nit-picking: You used qualified names for max and min.
In generic user code this is discouraged because it prevents ADL. But
ADL is all you have left if you want min/max calls to resolve to your
own function templates:

namespace mymath
{
template<typename Integer> class rational;

template<typename Integer>
rational<Integer> min(rational<Integer> const& a,
rational<Integer> const& b);
}

This is the only possibility because you can't partially specialize
std::min and std::max. Also, overloading them in the namespace "std"
is illegal as far as the C++ standard is concerned (it usually works,
though).

So, in generic user code (some other namespace than std) you would
write:

template<typename T>
T do_it(const complex<_Tp>& __z)
{
using std::min;
using std::max;
...
const T s = max(x,y); // ADL still works
}

If I remember correctly this is covered in "Effective C++" by Scott
Meyers.


Cheers!
SG
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,161
Messages
2,570,892
Members
47,431
Latest member
ElyseG3173

Latest Threads

Top