std::sort causes segfault when sorting class arrays


James Kanze

No, the outcome is implementation defined. Arguably, your
implementation has the _better_ result.

The outcome is definitely not implementation defined. The
accuracy (and repeatability?) of expression evaluation is
unspecified.
Yup, and that's ok with the standard by [5/10].

Within an expression. As far as I know, the value with extended
precision *must* be rounded to double when it is used to
initialize a double, e.g. the return value.

In g++, this is under control of the option -ffloat-store.
Regretfully, the default results in the non-conformant and
unexpected behavior, and the documentation of the option
suggests that it usually doesn't matter, whereas it often does.
Actually, _that_ is not the reason I heard. I thought it was that
equality is very likely not what you want anyway, because of what
the floating point numbers in your program represent.

Actually, I've yet to hear anyone who really knows floating
point make such a recommendation. (It certainly doesn't apply
here, because there is no comparison for equality in the
original code.) What I have heard is that to use machine
floating point, you need to understand it first. And that it
can have some rather unexpected properties; the fact that you
can't actually know the precision of intermediate values---that
a double isn't necessarily a double---is perhaps the most
surprising one (and is, of course, not inherent in floating
point, but rather a particularity of one particular
implementation).
However, it is true that the excess precision in registers can
screw up comparisons. But it can be prevented by forcing a
write-reread cycle.

Not with g++, unless you specify -ffloat-store.
The program
#include <iostream>
#include <iomanip>
#include <cmath>

using namespace std;

bool equal ( double const & lhs, double const & rhs ) {
    return ( lhs == rhs );
}

int main ( void ) {
    double x = 1;
    double y = 1;
    cout << boolalpha
         << equal( sin(x)+cos(y), sin(x)+cos(y) ) << '\n';
}
prints true. I am not certain whether the standard requires
that or not, but since the OP also uses gcc on an Intel
platform, the example pertains to the behavior he observes.

Note, however, that the above doesn't force a write-reread cycle
any more than the original, at least according to the standard.
(None of the writes or reads are "observable behavior".) It
would be interesting to look at the generated code under
optimization; I wouldn't be surprised if g++ drops both writes,
and that it works because neither value has been truncated to
double, not because both have been.
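For what it's worth, here is a minimal sketch of forcing the
write-reread at the hardware level (a common workaround, not taken
from the posts above, and the helper name is made up): a volatile
double cannot legally be kept in a register across the access, so the
value has to be stored to memory as a 64 bit double and reloaded.

#include <cmath>
#include <iostream>

// Round to a 64 bit double by forcing a store/reload through a
// volatile object.
double force_double( double x ) {
    volatile double tmp = x;
    return tmp;
}

int main() {
    double x = 1;
    double y = 1;
    std::cout << std::boolalpha
              << ( force_double( std::sin(x) + std::cos(y) )
                   == force_double( std::sin(x) + std::cos(y) ) )
              << '\n';
}

Whether the standard requires the rounding even without the volatile
is exactly the open question above; in practice, the volatile access
is what keeps the optimizer from dropping both writes.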
 

Kai-Uwe Bux

James said:
Thomas J. Gritzan wrote: [snip]
The x86 floating point co-processor operates on 80 bit
floating point values. Temporaries stored in memory are in
64 bit double precision format. So by storing the first
result into a temporary, it was truncated and rounded to 64
bit, and by loading it back, it was extended to 80 bit
again. The second result still is 80 bit when the two are
compared.
Yup, and that's ok with the standard by [5/10].

Within an expression. As far as I know, the value with extended
precision *must* be rounded to double when it is used to
initialize a double, e.g. the return value.

Which clause says that? As said elsethread, I am worried that RVO might
_allow_ the compiler to cut corners in some unexpected cases.


[snip]
Not with g++, unless you specify -ffloat-store.

I don't understand: are you implying that a write-reread cycle would not be
sufficient to truncate the values _or_ that with g++ one needs the
option -ffloat-store to force a write-reread cycle? You make it sound like
you contradict what I am saying whereas, in fact, you might not.
Note, however, that the above doesn't force a write-reread cycle
any more than the original, at least according to the standard.

The original has

sin(x)+cos(y) == sin(x)+cos(y)

where the amended version has

equal( sin(x)+cos(y), sin(x)+cos(y) )

The difference is that equal() is a function call that requires
initialization of the parameters.
(None of the writes or reads are "observable behavior".)

I don't think that that matters, and it is not what I meant to imply by
using the term write-reread cycle. What _does_ matter is that an object is
initialized under certain circumstances and that that initialization
requires truncation.
It
would be interesting to look at the generated code under
optimization; I wouldn't be surprised if g++ drops both writes,
and that it works because neither value has been truncated to
double, not because both have been.

I had a look (before I posted), but unfortunately I do not understand
assembler all that well :-( So, it didn't really help.


Best

Kai-Uwe Bux
 

tharinda.gl

Hi,
I'm trying to write some image processing software, and as part of that I
need to sort an array of pixel values by their luminance...

I'm using std::sort to do this, which works OK.. as long as no two pixels
in the array have the same luminance value. In that case.... it
segfaults. Here's my code:

////// CODE BEGINS

// gcc -o test test.cpp
#include <algorithm>

using namespace std;

class Pixel {
        public:
                double r, g, b;

                Pixel(double _r=0, double _g=0, double _b=0) {
                        r = _r;
                        g = _g;
                        b = _b;
                }

                // Return the CIE luminance of this pixel
                double get_luminance() const {
                        return (r * 255.0 * 0.212671) + (g * 255.0 * 0.715160)
                                + (b * 255.0 * 0.072169);
                }

                bool operator<(const Pixel &rhs) const {
                        return (get_luminance() < rhs.get_luminance());
                }

};

int main(int argc, char **argv)
{
        for (int gx=0; gx<1000; gx++) {
                Pixel pixels[15 * 15];
                unsigned int pixel_count = 0;

                for (int ym=0; ym<15; ym++) {
                        for (int xm=0; xm<15; xm++) {
#ifndef MAKE_PIXELS_THE_SAME
                                pixels[pixel_count].r = (xm*15)+ym;
                                pixels[pixel_count].g = (xm*15)+ym;
                                pixels[pixel_count].b = (xm*15)+ym;
#else
                                pixels[pixel_count].r = 0.12;
                                pixels[pixel_count].g = 0.93;
                                pixels[pixel_count].b = 0.32;
#endif
                                pixel_count++;
                        }
                }

                sort(pixels, pixels+pixel_count);
        }

        return 0;

}

////// CODE ENDS

The segfault always happens on the call to sort().

I know I'm probably missing something blindingly obvious here, but I've
spent an hour hacking away at this and still can't see what I'm doing
wrong...

Can anyone help me figure out why my code isn't working?
(this is only the second time I've used sort(), and the first time I've
used it on an array, but my C++ book says it is possible to do the
latter... is my book wrong?)

Thanks,
Phil.

Well, this segfault always happens to me (in g++) when I incorrectly
implement the comparison method (i.e. forgetting the strict weak
ordering requirement and just returning the result of the equality
(==) operator).

Why don't you compare the difference with a predefined zero value when
dealing with floating point values?

e.g.:

#define EPSILON 0.00001

bool operator()(const double left, const double right) const
{
    return left < right - EPSILON;
}
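One caution about the sketch above (this point is not raised in the
thread): a tolerance based less-than is, in general, still not a
strict weak ordering, because "equal within epsilon" is not
transitive. A minimal illustration, with made-up values:

#include <cmath>
#include <iostream>

int main() {
    double const EPSILON = 0.00001;
    double a = 0.0, b = 0.000008, c = 0.000016;
    bool ab = std::fabs( a - b ) < EPSILON;   // true: a and b are "equal"
    bool bc = std::fabs( b - c ) < EPSILON;   // true: b and c are "equal"
    bool ac = std::fabs( a - c ) < EPSILON;   // false: but a and c are not
    std::cout << std::boolalpha << ab << ' ' << bc << ' ' << ac << '\n';
}

std::sort is allowed to assume a strict weak ordering, so a comparator
built on a tolerance can still misbehave, just less often.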
 

Juha Nieminen

James said:
Quick sort will.

I don't understand why.

The partitioning process first selects one of the elements as the
pivot value, and then goes from the beginning towards the end and from
the end towards the beginning, until these two index values are the
same. Along the way it may swap pairs of values depending on how they
compare to the pivot value. However, I don't see how this can make the
index values go out of range. They are only compared to each other, not
to the array elements or the pivot element.
 

James Kanze

I don't understand why.
The partitioning process first selects one of the elements as
the pivot value, and then goes from the beginning towards the
end and from the end towards the beginning, until these two
index values are the same. Along the way it may swap pairs of
values depending on how they compare to the pivot value.
However, I don't see how this can make the index values go out
of range. They are only compared to each other, not to the
array elements or the pivot element.

The basic partitioning loop contains two smaller loops, one to
advance the bottom, and the other to descend the top. These
loops don't (necessarily) contain any check on the indexes,
since you've an established pre-condition that the pivot value
is in the sequence, and you'll stop when you hit it, if not
before.

The code for the partitioning in g++ is:

template<typename _RandomAccessIterator, typename _Tp, typename
_Compare>
_RandomAccessIterator
__unguarded_partition(_RandomAccessIterator __first,
_RandomAccessIterator __last,
_Tp __pivot, _Compare __comp)
{
while (true)
{
while (__comp(*__first, __pivot))
++__first;
--__last;
while (__comp(__pivot, *__last))
--__last;
if (!(__first < __last))
return __first;
std::iter_swap(__first, __last);
++__first;
}
}

Note the two inner loops. If __comp always returns true (or
even if it returns true from time to time when it shouldn't),
you'll get an array bounds violation.

If you're still not convinced, try:

#include <algorithm>
#include <vector>

struct BadCompare
{
    bool operator()( int, int ) const
    {
        return true ;
    }
} ;

int
main()
{
    std::vector< int > v( 10000, 43 ) ;
    std::sort( v.begin(), v.end(), BadCompare() ) ;
    return 0 ;
}

compiled with bounds checking turned on (-D_GLIBCXX_DEBUG with
g++). (Interestingly enough, compiled without bounds checking,
and with optimization, it seems to run forever.)
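For example (a sketch; the file name is made up):

g++ -D_GLIBCXX_DEBUG -o bad_compare bad_compare.cpp
./bad_compare

With the libstdc++ debug mode enabled, the run should abort with a
diagnostic instead of silently running past the end of the vector.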
 

James Kanze

James said:
Thomas J. Gritzan wrote: [snip]
The x86 floating point co-processor operates on 80 bit
floating point values. Temporaries stored in memory are
in 64 bit double precision format. So by storing the
first result into a temporary, it was truncated and
rounded to 64 bit, and by loading it back, it was
extended to 80 bit again. The second result still is 80
bit when the two are compared.
Yup, and that's ok with the standard by [5/10].
Within an expression. As far as I know, the value with
extended precision *must* be rounded to double when it is
used to initialize a double, e.g. the return value.
Which clause says that? As said elsethread, I am worried that
RVO might _allow_ the compiler to cut corners in some
unexpected cases.

RVO doesn't change floating point semantics. A double is a
double. All RVO does is allow two objects with basically
non-overlapping lifetimes to be merged, but the double object is
still there. And the question isn't which clause says that the
compiler can't do this; the question is which clause allows the
compiler to use additional bits in an actual object. A return
value is an object, with a specific type.

(Footnote 55 makes it clear that an explicit type conversion or
an assignment must revert to standard precision. Are you saying
that an initialization needn't, although an assignment must?)
[snip]
Not with g++, unless you specify -ffloat-store.
I don't understand: are you implying that a write-reread cycle
would not be sufficient to truncate the values _or_ that with
g++ one needs the option -ffloat-store to force a write-reread
cycle?

The latter, but at the hardware level. In C++, returning a
double is a write (the return value is an object with a specific
type which must be initialized), and using it is a read. G++
does not respect this unless you specify -ffloat-store.

And of course, in practice, I think that the Intel FPU has an
instruction to round to double precision, without actually
doing the write-reread. Which is all that is really required.
You make it sound like you contradict what I am saying
whereas, in fact, you might not.

The question is what you mean by write-reread: the C++
operations of initializing or modifying an object and then
accessing that object, or the hardware operations.
The original, has
sin(x)+cos(y) == sin(x)+cos(y)
where the amended version has
equal( sin(x)+cos(y), sin(x)+cos(y) )
The difference is that equal() is a function call that
requires initialization of the parameters.

Yes. And the original compared the results of the built-in
operator+. My mistake---there is a definite difference.
I don't think that that matters, and it is not what I meant to
imply by using the term write-reread cycle. What _does_ matter
is that an object is initialized under certain circumstances
and that that initialization requires truncation.

I'm beginning to get confused about what I wrote myself.
Basically, what I was trying to say is that g++ doesn't always
respect the write-rereads required of the abstract machine by
the standard. Whether you use pass by value or pass by
reference, for example, shouldn't change anything in your
example, since in both cases, the compiler must initialize an
object. In the original, though, there weren't any intermediate
objects declared, so that the extended precision could be
maintained.
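To make the distinction concrete, a sketch (whether g++ actually
performs the rounding for a and b below without -ffloat-store is
precisely what is in dispute here):

#include <cmath>
#include <iostream>

int main() {
    double x = 1;
    double y = 1;

    // No intermediate double objects: extended precision may be kept
    // on both sides of the comparison.
    bool direct = ( std::sin(x) + std::cos(y) == std::sin(x) + std::cos(y) );

    // a and b are double objects; the standard arguably requires their
    // values to be rounded to double when they are initialized.
    double a = std::sin(x) + std::cos(y);
    double b = std::sin(x) + std::cos(y);
    bool via_objects = ( a == b );

    std::cout << std::boolalpha << direct << ' ' << via_objects << '\n';
}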
 

Kai-Uwe Bux

James said:
James said:
Thomas J. Gritzan wrote: [snip]
The x86 floating point co-processor operates on 80 bit
floating point values. Temporaries stored in memory are
in 64 bit double precision format. So by storing the
first result into a temporary, it was truncated and
rounded to 64 bit, and by loading it back, it was
extended to 80 bit again. The second result still is 80
bit when the two are compared.
Yup, and that's ok with the standard by [5/10].
Within an expression. As far as I know, the value with
extended precision *must* be rounded to double when it is
used to initialize a double, e.g. the return value.
Which clause says that? As said elsethread, I am worried that
RVO might _allow_ the compiler to cut corners in some
unexpected cases.

RVO doesn't change floating point semantics. A double is a
double. All RVO does is allow two objects with basically
non-overlapping lifetimes to be merged, but the double object is
still there. And the question isn't which clause says that the
compiler can't do this; the question is which clause allows the
compiler to use additional bits in an actual object. A return
value is an object, with a specific type.

(Footnote 55 makes it clear that an explicit type conversion or
an assignment must revert to standard precision. Are you saying
that an initialization needn't, although an assignment must?)
[snip]

Yes :) I guess that's what I'm saying.

The problem is whether a return statement always implies object
initialization. If I look at a composite expression such as

sin(x) + cos(y)

then sin() and cos() are function calls. I am not certain that the standard
requires truncation of the returned values, because in the expression in
which they occur, those return values are not treated as objects but just as
values. The standard allows RVO, and I understand that as a way to get rid
of temporaries. If the object is gone, there is no need for it to be
initialized. I also think that RVO may change the semantics of a program.
I.e., eliding the copy constructor is allowed to change the semantics, so
why couldn't that mean that truncation is elided as well? In fact, I would
think that [5/10] loses much of its appeal should it _not_ apply in a case
like the above. However, in

equal( sin(x), cos(y) )

the return values are used to initialize parameters, and in this case, I can
see that the standard would require the abstract machine to perform an
object initialization.

None of this is firm belief in any way. I find the standard very confusing
with regard to floating point arithmetic.


Best

Kai-Uwe Bux
 

Philip Pemberton

Out of curiosity, I suggested elsethread the following implementation
for Pixel::operator<:

bool operator<(const Pixel &rhs) const {
    return std::less< double >()( get_luminance(), rhs.get_luminance() );
}

Could you try that?

That fixed it -- the segfaults have stopped, and the reference image
actually looks reasonably sane now (which is hardly surprising). I can't
say I've managed to figure out exactly why the implementation of
std::sort in libstdc++ kept dying like that, but at least it works now...

Looks like it was a rounding issue of some description, though -- and
there is a mention about it being the "most frequently reported non-bug"
on the gcc website... nice how I found that /now/...

What I'm not 100% sure on is why feeding this through a comparison
functor stops the segfaults. Is it just down to the fact that gcc is
forced to typecast both sides of the comparison into doubles (applying
rounding in the process) to pass to less_than::operator(), or am I
missing something?

(As you can probably tell, I'm quite new to STL -- I did a few modules on
object oriented programming in C++ in college and university, but none of
them covered much of STL beyond containers and algorithms...)

Thanks,
Phil.
 

Juha Nieminen

James said:
Note the two inner loops. If __comp always returns true (or
even if it returns true from time to time when it shouldn't),
you'll get an array bounds violation.

I suppose you are right.

Given that if the comparator does not work as required the compiler is
free to do whatever it pleases, skipping the extra conditionals in the
name of efficiency is a valid strategy.
 

Kai-Uwe Bux

Philip said:
That fixed it -- the segfaults have stopped, and the reference image
actually looks reasonably sane now (which is hardly surprising). I can't
say I've managed to figure out exactly why the implementation of
std::sort in libstdc++ kept dying like that, but at least it works now...

Looks like it was a rounding issue of some description, though -- and
there is a mention about it being the "most frequently reported non-bug"
on the gcc website... nice how I found that /now/...

What I'm not 100% sure on is why feeding this through a comparison
functor stops the segfaults. Is it just down to the fact that gcc is
forced to typecast both sides of the comparison into doubles (applying
rounding in the process) to pass to less_than::operator(), or am I
missing something?

Elsethread, there is an ongoing, extensive discussion of what might cause
the segfault. One hypothesis was a comparison predicate that yields
inconsistent answers due to internal excess precision. As you point out,
the idea behind using std::less instead of operator< is to force
truncation. What is still under discussion is whether the original code is
required to work by the standard.
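For completeness, a sketch of a different way to sidestep the question
entirely (not from the thread, and whether it fits the rest of Phil's
code is an assumption): compute the luminance once, store it in a
double member, and compare the stored values. Each comparison then
reads two 64 bit objects that never change between calls, so the
ordering is necessarily consistent.

// Sketch only: lum must be kept in sync, so main() would have to
// construct the pixels with their final values, e.g.
//     pixels[pixel_count] = Pixel( v, v, v );
// instead of assigning r, g and b afterwards.
class Pixel {
public:
        double r, g, b;

        Pixel( double _r = 0, double _g = 0, double _b = 0 )
                : r( _r ), g( _g ), b( _b ),
                  lum( (_r * 255.0 * 0.212671) + (_g * 255.0 * 0.715160)
                       + (_b * 255.0 * 0.072169) )
        {
        }

        // Compares two doubles that already live in memory.
        bool operator<( const Pixel &rhs ) const {
                return lum < rhs.lum;
        }

private:
        double lum;   // luminance, computed once at construction
};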

(As you can probably tell, I'm quite new to STL -- I did a few modules on
object oriented programming in C++ in college and university, but none of
them covered much of STL beyond containers and algorithms...)

Well, the issue is not so much with the STL. The issue is with floating
point arithmetic. That is just hard.


Best

Kai-Uwe Bux
 

James Kanze

James said:
James Kanze wrote:
Thomas J. Gritzan wrote:
[snip]
The x86 floating point co-processor operates on 80 bit
floating point values. Temporaries stored in memory are
in 64 bit double precision format. So by storing the
first result into a temporary, it was truncated and
rounded to 64 bit, and by loading it back, it was
extended to 80 bit again. The second result still is 80
bit when the two are compared.
Yup, and that's ok with the standard by [5/10].
Within an expression. As far as I know, the value with
extended precision *must* be rounded to double when it is
used to initialize a double, e.g. the return value.
Which clause says that? As said elsethread, I am worried that
RVO might _allow_ the compiler to cut corners in some
unexpected cases.
RVO doesn't change floating point semantics. A double is a
double. All RVO does is allow two objects with basically
non-overlapping lifetimes to be merged, but the double
object is still there. And the question isn't which clause
says that the compiler can't do this; the question is which
clause allows the compiler to use additional bits in an
actual object. A return value is an object, with a specific
type.
(Footnote 55 makes it clear that an explicit type conversion
or an assignment must revert to standard precision. Are you
saying that an initialization needn't, although an assignment
must?)

Yes :) I guess that's what I'm saying.
The problem is whether a return statement always implies object
initialization.

I don't think there's any question about that. §8.5/12 says
that the "initialization that occurs in [...], function return,
[...] is called copy initialization [...]". On the other hand,
in §6.6.3, all it says is "[return with value...], the value of
the expression is returned to the caller", which is rather vague
about how, and in other contexts, the standard does distinguish
between class type and non-class type return values; the return
value is an rvalue, which only behaves like an object if it has
class type.

IMHO, the fact that there is "copy initialization" (and that the
standard doesn't seem to address the question of what it means
to return a value otherwise) means that the semantics of
initializing an object must apply (and an object of type double
may not have extended precision). Whether this is intentional,
or what was intended, however, is another question---I think
it's probably worth taking up in comp.std.c++. And regardless
of what was intended, the standard should be clearer about it.
If I look at a composite expression such as
sin(x) + cos(y)
then sin() and cos() are function calls. I am not certain that
the standard requires truncation of the returned values
because in the expression they occur, those return values are
not treated as objects but just as values.

That's a good example, especially as I expect that on some
architectures, sin and cos (or if not, maybe abs(double) or some
such function) in fact resolve to compiler built-ins, generating
only a short sequence of machine instructions and leaving the
value in a floating point register.
The standard allows RVO and I understand that as a way to get
rid of temporaries. If the object is gone, there is no need
for it to be initialized.

I'm not sure, but as I understand it, RVO only allows merging a
temporary with another object. There's always an object
involved. (But admittedly, I've only thought about it in terms
of class types.)
I also think that RVO may change the semantics of a program.
I.e., eliding the copy constructor is allowed to change the
semantics, so why couldn't that mean that truncation is elided
as well? In fact, I would think that [5/10] loses much of its
appeal should it _not_ apply in a case like the above. However, in
equal( sin(x), cos(y) )
the return values are used to initialize parameters, and in
this case, I can see that the standard would require the
abstract machine to perform an object initialization.

I think the same logic should apply to value arguments as to
return values. For example, if sin(x) really does use extended
precision, which is allowed to be kept as a return value, why
should the extended precision of multiplication be lost in
something like "sin(x*x)"?

Your example used double const&, of course, which does change
things. A reference always refers to an object, by definition,
so there can be no doubt that the standard requires the "value"
to be converted to an object. (The value has been lvaluized, to
coin a word.)
None of this is firm belief in any way. I find the standard
very confusing with regard to floating point arithmetic.

Agreed. I think, partially, this is intentional. The authors
of the C standard wanted to give implementations as much leeway
as possible, leaving most of the issues as "quality of
implementation". (The authors of the C standard did introduce
some restrictions with regards to K&R C, which allowed the
compiler to implement a+(b+c) exactly as if it were (a+b)+c, or
even (a+c)+b, even when the results might be different---usually
the case in floating point.)
 

Greg Herlihy

  Sounds like a compiler bug to me. Especially the fact that compiling
with optimizations results in different behavior than without
optimizations is rather telling.

The particular floating point values in this case are influenced by
register "pressure", that is, how many floating point registers happen
to be available at a particular point in the program's execution flow.
The gcc manual refers to this behavior as the numerical instability of
'387 code. Nonetheless, the problem is gcc's:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323

The most straightforward solution is to use SSE instructions for
floating point operations. The SSE registers (available since the
Pentium III and Athlon) are IEEE conformant - and do not have the same
instability issues as the old 387 registers.
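For example (a sketch; the exact flags depend on the gcc version and
target, and the file name is made up):

g++ -msse2 -mfpmath=sse -o test test.cpp

With -mfpmath=sse, doubles are computed in 64 bit SSE registers rather
than on the 80 bit x87 stack, so results no longer depend on whether a
value happens to be spilled to memory. (On x86-64, SSE math is already
the default.)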
  On the other hand, I suppose this is, once again, one of those things
where compilers are free to do whatever they want, so even if one call
to f() results in -2.1 and a second call results in pi, that would still
be "allowed". After all, does the standard impose any requirements on
the correctness of floating point calculations?

C99 does not allow any floating point values in registers to "spill".
Moreover any assignment or cast to a floating point type must truncate
any additional precision in the value being assigned or converted.
  I actually find it rather ironic that it gives the incorrect answer
without optimizations and the correct answer with optimizations. One
would think that, if it makes any difference, it would be the optimized
version which cuts corners at the cost of accuracy, and the unoptimized
version would do all steps in the same way (because speed is not, after
all, a concern).

In this case, any change in how floating point registers are allocated
(whether for optimizations or other reasons) can affect the result. So
any code that depends on stable floating point computations (as
should be the case) is clearly in trouble in this situation.

Greg
 
