Float comparisons: how to do them properly

Guest

Hello,

I have a problem with float comparisons. Yes, I have read FAQ point
14.4ff.

Assume this little code snippet:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    double a = 3 ;
    double b = 5 ;

    if( (a/b) > 0.6)
        printf(">\n") ;
    else
        printf("<=\n") ;

    return EXIT_SUCCESS ;
}

As you might guess, it sometimes prints ">", and sometimes "<=".

Now it is of much importance for me that the program behaves in a really,
really repeatable fashion. The threshold may be arbitrary, but what I want to
avoid at all costs are different results depending on what compiler
options I use, for example.

In my "real world" program, a and b are ints, so I could solve the
problem by doing

if( (a * 10) > (b * 6) )
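
A minimal sketch of that integer rewrite, generalised to any ratio threshold. The name `ratio_exceeds` and its parameters are illustrative and not from the original program. It assumes both denominators are positive, and it widens to 64 bits so that products of 32-bit inputs cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if a/b > num/den, using integer arithmetic only.
 * Assumption: b > 0 and den > 0, so cross-multiplying keeps the
 * direction of the inequality. */
int ratio_exceeds(int32_t a, int32_t b, int32_t num, int32_t den)
{
    assert(b > 0 && den > 0);
    /* Widen before multiplying: a 32-bit by 32-bit product fits in 64 bits. */
    return (int64_t)a * den > (int64_t)b * num;
}
```

With a = 3, b = 5 and the threshold 6/10, both products are 30, so the call returns 0 on every conforming implementation.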

But my question is: if it _were_ doubles, what should I do then? How to
get a fully deterministic statement, independent of architecture,
compilers and stuff?

Regards,

January
 
John Cochran

From what I can see, you are attempting to perform an equality check against
a floating point number. That is a fundamentally broken idea.

You may want to write a function that compares two floating point numbers
and returns -1, 0, 1 for the cases of
A definitely less than B
A approximately equal to B
A definitely greater than B
where the "approximately equal to" is true if A and B are within 2 or 3 ULPs
of each other (you decide the threshold). Then whenever you want to compare
two floating point numbers, you call this function and check its return
result.
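
One possible shape for such a three-way comparison, as a sketch only. It assumes `double` is IEEE 754 binary64 with the same byte order as `int64_t`; the names `ordered_bits`, `compare_ulps` and the `max_ulps` parameter are hypothetical. NaN inputs are rejected by an assertion, and infinities are simply treated as one step beyond the largest finite value.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Map a double onto an integer line on which adjacent doubles differ by 1.
 * Assumption: IEEE 754 binary64 representation. */
static int64_t ordered_bits(double x)
{
    int64_t i;
    memcpy(&i, &x, sizeof i);
    /* Negative doubles are stored sign-magnitude; mirror them so the
     * mapping is monotonic and -0.0 lands on the same point as +0.0. */
    return i < 0 ? INT64_MIN - i : i;
}

/* Returns -1, 0 or 1: a definitely less than b, a within max_ulps of b,
 * a definitely greater than b. */
int compare_ulps(double a, double b, uint64_t max_ulps)
{
    assert(!isnan(a) && !isnan(b));
    int64_t ia = ordered_bits(a);
    int64_t ib = ordered_bits(b);
    /* Unsigned subtraction avoids signed overflow for far-apart values. */
    uint64_t dist = ia > ib ? (uint64_t)ia - (uint64_t)ib
                            : (uint64_t)ib - (uint64_t)ia;
    if (dist <= max_ulps)
        return 0;
    return ia < ib ? -1 : 1;
}
```

Note that this only moves the fuzzy edge: two builds that disagree in the last bit of `a` can still disagree when `a` sits exactly `max_ulps` away from `b`.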
 
Guest

John Cochran said:
> From what I can see, you are attempting

I'm attempting to find out what the proper way of solving such problems is.
In my program, I did manage to forget the floating point numbers and use
integers, but that might not always be the case -- and that is why I ask.

> to perform an equality check against
> a floating point number. That is a fundamentally broken idea.

Well, I know. But somehow sometimes you need to check e.g. whether your
calculated statistics p is greater than, for example, 0.95. My question
is: what do you do then, if you want deterministic behaviour?

> You may want to write a function that compares two floating point numbers
> and returns -1, 0, 1 for the cases of
> A definitely less than B
> A approximately equal to B
> A definitely greater than B
> where the "approximately equal to" is true if A and B are within 2 or 3 ULPs
> of each other (you decide the threshold). Then whenever you want to compare
> two floating point numbers, you call this function and check its return
> result.

Yes, I know, but this still does not give me deterministic behaviour. Say,
I have a population of 1000 data points, my program runs and finds that 123
of them are statistically significant. Now someone else takes it, and
finds around twice as many statistically significant data points, because
of one f...loating comparison! How do you deal with such problems -- this
is my question.

j.
 
Andre Charron

> I'm attempting to find out what the proper way of solving such problems is.
> In my program, I did manage to forget the floating point numbers and use
> integers, but that might not always be the case -- and that is why I ask.

What you want to do is compare the absolute value of their difference to
some threshold.

#define EPSILON 0.00001

if (fabs(d2-d1) < EPSILON)
{
    // They're essentially equal
}
else
{
    // They're not equal
}

Andre
 
Guest

Andre Charron said:
> What you want to do is compare the absolute value of their difference to
> some threshold.
> #define EPSILON 0.00001
> if (fabs(d2-d1) < EPSILON)

You think this will be deterministic? Look: the problem is not in finding
whether or not two values are equal.

The problem is
1) deciding whether a value is greater than a threshold
2) doing so in a _deterministic_, compiler-independent way

Adding an epsilon value does not change this problem, it just shifts its
boundaries.

OK, let me try to explain it like this.

What happens if, mathematically, a + epsilon = b? Then a - b = epsilon. In
your program, you compare fabs(a - b) with epsilon. This is not a
deterministic operation -- you are comparing two floats. In one compiler
fabs(a - b) might be smaller than epsilon, in another -- greater.

See? I don't really want to know for sure whether a > b -- b is a
threshold, b is arbitrarily chosen. It can as well be b + epsilon.

Instead, I want the program to always make the same decision, independently
of whether I use compiler X or Y.

j.
 
John Cochran

> What you want to do is compare the absolute value of their difference to
> some threshold.
>
> #define EPSILON 0.00001
>
> if (fabs(d2-d1) < EPSILON)
> {
>     // They're essentially equal
> }
> else
> {
>     // They're not equal
> }
>
> Andre

Not quite. I would suggest

if (fabs((d2-d1)/max(fabs(d1),fabs(d2))) < EPSILON) {
// Equal
} else {
// Not equal
}

as the comparison. This allows it to adjust with the scale of the numbers.

But given the original posting, the poster seems to want exactly the same
behavior for every comparison. This cannot be guaranteed even if you perform
a close-range check. The indeterminate behavior will be shifted from where
the numbers are nearly equal to where the numbers are at the edges of the
range checks.

Overall, use the <, >, <=, >= operators and be aware that the edge between
< and >= is a bit "fuzzy" when it comes to floating point numbers. You can
not guarantee identical behavior between compilers or compiler options.
 
Gordon Burditt

>> What you want to do is compare the absolute value of their difference to
> You think this will be deterministic? Look: the problem is not in finding
> whether or not two values are equal.
>
> The problem is
> 1) deciding whether a value is greater than a threshold
> 2) doing so in a _deterministic_, compiler-independent way

Take a step back from this problem. You want to *RE-MEASURE* your
data, then recalculate whether it is greater than a threshold.
Are you going to get deterministic results? No. Every time you
take a ruler and measure something you're going to get slightly
different results, even if the length of it is NOT changing with
time.

You're never going to get rid of edge-effect problems unless you
measure AND calculate with infinite precision. And infinite-precision
measurements are very difficult (and expensive) to do. A guy named
Heisenberg had some interesting things to say on this subject.

> Adding an epsilon value does not change this problem, it just shifts its
> boundaries.

It also says that, if compiler differences make a significant difference
in the results, your results are complete mush and should be discarded.

> OK, let me try to explain it like this.
>
> What happens if, mathematically, a + epsilon = b? Then a - b = epsilon. In
> your program, you compare fabs(a - b) with epsilon. This is not a
> deterministic operation -- you are comparing two floats. In one compiler
> fabs(a - b) might be smaller than epsilon, in another -- greater.
>
> See? I don't really want to know for sure whether a > b -- b is a
> threshold, b is arbitrarily chosen. It can as well be b + epsilon.
>
> Instead, I want the program to make always the same decision, independently
> of whether I use compiler X or Y.

In other words, you want IDENTICAL garbage, rather than different-smelling
garbage. But it's still garbage. And I prefer to KNOW my results are
garbage rather than covering it up.

Gordon L. Burditt
 
Rouben Rostamian

> Not quite. I would suggest
>
> if (fabs((d2-d1)/max(fabs(d1),fabs(d2))) < EPSILON) {
> as the comparison. This allows it to adjust with the scale of the numbers.

I would suggest:

if (fabs(d2-d1)/(1.0+max(fabs(d1),fabs(d2))) < EPSILON) {

That will take care of the case when d1 and d2 are both zero or
almost zero.
 
John Cochran

> I would suggest:
>
> if (fabs(d2-d1)/(1.0+max(fabs(d1),fabs(d2))) < EPSILON) {
>
> That will take care of the case when d1 and d2 are both zero or
> almost zero.

Reasonable.
However, I suspect that zero needs to be treated independently. Your
solution would make all comparisons where both numbers have an absolute
value below EPSILON claim to be equal. This is most likely not what you
want.

For example, I have an EPSILON of 1e-12 (i.e. I will treat as equal any
two numbers that match to within 12 significant digits). This will allow
for a reasonable amount of slop in the lower digits assuming that my
floating point math is precise to within 15 digits.

What you gave would consider 1e-13 and 1e-14 to be equal even though the
floating point math is well within its ability to represent both numbers
to within 15 digits and it's going nowhere near any of its limits.
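
The difference between the two forms can be sketched as follows. The function names are hypothetical, `EPSILON` is set to the 1e-12 used in this post, and the larger magnitude is picked with a plain conditional in place of the `max` used above. The second function treats exact equality (including both values being zero) as a separate case, which keeps its division well defined.

```c
#include <assert.h>
#include <math.h>

#define EPSILON 1e-12

static double larger(double x, double y) { return x > y ? x : y; }

/* The 1.0 + ... form: finite at zero, but for small inputs it behaves
 * like an absolute tolerance of roughly EPSILON. */
int equal_damped(double d1, double d2)
{
    return fabs(d2 - d1) / (1.0 + larger(fabs(d1), fabs(d2))) < EPSILON;
}

/* Purely relative form with zero treated independently. */
int equal_relative(double d1, double d2)
{
    assert(!isnan(d1) && !isnan(d2));
    if (d1 == d2)
        return 1;                 /* also covers d1 == d2 == 0 */
    /* Here at least one value is nonzero, so the divisor is positive. */
    return fabs(d2 - d1) / larger(fabs(d1), fabs(d2)) < EPSILON;
}
```

For 1e-13 and 1e-14 the first function reports equality and the second does not, which is the behaviour described above.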
 
Ken Turkowski

> double a = 3 ;
> double b = 5 ;
>
> if( (a/b) > 0.6)
> printf(">\n") ;
> else
> printf("<=\n") ;
>
> As you might guess, it sometimes prints ">", and sometimes "<=".
>
> Now it is of much importance for me that the program behaves in a really,
> really repeatable fashion

There is something seriously wrong with your computer if the same
computer returns ">", and "<=" for the same expression. Are you running
on a Pentium? There was a bug with Pentium's division several years ago
-- perhaps you've got a bad chip. Intel said that they would replace the
chips if the user was affected, but most people are not.

It is understandable if you get a different result on different types of
machines.
 
Rouben Rostamian

> Reasonable.
> However, I suspect that zero needs to be treated independently. Your
> solution would make all comparisons where both numbers have an absolute
> value below EPSILON claim to be equal. This is most likely not what you
> want.
>
> For example, I have an EPSILON of 1e-12 (i.e. I will treat as equal any
> two numbers that match to within 12 significant digits). This will allow
> for a reasonable amount of slop in the lower digits assuming that my
> floating point math is precise to within 15 digits.
>
> What you gave would consider 1e-13 and 1e-14 to be equal even though the
> floating point math is well within its ability to represent both numbers
> to within 15 digits and it's going nowhere near any of its limits.

I see what you mean. Perhaps, as you suggest, the separate treatment
of the zero is the best solution. It's ugly, but that's the way it is.
 
Dik T. Winter

> I have a problem with float comparisons. Yes, I have read FAQ point
> 14.4ff. ....
> int main(int argc, char *argv[]) {
> double a = 3 ;
> double b = 5 ;
> if( (a/b) > 0.6)
> printf(">\n") ;
> else
> printf("<=\n") ; ....
> Now it is of much importance for me that the program in a really, really
> repeatable fashion. The threshold may be arbitrary, but what I want to
> avoid at all costs are different results depending on what compiler ....
> But my question is: if it _were_ doubles, what should I do then? How to
> get a fully deterministic statement, independent of architecture,
> compilers and stuff?

The short answer is: you can't. The long answer is: because there are
(still) many floating point applications and many compilers around, there
is something non-deterministic in floating point arithmetic when you
cross compilers or architectures. There are too many things not
specified:
1. How is the value 0.6 converted by the compiler to a floating-point value?
2. How is a/b rounded to a proper floating-point value?
3. Is a/b kept in higher precision?
to mention three.

The better question is on *why* you do want to have a deterministic
division of floating point values. I think your problem is probably
better solved with scaled integers.
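
The third point can at least be inspected and narrowed from within C99. The sketch below uses hypothetical function names. `FLT_EVAL_METHOD` from `<float.h>` reports whether intermediate results may be kept in a wider format, and storing the quotient in a `volatile double` forces it to be rounded to double width before the comparison. This is an assumption-laden mitigation, not a guarantee: it does nothing about how the literal 0.6 is converted, and a quotient first rounded to extended precision and then to double can in rare cases differ from one rounded once.

```c
#include <assert.h>
#include <float.h>

/* 0: operations are evaluated in the operand type; 1: float is evaluated
 * as double; 2: everything is evaluated as long double;
 * -1: indeterminable. */
int evaluation_method(void)
{
    return FLT_EVAL_METHOD;
}

/* Returns 1 if a/b, rounded to double, is greater than threshold. */
int quotient_exceeds(double a, double b, double threshold)
{
    assert(b != 0.0);
    /* The volatile store discards any excess precision held in registers. */
    volatile double q = a / b;
    return q > threshold;
}
```

On an implementation with IEEE 754 doubles and correctly rounded literal conversion, 3.0/5.0 and 0.6 round to the same double, so `quotient_exceeds(3.0, 5.0, 0.6)` is expected to return 0 there.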
 
Dik T. Winter

>
> Well, I know. But somehow sometimes you need to check e.g. whether your
> calculated statistics p is greater than, for example, 0.95. My question
> is: what do you do then, if you want deterministic behaviour?

Do you want deterministic behaviour if you are doing statistics?

> Yes, I know, but this still does not give me deterministic behaviour. Say,
> I have a population of 1000 data points, my program runs and finds that 123
> of them are statistically significant. Now someone else takes it, and
> finds around twice as many statistically significant data points, because
> of one f...loating comparison! How do you deal with such problems -- this
> is my question.

You see immediately that about 123 data points are on the edge of being
statistically significant or not. I would consider that quite a large
number in a population of 1000 data points! And I would distrust any
statistics made on either of the assumptions of significance. When so
many data points are on the edge, there is something seriously wrong
with the methodology.
 
Guest

> Do you want deterministic behaviour if you are doing statistics?

Oh, definitely. I expect deterministic behaviour from my software.
Because if it is not, it will bias the statistics. I might even find a
statistical significance because of some biased randomness in floating
operations.

> You see immediately that about 123 data points are on the edge of being
> statistically significant or not. I would consider that quite a large
> number in a population of 1000 data points! And I would distrust any

OK, let's say there are 1e9 data points, you feel better now? :)

> statistics made on either of the assumptions of significance. When so
> many data points are on the edge, there is something seriously wrong
> with the methodology.

Gosh, you need not take an abstract example so seriously. Forget about
the examples.

Is there, generally speaking, a way to make deterministic floating point
calculations in C? It is just a theoretical question. Maybe the answer
is: no, there is no such way in any language, because no computer can do
real-number calculations, it can only approximate them. Or, more likely, the
answer is: it can be done, but it is hard to do, and you lose precision by
orders of magnitude.

j.
 
Guest

Dik T. Winter said:
> The better question is on *why* you do want to have a deterministic
> division of floating point values.

Short answer is: because I'm a curious person, that's why. I just want to
know.

> I think your problem is probably
> better solved with scaled integers.

As I mentioned in the original posting, that is precisely what I did.

j.
 
Richard Bos

> Is there, generally speaking, a way to make deterministic floating point
> calculations

No. Never mind the language. Not unless you juggle the bits yourself.

Richard
 
Guest

Gordon Burditt said:
> Take a step back from this problem. You want to *RE-MEASURE* your
> data, then recalculate whether it is greater than a threshold.
> Are you going to get deterministic results? No. Every time you
> take a ruler and measure something you're going to get slightly
> different results, even if the length of it is NOT changing with
> time.

This is not true. Count your fingers. Count them again. And? Got
different results? :) Ever heard of non-parametric statistics? Your
measurements can be infinitely precise, since they can be integers.
However, your statistic is not an integer.

Anyway -- I had my problem solved before I sent the first posting; my
interest is purely academic.

> You're never going to get rid of edge-effect problems unless you
> measure AND calculate with infinite precision. And infinite-precision
> measurements are very difficult (and expensive) to do. A guy named
> Heisenberg had some interesting things to say on this subject.

What can I say. I am a biologist. I do infinite-precision measurements on
a daily basis. Like: counting the number of amino acids in a sequence.
Or: calculating the number of sequences with the feature x.

This does not change the need for floating-point precision one bit, you
know, since statistical reasoning may require it.

> It also says that, if compiler differences make a significant difference
> in the results, your results are complete mush and should be discarded.

Really? Shouldn't think so. Rather that the way I'm approaching the
problem is mush.

j.
 
Anthony Roberts

> But my question is: if it _were_ doubles, what should I do then? How to
> get a fully deterministic statement, independent of architecture,
> compilers and stuff?

Not only do compilers differ, but you might even get different results
on different architectures.

You'd have to write your own floating point math routines. And you'd
have to control everything down to the bit level to be sure it would
work properly, especially on multiple architectures. A few years ago I
did some messing around with floating point stuff on a few
architectures. As I recall, SPARC and PowerPC produced identical
results, but x86 (not a Pentium! :) ) was different.

Having your own routines would give you the freedom to get as much
precision as you want, and you'd be able to ensure that the results are
deterministic and work the same on all architectures, but it would be
pretty slow.
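
A much smaller cousin of "your own routines" is decimal fixed point on top of integers, sketched here under stated assumptions. The type `fix_t`, the scale of six decimal digits and the function names are all hypothetical. Results are identical on every conforming C99 implementation because only integer operations are used, at the price of a fixed resolution and a limited range: `fix_div` overflows once the dividend's magnitude exceeds roughly 9.2e6.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-point type: real value = raw / FIX_SCALE. */
typedef int64_t fix_t;
#define FIX_SCALE 1000000   /* six decimal digits after the point */

fix_t fix_from_int(int32_t n)
{
    return (fix_t)n * FIX_SCALE;
}

/* Quotient truncated toward zero, as C99 integer division is defined.
 * Assumption: |a| is small enough that a * FIX_SCALE fits in 64 bits. */
fix_t fix_div(fix_t a, fix_t b)
{
    assert(b != 0);
    return a * FIX_SCALE / b;
}
```

Here 3/5 comes out as the raw value 600000, which is not greater than a threshold of 0.6 stored as 600000, whichever compiler or option set is used.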
 
Christian Bau

> Not quite. I would suggest
>
> if (fabs((d2-d1)/max(fabs(d1),fabs(d2))) < EPSILON) {
> // Equal
> } else {
> // Not equal
> }
>
> as the comparison. This allows it to adjust with the scale of the numbers.

And if you are lucky, your program will crash when d1 = d2 = 0. That's
if you are lucky; if you are out of luck it will cause your program to
give the wrong answer in a rare case, which leads to an error costing a
customer of your company millions, they find the error, sue your
employer and you get fired.

Why not

fabs (d2 - d1) <= EPSILON * max (fabs (d1), fabs (d2))

And then you can think a moment whether it makes any difference whether
you use fabs (d1), fabs (d2) or the larger of both on the right side -
if they are close then it makes no difference, if the values are far
apart then it makes no difference either! So just write

fabs (d2 - d1) <= EPSILON * fabs (d1)
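
Both forms side by side, as a sketch with hypothetical function names and an `EPSILON` of 1e-12 chosen only for illustration. On implementations with IEEE 754 arithmetic, the division form typically does not crash for d1 = d2 = 0: it computes 0.0/0.0, gets a NaN, and the comparison with `EPSILON` is then false, so two equal zeros are silently reported as different. The product form has no division and handles that case.

```c
#include <assert.h>
#include <math.h>

#define EPSILON 1e-12

/* Division form: evaluates 0.0/0.0 when both inputs are zero. */
int equal_by_division(double d1, double d2)
{
    assert(isfinite(d1) && isfinite(d2));
    double scale = fabs(d1) > fabs(d2) ? fabs(d1) : fabs(d2);
    return fabs((d2 - d1) / scale) < EPSILON;
}

/* Product form: the same relative test with the division multiplied out. */
int equal_by_product(double d1, double d2)
{
    assert(isfinite(d1) && isfinite(d2));
    return fabs(d2 - d1) <= EPSILON * fabs(d1);
}
```

Using `<=` in the product form matters: with `<`, the case d1 = d2 = 0 would compare 0 < 0 and fail again.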
 
Christian Bau

Ken Turkowski said:
> There is something seriously wrong with your computer if the same
> computer returns ">", and "<=" for the same expression. Are you running
> on a Pentium? There was a bug with Pentium's division several years ago
> -- perhaps you've got a bad chip. Intel said that they would replace the
> chips if the user was affected, but most people are not.
>
> It is understandable if you get a different result on different types of
> machines.

1. He never mentioned identical or different implementations. The C code
itself can produce different results, depending on the implementation.

2. The same compiler with different compiler options may produce
different results as well.
 
