float("nan") in set or as key


Chris Angelico

Careful with the attributions, Carl was quoting me when he posted that :)
You seem to be implying that python only provides a single non-integer
numeric type.  That's not true.  Python ships with a bunch of
different numeric types, including a rational type.  Off the top of my
head, we have:

IEEE floating point numbers
(http://docs.python.org/library/stdtypes.html#numeric-types-int-float-long-complex)
Rationals (http://docs.python.org/library/fractions.html)
Base-10 fixed and floating point numbers
(http://docs.python.org/library/decimal.html)
Complex numbers
(http://docs.python.org/library/stdtypes.html#numeric-types-int-float-long-complex
plus http://docs.python.org/library/cmath.html)
Integers (both ints and longs, which are pretty well unified by now)

I know Python does support all of the above. Leave off int/long and
complex, which are obviously not trying to store real numbers
(although I guess you could conceivably make 'complex' the vehicle for
reals too), and there are three: float, fraction, decimal. Of them, one is a
built-in type and the other two are imported modules. Hence my
question about why this one and not that one should be the "default"
that people will naturally turn to as soon as they need non-integers.
(Or, phrasing it another way: Only one of them is the type that "3.2"
in your source code will be represented as.)
Floats have far and away the best performance in most common
situations, so they end up being the default, but if you want to use
something different, it's usually not hard to do.

And that, right there, is the answer.

ChrisA
 

Nobody

If so, I've never heard it, and I cannot imagine what such a good
argument would be. Please give it.

Exceptions allow you to write more natural code by ignoring the awkward
cases. E.g. writing "x * y + z" rather than first determining whether
"x * y" is even defined then using a conditional.
True, but what's your point? Testing two floats for equality is not an
exceptional condition.

NaN itself is an exceptional condition which arises when a result is
undefined or not representable. When an operation normally returns a
number but a specific case cannot do so, it returns not-a-number.

The usual semantics for NaNs are practically identical to those for
exceptions. If any intermediate result in a floating-point expression is
NaN, the overall result is NaN. Similarly, if any intermediate calculation
throws an exception, the calculation as a whole throws an exception.

If x is NaN, then "x + y" is NaN, "x * y" is NaN, pretty much anything
involving x is NaN. By this reasoning both "x == y" and "x != y" should
also be NaN. But only the floating-point types have a NaN value, while
bool doesn't. However, all types have exceptions.
Why on earth not?

Why should there be a correct answer? What does NaN actually mean?

Apart from anything else, defining "NaN == NaN" as False means that
"x == x" is False if x is NaN, which violates one of the fundamental
axioms of an equivalence relation (and, in every other regard, "==" is
normally intended to be an equivalence relation).

The creation of NaN was a pragmatic decision on how to handle exceptional
conditions in hardware. It is not holy writ, and there's no fundamental
reason why a high-level language should export the hardware's behaviour
verbatim.
I cannot imagine what that argument would be. Please explain.

A result of NaN means that the result of the calculation is undefined, so
the value is "unknown". If x is unknown and y is unknown, then whether x
is equal to y is itself unknown, and whether x differs from y is also
unknown.
 

Grant Edwards

That's overstating it. There's a good argument to be made for raising
an exception. Bear in mind that an exception is not necessarily an
error, just an "exceptional" condition.


There is no correct answer to "nan == nan".

For those of us who have to deal with the real world (that means
complying with IEEE-754), there _is_ a correct answer. IIRC, the IEEE
standard requires nan == nan is false, and nan != nan is true.

That said, I don't remember what the other comparisons are supposed to
do...
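(For the record, the behaviour is easy to check from Python; this snippet is an added illustration, not part of the post. Every ordered comparison involving a NaN is false, and only != is true:)

nan = float("nan")
print(nan == nan)               # False
print(nan != nan)               # True
print(nan < 1.0, nan > 1.0)     # False False
print(nan <= 1.0, nan >= 1.0)   # False False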
Defining it to be false is just the "least wrong" answer.

Arguably, "nan != nan" should also be false, but that would violate
the invariant "(x != y) == !(x == y)".

And it would violate the IEEE standard. IEEE-754 has its warts, but
we're far better off than we were with dozens of incompatible,
undocumented, vendor-specific schemes (most of which had more warts
than IEEE-754).
 

Steven D'Aprano

But, "real numbers" and "IEEE float" are so different that I don't think
that it would be a wise decision for people to pretend they're working
with real numbers when in fact they are working with IEEE floats.

People pretend that *all the time*.

Much of the opposition to NANs, for example, is that it violates
properties of the reals. But so do ordinary floats! People just pretend
otherwise.

For reals, a + b - a = b, always without exception. For floats, not so
much.

For reals, a*(b + c) = a*b + a*c, always without exception. For floats,
not so much.

For reals, 1/(1/x) = x, except for 0, always. For floats, not so much.
For IEEE floats with proper support for INF, 0 is one of the cases which
does work!

These sorts of violations are far more likely to bite you than the NAN
boogey, that x != x when x is a NAN. But people go into paroxysms of
concern over the violation that they will probably never see, and ignore
the dozens that they trip over day after day.

Compiler optimizations are some of the worst and most egregious
violations of the rule Floats Are Not Reals. Large numbers of numeric
algorithms are simply broken due to invalid optimizations written by C
programmers who think that because they have a high school understanding
of real-value math they therefore understand floats.
 

Steven D'Aprano

Exceptions allow you to write more natural code by ignoring the awkward
cases. E.g. writing "x * y + z" rather than first determining whether "x
* y" is even defined then using a conditional.

You've quoted me out of context. I wasn't asking for justification for
exceptions in general. There's no doubt that they're useful. We were
specifically talking about NAN == NAN raising an exception rather than
returning False.

NaN itself is an exceptional condition which arises when a result is
undefined or not representable. When an operation normally returns a
number but a specific case cannot do so, it returns not-a-number.

I'm not sure what "not representable" is supposed to mean, but if by
"undefined" you mean "invalid", then correct.

The usual semantics for NaNs are practically identical to those for
exceptions. If any intermediate result in a floating-point expression is
NaN, the overall result is NaN.

Not necessarily. William Kahan gives an example where passing a NAN to
hypot can justifiably return INF instead of NAN. While it's certainly
true that *mostly* any intermediate NAN results in a NAN, that's not a
guarantee or requirement of the standard. A function is allowed to
convert NANs back to non-NANs, if it is appropriate for that function.

Another example is the Kronecker delta:

def kronecker(x, y):
    if x == y: return 1
    return 0

This will correctly consume NAN arguments. If either x or y is a NAN, it
will return 0.

(As an aside, this demonstrates that having NAN != any NAN, including
itself, is useful, as kronecker(x, x) will return 0 if x is a NAN.)

Similarly, if any intermediate
calculation throws an exception, the calculation as a whole throws an
exception.

This is certainly true... the exception cannot look into the future and
see that it isn't needed because a later calculation cancels it out.

Exceptions, or hardware traps, stop the calculation. NANs allow the
calculation to proceed. Both behaviours are useful, and the standard
allows for both.

If x is NaN, then "x + y" is NaN, "x * y" is NaN, pretty much anything
involving x is NaN. By this reasoning both "x == y" and "x != y" should
also be NaN.

NAN is a sentinel for an invalid operation. NAN + NAN returns a NAN
because it is an invalid operation, not because NANs are magical goop
that spoil everything they touch.

For example, print(NAN) does not return a NAN or raise an exception, nor
is there any need for it to. Slightly more esoteric: the signbit and
copysign functions both accept NANs without necessarily returning NANs.

Equality comparison is another such function. There's no need for
NAN == NAN to fail, because the equality operation is perfectly well
defined for NANs.

But only the floating-point types have a NaN value, while
bool doesn't. However, all types have exceptions.

What relevance does bool have?


Why should there be a correct answer? What does NaN actually mean?

NAN means "this is a sentinel marking that an invalid calculation was
attempted". For the purposes of numeric calculation, it is often useful
to allow those sentinels to propagate through your calculation rather
than to halt the program, perhaps because you hope to find that the
invalid marker ends up not being needed and can be ignored, or because
you can't afford to halt the program.

Does INVALID == INVALID? There's no reason to think that the question
itself is an invalid operation. If you can cope with the question "Is an
apple equal to a puppy dog?" without shouting "CANNOT COMPUTE!!!" and
running down the street, there's no reason to treat NAN == NAN as
anything worse.

So what should NAN == NAN equal? Consider the answer to the apple and
puppy dog comparison. Chances are that anyone asked that will give you a
strange look and say "Of course not, you idiot". (In my experience, and
believe it or not I have actually tried this, some people will ask you to
define equality. But they're a distinct minority.)

If you consider "equal to" to mean "the same as", then the answer is
clear and obvious: apples do not equal puppies, and any INVALID sentinel
is not equal to any other INVALID. (Remember, NAN is not a value itself,
it's a sentinel representing the fact that you don't have a valid number.)

So NAN == NAN should return False, just like the standard states, and
NAN != NAN should return True. "No, of course not, they're not equal."

Apart from anything else, defining "NaN == NaN" as False means that "x
== x" is False if x is NaN, which violates one of the fundamental axioms
of an equivalence relation (and, in every other regard, "==" is normally
intended to be an equivalence relation).

Yes, that's a consequence of NAN behaviour. I can live with that.

The creation of NaN was a pragmatic decision on how to handle
exceptional conditions in hardware. It is not holy writ, and there's no
fundamental reason why a high-level language should export the
hardware's behaviour verbatim.

There is a good, solid reason: it's a *useful* standard that *works*,
proven in practice, invented by people who have forgotten more about
floating point than you or I will ever learn, and we dismiss their
conclusions at our peril.

A less good reason: it's a standard. Better to stick to a not-very-good
standard than to have the Wild West, where everyone chooses their own
behaviour. You have NAN == NAN raise ValueError, Fred has it return True,
George has it return False, Susan has it return a NAN, Michelle makes it
raise MathError, somebody else returns Maybe ...

But IEEE-754 is not just a "not-very-good" standard. It is an extremely
good standard.


A result of NaN means that the result of the calculation is undefined,
so the value is "unknown".

Incorrect. NANs are not "unknowns", or missing values.
 

Grant Edwards

But IEEE-754 is not just a "not-very-good" standard. It is an
extremely good standard.

I get the distinct impression that the people arguing that IEEE-754 is
somehow "wrong" about the value of 'NaN == NaN' are the people who
don't actually use floating point. Those of us that do use floating
point and depend on the predictable behavior of NaNs seem to be happy
enough with the standard.

Two of my perennial complaints about Python's handling of NaNs and
Infs:

1) They weren't handled by pickle et al.

2) The string representations produced by repr() and accepted by
float() weren't standardized across platforms.

I think the latter has finally been fixed, hasn't it?
 

Robert Kern

Two of my perennial complaints about Python's handling of NaNs and
Infs:

1) They weren't handled by pickle et al.

2) The string representations produced by repr() and accepted by
float() weren't standardized across platforms.

I think the latter has finally been fixed, hasn't it?

And the former!

Python 2.7.1 |EPD 7.0-2 (32-bit)| (r271:86832, Dec 3 2010, 15:41:32)
[GCC 4.0.1 (Apple Inc. build 5488)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> pickle.loads(pickle.dumps(float("inf")))
inf

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
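(A quick check of both complaints on a modern CPython, added here for completeness rather than taken from the thread:)

import math
import pickle

x = float("inf")
y = float("nan")

print(pickle.loads(pickle.dumps(x)))               # inf -- survives the round trip
print(math.isnan(pickle.loads(pickle.dumps(y))))   # True (compare with isnan, since nan != nan)
print(repr(x), repr(y))                            # inf nan -- repr() is now platform-independent
print(float("inf"), float("nan"))                  # inf nan -- and float() accepts these spellings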
 

Nobody

You've quoted me out of context. I wasn't asking for justification for
exceptions in general. There's no doubt that they're useful. We were
specifically talking about NAN == NAN raising an exception rather than
returning False.

It's arguable that NaN itself simply shouldn't exist in Python; if the FPU
ever generates a NaN, Python should raise an exception at that point.

But given that NaNs propagate in almost the same manner as exceptions,
you could "optimise" this by treating a NaN as a special-case
implementation of exceptions, and turn it into a real exception at the
point where you can no longer use a NaN (e.g. when using a comparison
operator).

This would produce the same end result as raising an exception
immediately, but would reduce the number of isnan() tests.
I'm not sure what "not representable" is supposed to mean,

Consider sqrt(-1). This is defined (as "i" aka "j"), but not representable
as a floating-point "real". Making root/log/trig/etc functions return
complex numbers when necessary would probably be inappropriate for a language
such as Python.
but if you "undefined" you mean "invalid", then correct.

I mean undefined, in the sense that 0/0 is undefined (I note that Python
actually raises an exception for "0.0/0.0").
Not necessarily. William Kahan gives an example where passing a NAN to
hypot can justifiably return INF instead of NAN.

Hmm. Is that still true if the NaN signifies "not representable" (e.g.
known but complex) rather than undefined (e.g. unknown value but known to
be real)?
While it's certainly
true that *mostly* any intermediate NAN results in a NAN, that's not a
guarantee or requirement of the standard. A function is allowed to
convert NANs back to non-NANs, if it is appropriate for that function.

Another example is the Kronecker delta:

def kronecker(x, y):
    if x == y: return 1
    return 0

This will correctly consume NAN arguments. If either x or y is a NAN, it
will return 0. (As an aside, this demonstrates that having NAN != any
NAN, including itself, is useful, as kronecker(x, x) will return 0 if x
is a NAN.)

How is this useful? On the contrary, I'd suggest that the fact that
kronecker(x, x) can return 0 is an argument against the "NaN != NaN" axiom.

A case where the semantics of exceptions differ from those of NaN is:

def cond(t, x, y):
    if t:
        return x
    else:
        return y

as cond(True, x, nan()) will return x, while cond(True, x, raise()) will
raise an exception.

But this is a specific instance of a more general problem with strict
languages, i.e. strict functions violate referential transparency.

This is why even strict languages (i.e. almost everything except for a
handful of functional languages which value mathematical purity, e.g.
Haskell) have non-strict conditionals. If you remove the conditional from
the function and write it in-line, then:

if True:
    return x
else:
    raise()

behaves like NaN.

Also, note that the "convenience" of NaN (e.g. not propagating from the
untaken branch of a conditional) is only available for floating-point
types. If it's such a good idea, why don't we have it for other types?
Equality comparison is another such function. There's no need for
NAN == NAN to fail, because the equality operation is perfectly well
defined for NANs.

The definition is entirely arbitrary. You could just as easily define that
(NaN == NaN) is True. You could just as easily define that "1 + NaN" is 27.

Actually, "NaN == NaN" makes more sense than "NaN != NaN", as the former
upholds the equivalence axioms and is consistent with the normal behaviour
of "is" (i.e. "x is y" => "x == y", even if the converse isn't necessarily
true).

If you're going to argue that "NaN == NaN" should be False on the basis
that the values are sentinels for unrepresentable values (which may be
*different* unrepresentable values), it follows that "NaN != NaN" should
also be False for the same reason.
What relevance does bool have?

The result of comparisons is a bool.
NAN means "this is a sentinel marking that an invalid calculation was
attempted". For the purposes of numeric calculation, it is often useful
to allow those sentinels to propagate through your calculation rather
than to halt the program, perhaps because you hope to find that the
invalid marker ends up not being needed and can be ignored, or because
you can't afford to halt the program.

Does INVALID == INVALID?

Either True or INVALID. You can make a reasonable argument for either.
Making a reasonable argument that it should be False is much harder.
If you can cope with the question "Is an apple equal to a puppy dog?"

It depends upon your definition of equality, but it's not a particularly
hard question. And completely irrelevant here.
So what should NAN == NAN equal? Consider the answer to the apple and
puppy dog comparison. Chances are that anyone asked that will give you a
strange look and say "Of course not, you idiot". (In my experience, and
believe it or not I have actually tried this, some people will ask you to
define equality. But they're a distinct minority.)

If you consider "equal to" to mean "the same as", then the answer is
clear and obvious: apples do not equal puppies,

This is "equality" as opposed to "equivalence", i.e. x and y are equal if
and only if f(x) and f(y) are equal for all f.
and any INVALID sentinel is not equal to any other INVALID.

This does not follow. Unless you explicitly define the sentinel to be
unequal to itself, the strict equality definition holds, as NaN tends to
be a specific bit pattern (multiple bit patterns are interpreted as NaN,
but operations which result in a NaN will use a specific pattern, possibly
modulo the sign bit).

If you want to argue that "NaN == NaN" should be False, then do so. Simply
asserting that it should be False won't suffice (nor will citing the IEEE
FP standard *unless* you're arguing that "because the standard says so" is
the only reason required).
(Remember, NAN is not a value itself, it's a sentinel representing the
fact that you don't have a valid number.)

I'm aware of that.
So NAN == NAN should return False,
Why?

just like the standard states, and NAN != NAN should return True.

Why?

In both cases, the more obvious result should be some kind of sentinel
indicating that we don't have a valid boolean. Why should this sentinel
propagate through arithmetic operations but not through logical operations?
Yes, that's a consequence of NAN behaviour.

Another consequence:
x = float("nan")
x is x True
x == x
False

Ordinarily, you would consider this behaviour a bug in the class' __eq__
method.
I can live with that.

I can *live* with it (not that I have much choice), but that doesn't mean
that it's correct or even anything short of downright stupid.
There is a good, solid reason: it's a *useful* standard
Debatable.

that *works*,
Debatable.

proven in practice,

If anything, it has proven to be a major nuisance. It takes a lot of
effort to create (or even specify) code which does the right thing in the
presence of NaNs.

Turning NaNs into exceptions at their source wouldn't make it
significantly harder to write correct code (there are a handful of cases
where the existing behaviour produces the right answer almost by accident,
far more where it doesn't), and would mean that "simple" code (where NaN
hasn't been explicitly considered) raises an exception rather than
silently producing a wrong answer.
invented by people who have forgotten more about
floating point than you or I will ever learn, and we dismiss their
conclusions at our peril.

I'm not aware that they made any conclusions about Python. I don't
consider any conclusions about the most appropriate behaviour for hardware
(which may have no choice beyond exactly /which/ bit pattern to put into a
register) to automatically determine what is the most appropriate
behaviour for a high-level language.
A less good reason: it's a standard. Better to stick to a not-very-good
standard than to have the Wild West, where everyone chooses their own
behaviour. You have NAN == NAN raise ValueError, Fred has it return True,
George has it return False, Susan has it return a NAN, Michelle makes it
raise MathError, somebody else returns Maybe ...

This isn't an issue if you have the language deal with it.
Incorrect. NANs are not "unknowns", or missing values.

You're contradicting yourself here.
 

Gregory Ewing

Steven said:
def kronecker(x, y):
    if x == y: return 1
    return 0

This will correctly consume NAN arguments. If either x or y is a NAN, it
will return 0.

I'm far from convinced that this result is "correct". For one
thing, the Kronecker delta is defined on integers, not reals,
so expecting it to deal with NaNs at all is nonsensical.
For another, this function as written is numerically suspect,
since it relies on comparing floats for exact equality.

But the most serious problem is, given that
NAN is a sentinel for an invalid operation. NAN + NAN returns a NAN
because it is an invalid operation,

if kronecker(NaN, x) or kronecker(x, Nan) returns anything
other than NaN or some other sentinel value, then you've
*lost* the information that an invalid operation occurred
somewhere earlier in the computation.

You can't get a valid result from data produced by an
invalid computation. Garbage in, garbage out.
not because NANs are magical goop that spoil everything they touch.

But that's exactly how they *have* to behave if they truly
indicate an invalid operation.

SQL has been mentioned in relation to all this. It's worth
noting that the result of comparing something to NULL in
SQL is *not* true or false -- it's NULL!
 

Steven D'Aprano

I'm far from convinced that this result is "correct". For one thing, the
Kronecker delta is defined on integers, not reals, so expecting it to
deal with NaNs at all is nonsensical.

Fair point. Call it an extension of the Kronecker Delta to the reals then.

For another, this function as
written is numerically suspect, since it relies on comparing floats for
exact equality.

Well, it is a throw away function demonstrating a principle, not battle-
hardened production code.

But it's hard to say exactly what alternative there is, if you're going
to accept floats. Should you compare them using an absolute error? If so,
you're going to run into trouble if your floats get large. It is very
amusing when people feel all virtuous for avoiding equality and then
inadvertently do something like this:

y = 2.1e12
if abs(x - y) <= 1e-9:
    # x is equal to y, within exact tolerance
    ...

Apart from being slower and harder to read, how is this different from
the simpler, more readable x == y?

What about a relative error? Then you'll get into trouble when the floats
are very small. And how much error should you accept? What's good for
your application may not be good for mine.

Even if you define your equality function to accept some limited error
measured in Units in Last Place (ULP), "equal to within 2 ULP" (or any
other fixed tolerance) is no better, or safer, than exact equality, and
very likely worse.

In practice, either the function needs some sort of "how to decide
equality" parameter, so the caller can decide what counts as equal in
their application, or you use exact floating point equality and leave it
up to the caller to make sure the arguments are correctly rounded so that
values which should compare equal do compare equal.
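(One concrete shape for such a "how to decide equality" parameter, sketched here as an illustration; it is essentially the formula that math.isclose later standardised in PEP 485.)

def approx_equal(x, y, rel_tol=1e-9, abs_tol=0.0):
    # Equal within a relative tolerance scaled to the larger magnitude,
    # or within an absolute floor for values near zero.
    return abs(x - y) <= max(rel_tol * max(abs(x), abs(y)), abs_tol)

print(approx_equal(2.1e12, 2.1e12 + 1.0))   # True: the tolerance scales with magnitude
print(approx_equal(1e-12, 2e-12))           # False: near zero you must choose abs_tol yourself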

But the most serious problem is, given that


if kronecker(NaN, x) or kronecker(x, Nan) returns anything other than
NaN or some other sentinel value, then you've *lost* the information
that an invalid operation occurred somewhere earlier in the computation.

If that's the most serious problem, then I'm laughing, because of course
I haven't lost anything.

x = result_of_some_computation(a, b, c) # may return NAN
y = kronecker(x, 42)

How have I lost anything? I still have the result of the computation in
x. If I throw that value away, it is because I no longer need it. If I do
need it, it is right there, where it always was.

You seem to have fallen for the myth that NANs, once they appear, may
never disappear. This is a common misapprehension, e.g.:

"NaN is like a trap door that once you have fallen in you cannot
come back out. Otherwise, the possibility exists that a calculation
will have gone off course undetectably."

http://www.rhinocerus.net/forum/lang-fortran/94839-fortran-ieee-754-maxval-inf-nan-2.html#post530923

Certainly if you, the function writer, has any reasonable doubt about the
validity of a NAN input, you should return a NAN. But that doesn't mean
that NANs are "trap doors". It is fine for them to disappear *if they
don't matter* to the final result of the calculation. I quote:

"The key result of these rules is that once you get a NaN during
a computation, the NaN has a STRONG TENDENCY [emphasis added] to
propagate itself throughout the rest of the computation..."

http://www.savrola.com/resources/NaN.html

Another couple of good examples:

- from William Kahan, and the C99 standard: hypot(INF, x) is always INF
regardless of the value of x, hence hypot(INF, NAN) returns INF.

- since pow(x, 0) is always 1 regardless of the value of x, pow(NAN, 0)
is also 1.
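(Both of these can be checked directly in CPython's math module, which follows the C99 behaviour; the demonstration below is an addition, not part of the post.)

import math

inf, nan = float("inf"), float("nan")

print(math.hypot(inf, nan))   # inf: hypot(inf, x) is inf for every x, so the NaN is irrelevant
print(math.pow(nan, 0))       # 1.0: pow(x, 0) is 1 for every x, so the NaN is discarded
print(nan ** 0)               # 1.0: the ** operator agrees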

In the case of the real-valued Kronecker delta, I argue that the NAN
doesn't matter, and it is reasonable to allow it to disappear.

Another standard example where NANs get thrown away is the max and min
functions. The latest revision of IEEE-754 (2008) allows for max and min
to ignore NANs.
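(As an aside of my own, Python's built-in max and min are not NaN-aware in that IEEE-754-2008 sense; because every comparison with a NaN is false, the result simply depends on argument order.)

nan = float("nan")

print(max(nan, 1.0))   # nan: "1.0 > nan" is False, so the first argument survives
print(max(1.0, nan))   # 1.0: "nan > 1.0" is False, so the first argument survives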

You can't get a valid result from data produced by an invalid
computation. Garbage in, garbage out.

Of course you can. Here's a trivial example:

def f(x):
    return 1

It doesn't matter what value x takes, the result of f(x) should be 1.
What advantage is there in having f(NAN) return NAN?

But that's exactly how the *have* to behave if they truly indicate an
invalid operation.

SQL has been mentioned in relation to all this. It's worth noting that
the result of comparing something to NULL in SQL is *not* true or false
-- it's NULL!

I'm sure they have their reasons for that. Whether they are good reasons
or not, I don't know. I do know that the 1999 SQL standard defined *four*
results for boolean comparisons, true/false/unknown/null, but allowed
implementations to treat unknown and null as the same.
 

Chris Angelico

Of course you can. Here's a trivial example:

def f(x):
   return 1

If your incoming x is garbage, your outgoing 1 is also garbage. Later
on, you can use 'isgarbage(x)' to find out whether anything went
wrong. You can also use 'isinsane(self)', which is defined as follows:

class Programmer:
    def isinsane(self):
        return True if float("nan")==float("nan") else True

Chris Angelico
 

Steven D'Aprano

If your incoming x is garbage, your outgoing 1 is also garbage.

If there were non-garbage input where f(x) would return something other
than 1, then you might argue that "well, we can't be sure what value
f(x) should return, so we better return a NAN". But there is no such
input.

NANs are a tool, not poison. They indicate an invalid calculation. Not
all calculations are critical. What do you do when you reach an invalid
calculation and you can't afford to just give up and halt the program
with an error? You try to fix it with another calculation!

If you're in the fortunate situation that you can say "this bad input
does not matter", then *that input does not matter*. Regardless of
whether your input is a NAN, or you've just caught an exception, you have
the opportunity to decide what the appropriate response is.

You might not be able to fix the situation, in which case it is
appropriate to return a NAN to signal to the next function that you don't
have a valid result. But sometimes one bad value is not the end of the
world. Perhaps you try again with a smaller step size, or you skip this
iteration of the calculation, or you throw away the current value and
start again from a different starting point, or do whatever is needed to
get the result you want.

In the case of my toy function, whatever is needed is... nothing at all.
Just return 1, the same as you would for any other input, because the
input literally does not matter for the output.
 

Grant Edwards

It's arguable that NaN itself simply shouldn't exist in Python; if
the FPU ever generates a NaN, Python should raise an exception at
that point.

Sorry, I just don't "get" that argument. I depend on compliance with
IEEE-754, and I find the current NaN behavior very useful, and
labor-saving.
But given that NaNs propagate in almost the same manner as
exceptions, you could "optimise" this by treating a NaN as a
special-case implementation of exceptions, and turn it into a real
exception at the point where you can no longer use a NaN (e.g. when
using a comparison operator).

This would produce the same end result as raising an exception
immediately, but would reduce the number of isnan() tests.

I've never found the number of isnan() checks in my code to be an
issue -- there just aren't that many of them, and when they are there,
it provides an easy to read and easy to maintain way to handle things.
I mean undefined, in the sense that 0/0 is undefined

But 0.0/0.0 _is_ defined. It's NaN. ;)
(I note that Python actually raises an exception for "0.0/0.0").

IMHO, that's a bug. IEEE-754 states explicitly that 0.0/0.0 is NaN.
Python claims it implements IEEE-754. Python got it wrong.
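(The discrepancy is easy to see; this snippet is an added illustration. CPython raises for division by zero, but otherwise lets IEEE-754 quiet NaNs through:)

try:
    0.0 / 0.0
except ZeroDivisionError as e:
    print("raised:", e)             # raised: float division by zero

print(float("inf") * 0.0)           # nan: other invalid operations return a quiet NaN
print(float("inf") - float("inf"))  # nan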
Also, note that the "convenience" of NaN (e.g. not propagating from
the untaken branch of a conditional) is only available for
floating-point types. If it's such a good idea, why don't we have it
for other types?
The definition is entirely arbitrary.

I don't agree, but even if it was entirely arbitrary, that doesn't make
the decision meaningless. IEEE-754 says it's True, and standards
compliance is valuable. Each country's decision to drive on the
right/left side of the road is entirely arbitrary, but once decided
there's a huge benefit to everybody following the rule.
You could just as easily define that (NaN == NaN) is True. You could
just as easily define that "1 + NaN" is 27.

I don't think that would be "just as easy" to use.
Actually, "NaN == NaN" makes more sense than "NaN != NaN", as the
former upholds the equivalence axioms

You seem to be talking about reals. We're talking about floats.
If you're going to argue that "NaN == NaN" should be False on the
basis that the values are sentinels for unrepresentable values (which
may be *different* unrepresentable values), it follows that "NaN !=
NaN" should also be False for the same reason.

Mostly I just want Python to follow the IEEE-754 standard [which I
happen to find to be very well thought out and almost always behaves
in a practical, useful manner].
If you want to argue that "NaN == NaN" should be False, then do so.
Simply asserting that it should be False won't suffice (nor will
citing the IEEE FP standard *unless* you're arguing that "because the
standard says so" is the only reason required).

For those of us who have to accomplish real work and interface with
real devices "because the standard says so" is actaully a darned good
reason. Years of experience has also shown to me that it's a very
practical decision.
If anything, it has proven to be a major nuisance. It takes a lot of
effort to create (or even specify) code which does the right thing in
the presence of NaNs.

That's not been my experience. NaNs save a _huge_ amount of effort
compared to having to pass value+status info around throughout complex
calculations.
I'm not aware that they made any conclusions about Python.

They made some very informed (and IMO valid) conclusions about
scientific computing using binary floating point arithmetic. Those
conclusions apply largely to Python.
 

Chris Torek

IMHO, that's a bug. IEEE-754 states explicitly that 0.0/0.0 is NaN.
Python claims it implements IEEE-754. Python got it wrong.

Indeed -- or at least, inconsistent. (Again I would not mind at
all if Python had "raise exception on NaN-result" mode *as well
as* "quietly make NaN", perhaps using signalling vs quiet NaN to
tell them apart in most cases, plus some sort of floating-point
context control, for instance.)

Mostly because for integers it's "too late" and there is no standard
for it. For others, well:
>>> import decimal
>>> decimal.Decimal('nan')
Decimal("NaN")
>>> _ + 1
Decimal("NaN")
>>> decimal.setcontext(decimal.ExtendedContext)
>>> print decimal.Decimal(1) / 0
Infinity
[etc]

(Note that you have to set the decimal context to one that does
not produce a zero-divide exception, such as the pre-loaded
decimal.ExtendedContext. On my one Python 2.7 system -- all the
rest are earlier versions, with 2.5 the highest I can count on,
and that only by upgrading it on the really old work systems --
I note that fractions.Fraction(0,0) raises a ZeroDivisionError,
and there is no fractions.ExtendedContext or similar.)
I don't agree, but even if it was entirely arbitrary, that doesn't make
the decision meaningless. IEEE-754 says it's True, and standards
compliance is valuable. Each country's decision to drive on the
right/left side of the road is entirely arbitrary, but once decided
there's a huge benefit to everybody following the rule.

This analogy perhaps works better than expected. Whenever I swap
between Oz or NZ and the US-of-A, I have a brief mental clash that,
if I am not careful, could result in various bad things. :)
 

Nobody

Sorry, I just don't "get" that argument. I depend on compliance with
IEEE-754, and I find the current NaN behavior very useful, and
labor-saving.

If you're "fluent" in IEEE-754, then you won't find its behaviour
unexpected. OTOH, if you approach the issue without preconceptions,
you're likely to notice that you effectively have one exception mechanism
for floating-point and another for everything else.
I've never found the number of isnan() checks in my code to be an
issue -- there just arent that many of them, and when they are there,
it provides an easy to read and easy to maintain way to handle things.

I think that you misunderstood. What I was saying here was that, if you
wanted exception-on-NaN behaviour from Python, the interpreter wouldn't
need to call isnan() on every value received from the FPU, but rely upon
NaN-propagation and only call it at places where a NaN might disappear
(e.g. comparisons).
But 0.0/0.0 _is_ defined. It's NaN. ;)

Mathematically, it's undefined.
IMHO, that's a bug. IEEE-754 states explicitly that 0.0/0.0 is NaN.
Python claims it implements IEEE-754. Python got it wrong.

But then IEEE-754 considers integers and floats to be completely different
beasts, while Python makes some effort to maintain a unified "numeric"
interface. If you really want IEEE-754 to-the-letter, that's undesirable,
although I'd question the choice of Python in such situations.
I don't agree, but even if it was entirely arbitrary, that doesn't make
the decision meaningless. IEEE-754 says it's True, and standards
compliance is valuable.

True, but so are other things. People with a background in mathematics (as
opposed to arithmetic and numerical methods) would probably consider
following the equivalence axioms to be valuable. Someone more used to
Python than IEEE-754 might consider following the "x is y => x == y" axiom
to be valuable.

As for IEEE-754 saying that it's True: they only really had two
choices: either it's True or it's False. NaNs provide "exceptions"
even if the hardware or the language lacks them, but that falls down once
you leave the scope of floating-point. It wouldn't have been within
IEEE-754's ambit to declare that comparing NaNs should return NaB
(Not A Boolean).
You seem to be talking about reals. We're talking about floats.

Floats are supposed to approximate reals. They're also a Python
data type, and should make some effort to fit in with the rest of
the language.
That's not been my experience. NaNs save a _huge_ amount of effort
compared to having to pass value+status info around throughout complex
calculations.

That's what exceptions are for. NaNs probably save a huge amount of effort
in languages which lack exceptions, but that isn't applicable to Python.
In Python, they result in floats not "fitting in".

Let's remember that the thread started with an oddity relating to using
floats as dictionary keys, which mostly works but fails for NaN because of
the (highly unusual) property that "x == x" is False for NaNs.
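(The oddity in question, shown here for anyone who missed the start of the thread; the demonstration is an addition. Container membership checks identity before ==, so the same NaN object can be found but an equal-looking distinct NaN cannot:)

nan = float("nan")
d = {nan: "value"}

print(d[nan])              # value: the identity check short-circuits the failed ==
print(nan in d)            # True, for the same reason
print(float("nan") in d)   # False: a distinct NaN object never compares equal to the key
print(nan in {nan})        # True: sets behave the same way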

Why did the Python developers choose this behaviour? It's quite likely
that they didn't choose it, but just overlooked the fact that NaN
creates this corner-case which breaks code that works for every other
primitive type, and for every float other than NaN.

In any case, I should probably re-iterate at this point that I'm not
actually arguing *for* exception-on-NaN or NaN==NaN or similar, just
pointing out that IEEE-754 is not the One True Approach and that other
approaches are not necessarily heresy and may have some merit. To go back
to the point where I entered this thread:
 

Chris Angelico

Floats are supposed to approximate reals. They're also a Python
data type, and should make some effort to fit in with the rest of
the language.

That's what I thought a week ago. But that's not really true. Floats
are supposed to hold non-integral values, but the data type is "IEEE
754 floating point", not "real number". There's several ways to store
real numbers, and not one of them is (a) perfectly accurate, or (b)
plausibly fast to calculate. Using rationals (fractions) with infinite
range leads to exponential performance costs, and still doesn't
properly handle irrationals like pi. And if you cap the denominator to
a power of 2 and cap the length of the mantissa, err I mean numerator,
then you have IEEE 754 floating point. Python offers you a way to
store and manipulate floating point numbers, not real numbers.

Chris Angelico
 

Gregory Ewing

Steven said:
Fair point. Call it an extension of the Kronecker Delta to the reals then.

That's called the Dirac delta function, and it's a bit different --
instead of a value of 1, it has an infinitely high spike of zero
width at the origin, whose integral is 1. (Which means it's not
strictly a function, because it's impossible for a true function
on the reals to have those properties.)

You don't normally use it on its own; usually it turns up as part
of an integral. I find it difficult to imagine a numerical algorithm
that relies on directly evaluating it. Such an algorithm would be
numerically unreliable. You just wouldn't do it that way; you'd
find some other way to calculate the integral that avoids evaluating
the delta.
y = 2.1e12
if abs(x - y) <= 1e-9:
    # x is equal to y, within exact tolerance
    ...

If you expect your numbers to be on the order of 1e12, then 1e-9
is obviously not a sensible choice of tolerance. You don't just
pull tolerances out of thin air, you justify them based on
knowledge of the problem at hand.
In practice, either the function needs some sort of "how to decide
equality" parameter,

If it's general purpose library code, then yes, that's exactly
what it needs.
or you use exact floating point equality and leave it
up to the caller to make sure the arguments are correctly rounded

Not really a good idea. Trying to deal with this kind of thing
by rounding is fraught with difficulties and pitfalls. It can
only work when you're not really using floats as approximations
of reals, but as some set of discrete values, in which case
it's probably safer to use appropriately-scaled integers.
- from William Kahan, and the C99 standard: hypot(INF, x) is always INF
regardless of the value of x, hence hypot(INF, NAN) returns INF.

- since pow(x, 0) is always 1 regardless of the value of x, pow(NAN, 0)
is also 1.

These are different from your kronecker(), because the result
*never* depends on the value of x, whether it's NaN or not.
But kronecker() clearly does depend on the value of x sometimes.

The reasoning appears to be based on the idea that NaN means
"some value, we just don't know what it is". Accepting that
interpretation, the argument doesn't apply to kronecker().
You can't say that the NaN in kronecker(NaN, 42) doesn't
matter, because if you don't know what value it represents,
you can't be sure that it *isn't* meant to be 42.
Another standard example where NANs get thrown away is the max and min
functions. The latest revision of IEEE-754 (2008) allows for max and min
to ignore NANs.

Do they provide a justification for that? I'm having trouble
seeing how it makes sense.
 

Steven D'Aprano

That's called the Dirac delta function, and it's a bit different

Yes, I'm familiar with the Dirac delta. As you say, it's not really
relevant to the question on hand.

In any case, my faux Kronecker was just a throw away example. If you
don't like it, throw it away! The specific example doesn't matter, since
the principle applies: functions may throw away NANs if they are not
relevant to the calculation. The presence of a NAN is not intended to be
irreversible, merely *usually* irreversible.


[...]
If you expect your numbers to be on the order of 1e12, then 1e-9 is
obviously not a sensible choice of tolerance. You don't just pull
tolerances out of thin air, you justify them based on knowledge of the
problem at hand.

Exactly. But that's precisely what people do! Hence my comment (which you
snipped) about people feeling virtuous because they avoid "testing floats
for equality", but then they go and do an operation like the above.

I'm sure you realise this, but for anyone reading who doesn't understand
why the above is silly, there are no floats less than 1e-9 from y above.
 

rusi

If you're "fluent" in IEEE-754, then you won't find its behaviour
unexpected. OTOH, if you approach the issue without preconceptions,
you're likely to notice that you effectively have one exception mechanism
for floating-point and another for everything else.

Three actually: None, nan and exceptions.
Furthermore, in boolean contexts nan behaves like True whereas None
behaves like False.
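(Easily confirmed; illustration added here:)

nan = float("nan")

print(bool(nan))    # True: nan is a non-zero float, so it is truthy
print(bool(None))   # False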
 

Nobody

Three actually: None, nan and exceptions

None isn't really an exception; at least, it shouldn't be used like that.
Exceptions are for conditions which are in some sense "exceptional". Cases
like dict.get() returning None when the key isn't found are meant for
the situation where the key not existing is unexceptional. If you "expect"
the key to exist, you'd use dict[key] instead (and get an exception if it
doesn't).
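(A short illustration of that distinction, added for concreteness:)

d = {"a": 1}

print(d.get("b"))       # None: a missing key is treated as unexceptional
try:
    d["b"]
except KeyError:
    print("KeyError")   # here the missing key is "exceptional"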
 
