float("nan") in set or as key


MRAB

Here's a curiosity. float("nan") can occur multiple times in a set or as
a key in a dict:
>>> {float("nan"), float("nan")}
{nan, nan}

except that sometimes it can't:
>>> nan = float("nan")
>>> {nan, nan}
{nan}
 

Steven D'Aprano

Here's a curiosity. float("nan") can occur multiple times in a set or as
a key in a dict:

>>> {float("nan"), float("nan")}
{nan, nan}

That's an implementation detail. Python is free to reuse the same object
when you create an immutable object twice on the same line, but in this
case doesn't. (I don't actually know if it ever does, but it could.)

And since NAN != NAN always, you can get two NANs in the one set, since
they're unequal.

Each time you write float('nan'), you get a distinct NAN object.

except that sometimes it can't:

>>> nan = float("nan")
>>> {nan, nan}
{nan}

But in this case, you try to put the same NAN in the set twice. Since
sets optimize element testing by checking for identity before equality,
the NAN only goes in once.
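A quick sketch of the behaviour being described (CPython; the exact rules are an implementation detail):

```python
nan1 = float("nan")
nan2 = float("nan")  # a distinct NaN object

# Two distinct objects: the identity check fails and the equality
# check fails (NaN != NaN), so both go into the set.
print(len({nan1, nan2}))  # 2

# The same object twice: the identity check short-circuits the
# equality test, so the set stores it only once.
print(len({nan1, nan1}))  # 1
```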
 

Gregory Ewing

MRAB said:
float("nan") can occur multiple times in a set or as
a key in a dict:

>>> {float("nan"), float("nan")}
{nan, nan}

except that sometimes it can't:

>>> nan = float("nan")
>>> {nan, nan}
{nan}

NaNs are weird. They're not equal to themselves:

Python 2.7 (r27:82500, Oct 15 2010, 21:14:33)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> nan = float("nan")
>>> nan == nan
False

This confuses the daylights out of Python's dict lookup machinery,
which assumes that two references to the same object can't possibly
compare unequal, so it doesn't bother calling __eq__ on them.
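The identity shortcut Gregory describes is easy to see with a dict (a minimal sketch of CPython behaviour):

```python
nan = float("nan")
d = {nan: "found"}

# Lookup with the very same object succeeds: the dict sees an
# identical object and never bothers calling __eq__.
print(d[nan])              # found
print(nan in d)            # True

# A different NaN object is unequal (and may hash differently),
# so it does not find the entry.
print(float("nan") in d)   # False
```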
 

John Nagle

Gregory Ewing said:
NaNs are weird. They're not equal to themselves:

This confuses the daylights out of Python's dict lookup machinery,
which assumes that two references to the same object can't possibly
compare unequal, so it doesn't bother calling __eq__ on them.

Right.

The correct answer to "nan == nan" is to raise an exception, because
you have asked a question for which the answer is neither True nor False.

The correct semantics for IEEE floating point look something like
this:

1/0         ->  INF
INF + 1     ->  INF
INF - INF   ->  NaN
INF == INF  ->  unordered
NaN == NaN  ->  unordered

INF and NaN both have comparison semantics which return
"unordered". The FPU sets a bit for this, which most language
implementations ignore. But you can turn on floating point
exception traps, and on x86 machines, they're exact - the
exception will occur exactly at the instruction which
triggered the error. In superscalar CPUs, a sizable part of
the CPU handles the unwinding necessary to do that. x86 does
it, because it's carefully emulating non-superscalar machines.
Most RISC machines don't bother.

Python should raise an exception on unordered comparisons.
Given that the language handles integer overflow by going to
arbitrary-precision integers, checking the FPU status bits is
cheap.

The advantage of raising an exception is that the logical operations
still work. For example,

not (a == b)
a != b

will always return the same results if exceptions are raised for
unordered comparison results. Also, exactly one of

a == b
a < b
a > b

is always true - something sorts tend to assume.
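With Python's quiet NaNs, the trichotomy described above silently fails rather than raising (an illustrative sketch):

```python
nan = float("nan")

# not (a == b) and a != b still agree for NaN...
print(not (nan == nan))  # True
print(nan != nan)        # True

# ...but none of ==, <, > holds, which quietly breaks the
# "exactly one is true" assumption that sorts tend to rely on.
print(nan == nan, nan < nan, nan > nan)  # False False False
```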

If you get an unordered comparison exception, your program
almost certainly was getting wrong answers.

(I used to do dynamics simulation engines, where this mattered.)

John Nagle
 

Steven D'Aprano

The correct answer to "nan == nan" is to raise an exception, because
you have asked a question for which the answer is neither True nor False.

Wrong.

The correct answer to "nan == nan" is False, they are not equal. Just as
None != "none", and 42 != [42], or a teacup is not equal to a box of
hammers.

Asking whether NAN < 0 could arguably either return "unordered" (raise an
exception) or return False ("no, NAN is not less than zero; neither is it
greater than zero"). The PowerPC Macintoshes back in the 1990s supported
both behaviours. But that's different to equality tests.

The correct semantics for IEEE floating point look something like
this:

1/0         ->  INF
INF + 1     ->  INF
INF - INF   ->  NaN
INF == INF  ->  unordered

Wrong. Equality is not an order comparison.
 

Nobody


That's overstating it. There's a good argument to be made for raising an
exception. Bear in mind that an exception is not necessarily an error,
just an "exceptional" condition.

The correct answer to "nan == nan" is False, they are not equal.

There is no correct answer to "nan == nan". Defining it to be false is
just the "least wrong" answer. Arguably, "nan != nan" should also be
false, but that would violate the invariant "(x != y) == !(x == y)".
 

Steven D'Aprano

That's overstating it. There's a good argument to be made for raising an
exception.

If so, I've never heard it, and I cannot imagine what such a good
argument would be. Please give it.

(I can think of *bad* arguments, like "NANs confuse me and I don't
understand the reason for their existence, therefore I'll give them
behaviours that make no sense and aren't useful". But you did state there
is a *good* argument.)


Bear in mind that an exception is not necessarily an error,
just an "exceptional" condition.

True, but what's your point? Testing two floats for equality is not an
exceptional condition.

There is no correct answer to "nan == nan".

Why on earth not?

Defining it to be false is just the "least wrong" answer.

So you say, but I think you are incorrect.

Arguably, "nan != nan" should also be false,
but that would violate the invariant "(x != y) == !(x == y)".

I cannot imagine what that argument would be. Please explain.
 

Chris Torek

Incidentally, note:

$ python
...
>>> float("nan") == float("nan")
False

The correct answer to "nan == nan" is to raise an exception, because
you have asked a question for which the answer is neither True nor False.

Well, in some sense, the "correct answer" depends on which question
you *meant* to ask. :) Seriously, some (many?) instruction sets
have two kinds of comparison instructions: one that raises an
exception here, and one that does not.

The correct semantics for IEEE floating point look something like
this:

1/0         ->  INF
INF + 1     ->  INF
INF - INF   ->  NaN
INF == INF  ->  unordered
NaN == NaN  ->  unordered

INF and NaN both have comparison semantics which return
"unordered". The FPU sets a bit for this, which most language
implementations ignore.

Again, this depends on the implementation.

This is similar to (e.g.) the fact that on the MIPS, there are two
different integer add instructions ("addi" and "addiu"): one
raises an overflow exception, the other performs C "unsigned"
style arithmetic (where, e.g., 0xffffffff + 1 = 0, in 32 bits).

Python should raise an exception on unordered comparisons.
Given that the language handles integer overflow by going to
arbitrary-precision integers, checking the FPU status bits is
cheap.

I could go for that myself. But then you also need a "don't raise
exception but give me an equality test result" operator (for various
special-case purposes at least) too. Of course a simple "classify
this float as one of normal, subnormal, zero, infinity, or NaN"
operator would suffice here (along with the usual "extract sign"
and "differentiate between quiet and signalling NaN" operations).
 

Chris Angelico

If exceptions had commonly existed in that environment there's no chance they would have chosen that behavior; comparison against NaN (or any operation with NaN) would have signaled a floating point exception.  That is the correct way to handle exceptional conditions.

The only reason to keep NaN's current behavior is to adhere to IEEE, but given that Python has trailblazed a path of correcting arcane mathematical behavior, I definitely see an argument that Python should do the same for NaN, and if it were done Python would be a better language.

If you're going to change behaviour, why have a floating point value
called "nan" at all? Other than being a title for one's grandmother,
what meaning does that string have, and why should it be able to be
cast as floating point?

Lifting from http://en.wikipedia.org/wiki/NaN a list of things that
can return a NaN (I've removed non-ASCII characters from this
snippet):
* Operations with a NaN as at least one operand.
(you need to bootstrap that somehow, so we can ignore this - it just
means that nan+1 = nan)

* The divisions 0/0 and infinity/infinity
* The multiplications 0*infinity and infinity*0
* The additions +inf + (-inf), (-inf) + +inf and equivalent subtractions
* The standard pow function and the integer exponent pown function
define 0**0, 1**inf, and inf**0 as 1.
* The powr function defines all three indeterminate forms as invalid
operations and so returns NaN.
* The square root of a negative number.
* The logarithm of a negative number.
* The inverse sine or cosine of a number that is less than -1 or
greater than +1.

Rather than having comparisons with NaN trigger exceptions, wouldn't
it be much cleaner to have all these operations trigger exceptions?
And, I would guess that they probably already do.

NaN has an additional use in that it can be used like a "null
pointer"; a floating-point variable can store 1.0, or 0.000000000005,
or "no there's no value that I'm storing in this variable". Since a
Python variable can contain None instead of a float, this use is
unnecessary too.

So, apart from float("nan"), are there actually any places where real
production code has to handle NaN? I was unable to get a nan by any of
the above methods, except for operations involving inf; for instance,
float("inf") - float("inf") returns nan. All the others raised an
exception rather than returning nan.
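That observation is easy to reproduce: in CPython, most of the listed operations raise, and only the inf-based ones quietly produce a nan (a quick sketch):

```python
import math

inf = float("inf")

# These quietly produce NaN:
print(inf - inf)       # nan
print(inf * 0.0)       # nan

# These raise instead of returning NaN:
try:
    0.0 / 0.0
except ZeroDivisionError as e:
    print("0.0/0.0 raised:", e)

try:
    math.sqrt(-1.0)
except ValueError as e:
    print("sqrt(-1) raised:", e)
```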

Chris Angelico
 

Chris Angelico

If I were designing a new floating-point standard for hardware, I would consider getting rid of NaN.  However, with the floating point standard that exists, which almost all floating point hardware mostly conforms to, there are certain bit patterns that mean NaN.

Python could refuse to construct float() objects out of NaN (I doubt it would even be a major performance penalty), but there are reasons why you wouldn't, the main one being to interface with other code that does use NaN.  It's better, then, to recognize the NaN bit patterns and do something reasonable when trying to operate on them.

Okay, here's a question. The Python 'float' value - is it meant to be
"a Python representation of an IEEE double-precision floating point
value", or "a Python representation of a real number"? For the most
part, Python's data types are defined by their abstract concepts - a
list isn't defined as a linked list of pointers, it's defined as an
ordered collection of objects. Python 3 removes the distinction
between 'int' and 'long', where 'int' is <2**32 and 'long' isn't, so
now a Py3 integer is... any integer.

The sys.float_info struct exposes details of floating point
representation. In theory, a Python implementation that uses bignum
floats could quite happily set all those values to extremes and work
with enormous precision (or could use a REXX-style "numeric digits
100" command to change the internal rounding, and automatically update
sys.float_info). And in that case, there would be no NaN value.

If Python is interfacing with some other code that uses NaN, that code
won't be using Python 'float' objects - it'll be using IEEE binary
format, probably. So all it would need to entail is a special return
value from an IEEE Binary to Python Float converter function (maybe
have it return None), and NaN is no longer a part of Python.

The real question is: Would NaN's removal be beneficial? And if so,
would it be worth the effort?

Chris Angelico
 

rusi

Chris Angelico said:
The real question is: Would NaN's removal be beneficial? And if so,
would it be worth the effort?

nan in floating point is like null in databases.
It may be worthwhile to have a look at the choices SQL has made:
http://en.wikipedia.org/wiki/Null_(SQL)
 

Steven D'Aprano

So, apart from float("nan"), are there actually any places where real
production code has to handle NaN? I was unable to get a nan by any of
the above methods, except for operations involving inf; for instance,
float("inf") - float("inf") returns nan. All the others raised an
exception rather than returning nan.

That's Python's poor design, due to reliance on C floating point
libraries that have half-hearted support for IEEE-754, and the
obstruction of people who don't understand the usefulness of NANs. They
shouldn't raise unless the caller specifies that he wants exceptions. The
default behaviour should be the most useful one, namely quiet
(propagating) NANs, rather than halting the calculation because of
something which may or may not be an error and may or may not be
recoverable.

Even Apple's Hypertalk supported them better in the late 1980s than
Python does now, and that was a language aimed at non-programmers!

The Decimal module is a good example of what floats should do. All flags
are supported, so you can choose whether you want exceptions or NANs. I
don't like Decimal's default settings, but at least they can be changed.
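The flag model being praised here can be seen in the decimal module directly: the InvalidOperation trap can be turned off to get quiet NaNs instead of exceptions (a small sketch):

```python
from decimal import Decimal, InvalidOperation, localcontext

# Default context: the invalid operation traps (raises).
try:
    Decimal(0) / Decimal(0)
except InvalidOperation:
    print("InvalidOperation raised")

# Disable the trap and the same operation returns a quiet NaN.
with localcontext() as ctx:
    ctx.traps[InvalidOperation] = False
    result = Decimal(0) / Decimal(0)
    print(result)            # NaN
    print(result.is_nan())   # True
```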
 

Steven D'Aprano

Okay, here's a question. The Python 'float' value - is it meant to be "a
Python representation of an IEEE double-precision floating point value",

Yes.

or "a Python representation of a real number"?

No.

Floats are not real numbers. Many fundamental properties of the reals are
violated by floats, and I'm not talking about "weird" things like NANs
and INFs, but ordinary numbers:
>>> 0.1 + 0.2 == 0.3
False



For the most part,
Python's data types are defined by their abstract concepts - a list
isn't defined as a linked list of pointers,

Nor is it implemented as a linked list of pointers.

The sys.float_info struct exposes details of floating point
representation. In theory, a Python implementation that uses bignum
floats could quite happily set all those values to extremes and work
with enormous precision (or could use a REXX-style "numeric digits 100"
command to change the internal rounding, and automatically update
sys.float_info). And in that case, there would be no NaN value.

NANs aren't for overflow, that's what INFs are for. Even if you had
infinite precision floats and could get rid of INFs, you would still need
NANs.

The real question is: Would NaN's removal be beneficial?

No, it would be another step backwards to the bad old days before the
IEEE standard.
 

John Nagle

Yes. I used to write dynamic simulation engines. There were
situations that produced floating point overflow, leading to NaN
values. This wasn't an error; it just meant that the timestep
had to be reduced to handle some heavy object near the moment of
first collision.

Note that the difference between two INF values is a NaN.

It's important that ordered comparisons involving NaN and INF
raise exceptions so that you don't lose an invalid value. If
you're running with non-signaling NaNs, the idea is supposed to
be that, at the end of the computation, you check all your results
for INF and NaN values, to make sure you didn't overflow somewhere
during the computation. If, within the computation, there are
branches based on ordered comparisons, and those don't raise an
exception when the comparison result is unordered, you can reach
the end of the computation with valid-looking but wrong values.

John Nagle
 

Raymond Hettinger

Here's a curiosity. float("nan") can occur multiple times in a set or as
a key in a dict:

Which is by design.

NaNs intentionally have multiple possible instances (some
implementations even include distinct payload values).

Sets and dicts intentionally recognize an instance as being equal to
itself (identity-implies-equality); otherwise, you could put a NaN in
a set/dict but not be able to retrieve it. Basic invariants would
fail -- such as: assert all(elem in container for elem in container).
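That invariant holds precisely because of identity-implies-equality (a quick sketch):

```python
nan = float("nan")
container = {1.0, nan, 2.0}

# Identity-implies-equality keeps the basic container invariant:
assert all(elem in container for elem in container)

# ...even though the NaN element is not equal to itself:
assert nan != nan
print("invariants hold")
```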

The interesting thing is that people experimenting with exotic objects
(things with random hash functions, things with unusual equality or
ordering relations, etc) are "surprised" when those objects display
their exotic behaviors.

To me, the "NaN curiosities" are among the least interesting. It's
more fun to break sort algorithms with sets (which override the
ordering relations with subset/superset relations) or with an object
that mutates a list during the sort. Now, that is curious :)

Also, Dr Mertz wrote a Charming Python article full of these
curiosities:
http://gnosis.cx/publish/programming/charming_python_b25.txt

IMO, equality and ordering are somewhat fundamental concepts. If a
class is written that twists those concepts around a bit, then it
should be no surprise if curious behavior emerges. Heck, I would
venture to guess that something as simple as assuming the speed of
light is constant might yield twin paradoxes and other
curiosities ;-)

Raymond
 

Steven D'Aprano

That's Python's poor design, due to reliance on C floating point
libraries that have half-hearted support for IEEE-754, and the
obstruction of people who don't understand the usefulness of NANs.

That last comment mine is a bit harsh, and I'd like to withdraw it as
unnecessarily confrontational.
 

Chris Angelico

The former.  Unlike the case with integers, there is no way that I know of to represent an abstract real number on a digital computer.

This seems peculiar. Normally Python seeks to define its data types in
the abstract and then leave the concrete up to the various
implementations - note, for instance, how Python 3 has dispensed with
'int' vs 'long' and just made a single 'int' type that can hold any
integer. Does this mean that an implementation of Python on hardware
that has some other type of floating point must simulate IEEE
double-precision in all its nuances?

I'm glad I don't often need floating point numbers. They can be so annoying!

Chris Angelico
 

Chris Angelico

I think you misunderstood what I was saying.

It's not *possible* to represent a real number abstractly in any digital computer.  Python couldn't have an "abstract real number" type even it wanted to.

True, but why should the "non-integer number" type be floating point
rather than (say) rational? Actually, IEEE floating point could mostly
be implemented in a two-int rationals system (where the 'int' is
arbitrary precision, so it'd be Python 2's 'long' rather than its
'int'); in a sense, the mantissa is the numerator, and the scale
defines the denominator (which will always be a power of 2). Yes,
there are very good reasons for going with the current system. But are
those reasons part of the details of implementation, or are they part
of the definition of the data type?
(Math aside: Real numbers are not countable, meaning they cannot be put into one-to-one correspondence with integers.  A digital computer can only represent countable things exactly, for obvious reasons; therefore, to model non-countable things like real numbers, one must use a countable approximation like floating-point.)

Right. Obviously a true 'real number' representation can't be done.
But there are multiple plausible approximations thereof (the best
being rationals).
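The power-of-two-denominator point above is easy to check directly in Python (an illustrative sketch):

```python
from fractions import Fraction

# Every finite IEEE double is exactly num/den with den a power of two.
num, den = (0.1).as_integer_ratio()
print(num, den)       # 3602879701896397 36028797018963968
print(den == 2 ** 55) # True: the denominator is a power of two
print(Fraction(0.1))  # the same exact rational, via the fractions module
```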

Not asking for Python to be changed, just wondering why it's defined
by what looks like an implementation detail. It's like defining that a
'character' is an 8-bit number using the ASCII system, which then
becomes problematic with Unicode. (Ohai, C, didn't notice you standing
there.)

Chris Angelico
 

Jerry Hill

True, but why should the "non-integer number" type be floating point
rather than (say) rational?

You seem to be implying that python only provides a single non-integer
numeric type. That's not true. Python ships with a bunch of
different numeric types, including a rational type. Off the top of my
head, we have:

IEEE floating point numbers
(http://docs.python.org/library/stdtypes.html#numeric-types-int-float-long-complex)
Rationals (http://docs.python.org/library/fractions.html)
Base-10 fixed and floating point numbers
(http://docs.python.org/library/decimal.html)
Complex numbers
(http://docs.python.org/library/stdtypes.html#numeric-types-int-float-long-complex
plus http://docs.python.org/library/cmath.html)
Integers (both ints and longs, which are pretty well unified by now)

Floats have far and away the best performance in most common
situations, so they end up being the default, but if you want to use
something different, it's usually not hard to do.
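A short sketch contrasting the types listed above:

```python
from fractions import Fraction
from decimal import Decimal

# Binary floats: fast, but 0.1 is not exactly representable.
print(0.1 + 0.2)                          # 0.30000000000000004

# Rationals: exact arithmetic.
print(Fraction(1, 10) + Fraction(2, 10))  # 3/10

# Base-10 floating point: what-you-see decimal arithmetic.
print(Decimal("0.1") + Decimal("0.2"))    # 0.3

# Complex numbers are built in as well.
print((1 + 2j) * (1 - 2j))                # (5+0j)
```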
 

Grant Edwards

This seems peculiar. Normally Python seeks to define its data types
in the abstract and then leave the concrete up to the various
implementations - note,

But, "real numbers" and "IEEE float" are so different that I don't
think that it would be a wise decision for people to pretend they're
working with real numbers when in fact they are working with IEEE
floats.

for instance, how Python 3 has dispensed with 'int' vs 'long' and
just made a single 'int' type that can hold any integer.

Those concepts are much closer than "real numbers" and "IEEE floats".

Does this mean that an implementation of Python on hardware that has
some other type of floating point must simulate IEEE double-precision
in all its nuances?

I certainly hope so. I depend on things like propagation of
non-signalling nans, the behavior of infinities, etc.

I'm glad I don't often need floating point numbers. They can be so
annoying!

They can be -- especially if one pretends one is working with real
numbers instead of fixed-length binary floating point numbers. Like
any tool, floating point has to be used properly. Screwdrivers make
very annoying hammers.
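The propagation being depended on here, in brief (a sketch): a quiet NaN flows through arithmetic without raising, so the "invalid" marker survives to the end of the computation.

```python
import math

nan = float("nan")

# No exception anywhere along the way; the NaN simply propagates:
result = ((nan + 1.0) * 2.0 - 100.0) / 3.0
print(math.isnan(result))   # True
```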
 
