Annoying behaviour of the != operator

  • Thread starter Jordan Rastrick

Dan Sommers

Dan said:
Rocco Moretti wrote:

The main problem is that Python is trying to stick at least three
different concepts onto the same set of operators: equivalence (are
these two objects the same?), ordering (in a sorted list, which comes
first?), and mathematical "size".
This gives the wacky world where
"[(1,2), (3,4)].sort()" works, whereas "[1+2j, 3+4j].sort()" doesn't.
Python inherits that wackiness directly from the (often wacky) world of
Mathematics.
IMO, the true wackiness is that
[ AssertionError, (vars, unicode), __name__, apply ].sort( )
"works," too. Python refusing to sort my list of complex numbers is a
Good Thing.
The "wackiness" I referred to wasn't that a list of complex numbers isn't
sortable, but the inconsistent behaviour of list sorting. As you
mentioned, an arbitrary collection of objects in a list is sortable, but
as soon as you throw a complex number in there, you get an exception.

Yes, I agree: it is inconsistent.
One way to handle that is to refuse to sort anything that doesn't have
a "natural" order. But as I understand it, Guido decided that being
able to sort arbitrary lists is a feature, not a bug. But you can't
sort ones with complex numbers in them, because you also want
'1+3j<3+1j' to raise an error.
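For anyone reading this on Python 3, here is a minimal sketch of how the same three cases behave there (the thread itself is about 2.x, and the helper name `try_sort` is made up for illustration). Tuples still sort lexicographically, while both a list of complex numbers and a mixed-type list raise TypeError.

```python
def try_sort(items):
    """Return the sorted list, or the string 'TypeError' if the items cannot be ordered."""
    try:
        return sorted(items)
    except TypeError:
        return "TypeError"

pairs_result = try_sort([(3, 4), (1, 2)])      # tuples compare element by element
complex_result = try_sort([3 + 4j, 1 + 2j])    # complex numbers define no ordering
mixed_result = try_sort([1, "foo", 2j])        # unrelated types no longer compare in Python 3

print(pairs_result, complex_result, mixed_result)
```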

As George Sakkis noted, Guido has since recanted. Unfortunately, in
this case, the time machine would have broken too much existing code.
What "conflict"? Where are you getting the doesn't/can't/shouldn't
prescription from?

Perhaps "conflict" wasn't quite the right word. For example, if I
define __ne__ and __eq__ and __lt__, then which method(s) should
Python use if I later use a <= or a >= operator?
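As a rough illustration of that question under Python 3 (an assumption, since the thread is about 2.x): the interpreter does not guess, so `<=` simply fails unless something supplies it. The standard library's `functools.total_ordering` decorator is one explicit way to fill in the missing operators from `__eq__` and `__lt__`. The class names `Bare` and `Filled` and the helper `le_works` are invented for this sketch.

```python
import functools

class Bare:
    """Defines only __eq__ and __lt__; Python does not invent <= from them."""
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        return self.value == other.value
    def __lt__(self, other):
        return self.value < other.value

@functools.total_ordering
class Filled:
    """Same two methods, but the decorator derives <=, > and >= explicitly."""
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        return self.value == other.value
    def __lt__(self, other):
        return self.value < other.value

def le_works(a, b):
    """Return a <= b, or the string 'TypeError' if the operator is unsupported."""
    try:
        return a <= b
    except TypeError:
        return "TypeError"

bare_result = le_works(Bare(1), Bare(2))
filled_result = le_works(Filled(1), Filled(2))
print(bare_result, filled_result)
```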

"Doesn't" and "can't" (call me pessimistic) comes from all the issues
and disagreements we're having in this thread, and "shouldn't" comes
from the Zen:

Explicit is better than implicit.
In the face of ambiguity, refuse the temptation to guess.
Which method you use depends on what you want to achieve:
(Hypothetical Scheme)
Object Identity? - use 'is'
Mathematical Ordering? - use '__eq__' & friends
Object Equivalence? - use '__equiv__'
Arbitrary Ordering? (e.g. for list sorting) - use '__order__'

So which method would python use when sorting a list that happens to
consist only of numbers? or a list that contains mostly integers and a
few complex numbers?
The only caveat is to define sensible defaults for the cases where one
function is not defined. But that shouldn't be too hard.

At the risk of repeating myself:

Explicit is better than implicit.
In the face of ambiguity, refuse the temptation to guess.

Also, as I noted in a previous discussion on an unrelated topic, it
seems that we all have our own notions and limitations of expliciticity
and impliciticity.
__equiv__ -> __eq__ -> is
__order__ -> __lt__/__cmp__
Except if you want the situation where "[1+2j, 3+4j].sort()" works, and
'1+3j < 3+1j' fails.

I'm sticking with my position that both should fail, unless you
*explicitly* tell sort what to do (since for now, we all seem to agree
that the other one should fail). If I have an application that thinks
it has to sort a list of arbitrary objects, then I have to be clever
enough to help.
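A small sketch of what "explicitly telling sort what to do" can look like, using the `key` argument of `sorted`. The two orderings chosen here (by real then imaginary part, and by magnitude) are only examples; nothing in the thread says which one is "right".

```python
numbers = [3 + 4j, 1 + 2j, 0 + 6j]

# Order by real part first, then by imaginary part.
by_parts = sorted(numbers, key=lambda z: (z.real, z.imag))

# Order by magnitude (distance from the origin).
by_magnitude = sorted(numbers, key=abs)

print(by_parts)
print(by_magnitude)
```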
I think the issue is you thinking along the lines of Mathematical
numbers, where the four different comparisons collapse to one. Object
identity? There is only one 'two' - heck, in pure mathematics, there
isn't even a 'float two'/'int two' difference. Equivalence *is*
mathematical equality, and the "arbitrary ordering" is easily defined
as "true" ordering. It's only when you break away from mathematics that
you see the divergence in behavior.

IIRC, there was a discussion about overhauling all of Python's
numbers to make them act more like the mathematical entities that they
represent rather than the Python objects that they are. The long/int
"consolidation," better handling of integer division, and the Decimal
class came out of that discussion. But that still leaves the issue of
what to do with 1 < "foo" and "bar" > 2j and 3j < 4j.

Regards,
Dan
 

David M. Cooke

Robert Kern said:
greg said:
David said:
To solve that, I would suggest a fourth category of "arbitrary
ordering", but that's probably Py3k material.

We've got that: use hash().
[1+2j, 3+4j].sort(key=hash)
What about objects that are not hashable?
The purpose of arbitrary ordering would be to provide
an ordering for all objects, whatever they might be.

How about id(), then?

And so the circle is completed...

Or something like

def uniquish_id(o):
    try:
        return hash(o)
    except TypeError:
        return id(o)

hash() should be the same across interpreter invocations, whereas id()
won't.
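A quick usage sketch of that helper as a sort key, with a couple of unhashable objects mixed in. One caveat worth labelling: on Python 3, string and bytes hashes are randomised per process by default (controlled by PYTHONHASHSEED), so the "same across invocations" property only holds for some types, such as numbers.

```python
def uniquish_id(o):
    """Prefer hash(o); fall back to id(o) for unhashable objects."""
    try:
        return hash(o)
    except TypeError:
        return id(o)

# A list and a dict are unhashable, so they fall back to id().
items = [3 + 4j, [1, 2], 1 + 2j, {"a": 1}]
ordered = sorted(items, key=uniquish_id)

print(ordered)  # arbitrary but complete ordering; the exact order may vary
```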
 

pmaupin

If a behavior change is possible at all, I think a more reasonable
behavior would be:

if any rich comparison methods are defined, always use rich comparisons
(and throw an exception if the required rich comparison method is not
available).

This would at least have the benefit of letting users know what code it
had broken when they try to run it :)

Regards,
Pat
 

Rocco Moretti

Before I answer, let me clarify my position. I am NOT advocating any
change for the 2.x series. I'm not even forwarding any particular
proposal for 3.0/3000. My key (and close to sole) point is that the behavior
of > & < is conceptually distinct from ordering in a sorted list, even
though the behaviors coincide for the most common case of numbers.
As George Sakkis noted, Guido has since recanted. Unfortunately, in
this case, the time machine would have broken too much existing code.

As I mentioned in response, the referenced email only mentions the
ability to use < & > in comparing arbitrary objects. My key point is
that this is conceptually different than disallowing sorting on
heterogeneous lists. There are ways to disallow one, while still allowing
the other.
"Doesn't" and "can't" (call me pessimistic) comes from all the issues
and disagreements we're having in this thread, and "shouldn't" comes
from the Zen:

Explicit is better than implicit.
In the face of ambiguity, refuse the temptation to guess.

Well, Python already "guesses implicitly", every time you do a "1 + 2.0"
or a "a + b", where 'a' doesn't define '__add__' and 'b' defines
'__radd__' - The trick is to be clear and explicit (in the documentation)
every time you are implicit (in the running program). It's not guessing -
it's a precisely defined part of the language.
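To make the `__radd__` case concrete, a minimal sketch (the class names `Left` and `Right` are invented for illustration). The left operand defines no `__add__`, so Python falls back to the right operand's `__radd__`, exactly as the language reference describes.

```python
class Left:
    """Defines no __add__ at all."""

class Right:
    def __radd__(self, other):
        # Called for `other + self` when `other` cannot handle the addition.
        return ("Right.__radd__", type(other).__name__)

result = Left() + Right()

# The built-in numeric coercion is similarly well defined: int + float gives float.
mixed = 1 + 2.0

print(result, mixed)
```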
AIUI, __cmp__ exists for backwards compatibility, and __eq__ and friends
are flexible enough to cover any possible comparison scheme.

Except if you want the situation where "[1+2j, 3+4j].sort()" works, and
'1+3j < 3+1j' fails.


I'm sticking with my position that both should fail, unless you
*explicitly* tell sort what to do (since for now, we all seem to agree
that the other one should fail). If I have an application that thinks
it has to sort a list of arbitrary objects, then I have to be clever
enough to help.

Even if you decide to disallow sorting heterogeneous lists, you still
have the problem of what to do with a user-defined homogeneous list where
__lt__ doesn't return a boolean. More so:
>>Except if you want the situation where "[a, b].sort()" works, and
>>'a < b' fails.

If you combine list sorting and >/< together, there is no way for someone
who wants >/< on a specific class to return a non-boolean value to have
a homogeneous list of those objects sort the way they want them to.

If, on the other hand, you split the two and provide a sensible and
explicitly defined fall-back order, then someone who doesn't care about
the distinction can carry on as if they were combined. In addition,
those people who want to sort lists of objects which return non-booleans
for >/< can do that too.
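A rough sketch of what such a split could look like. To be loud about it: `__order__` is the hypothetical hook from this proposal, not a real Python special method, and `order_cmp`, `order_sorted` and `Symbol` are names made up here. The sort consults `__order__` when a class provides it and otherwise falls back to the ordinary `<` and `>`.

```python
import functools

class Symbol:
    """A class whose < builds an expression instead of returning a boolean."""
    def __init__(self, name):
        self.name = name
    def __lt__(self, other):
        return "%s < %s" % (self.name, other.name)
    def __order__(self, other):
        # Hypothetical hook: negative, zero or positive, like the old cmp().
        return (self.name > other.name) - (self.name < other.name)

def order_cmp(a, b):
    """Use the hypothetical __order__ if present, else fall back to < and >."""
    hook = getattr(type(a), "__order__", None)
    if hook is not None:
        return hook(a, b)
    return (a > b) - (a < b)

def order_sorted(items):
    return sorted(items, key=functools.cmp_to_key(order_cmp))

names = [s.name for s in order_sorted([Symbol("y"), Symbol("x"), Symbol("z")])]
numbers = order_sorted([3, 1, 2])        # no __order__, so plain < and > are used
expression = Symbol("x") < Symbol("y")   # still free to return a non-boolean

print(names, numbers, expression)
```

Someone who doesn't care about the distinction never defines `__order__` and gets the usual behaviour; the sort order still lives in the class definition rather than at every call site.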

BTW, the optional parameter for the sort function is not a suitable
alternative. The main problem with it is where the sort order is
encoded. The extra parameter in the sort has to be provided *every
place* where sort is called, instead of a single place in the definition
of the object. And you can't subclass the sort function in your user
defined function, because sort is an operation on the list, not the
objects in it.

And please don't say that always explicitly specifying the sort order is
a good thing, unless you want the sort order to *always* be specified,
even with builtins. (Sorting numbers? Ascending/Descending/Magnitude? -
Strings? ASCII/EBCDIC/Alphabetical/Case Insensitive/Accents at the end or
interspersed? -- Okay, a little petulant, but the real issue is
lists/tuples/dicts. Why isn't the sort order on those explicit?)


All that said, I'm not Guido, and in his wisdom he may decide that
having a sort order different from that given by >/< is an attractive
nuisance, even with user defined objects. He may then decide to disallow
it like he disallows goto's and free-form indentation.
 

Max

Jordan said:
I don't want to order the objects. I just want to be able to say if one
is equal to the other.

Here's the justification given:

The == and != operators are not assumed to be each other's
complement (e.g. IEEE 754 floating point numbers do not satisfy
this). It is up to the type to implement this if desired.
Similar for < and >=, or > and <=; there are lots of examples
where these assumptions aren't true (e.g. tabnanny).
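For comparison, a hedged sketch of how this looks on Python 3, where the default `__ne__` delegates to `__eq__` and inverts the result; under the 2.x behaviour this thread is about, `!=` would not be derived and would fall back to an identity-based default. The ordering operators are still not derived from one another. The class name `OnlyEq` is made up for illustration.

```python
class OnlyEq:
    """Defines __eq__ only; no __ne__ and no ordering methods."""
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        if not isinstance(other, OnlyEq):
            return NotImplemented
        return self.value == other.value

a, b = OnlyEq(1), OnlyEq(1)
equal = (a == b)
not_equal = (a != b)   # Python 3 derives this from __eq__

try:
    a < b
    ordering = "ok"
except TypeError:
    ordering = "TypeError"   # < is never derived from __eq__

print(equal, not_equal, ordering)
```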

Well, never, ever use equality or inequality operations with floating
point numbers anyway, in any language, as they are notoriously
unreliable due to the inherent inaccuracy of floating point. That's
another pitfall, I'll grant, but it's a pretty well known one to anyone
with programming experience. So I don't think that's a major use case.

I think this is referring to IEEE 754's NaN equality test, which
basically states that x==x is false if-and-only-if x.isNaN() is true.
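A short sketch of the NaN behaviour in Python. Note that `x.isNaN()` above is shorthand; in Python the test is spelled `math.isnan(x)`. The last two comparisons also illustrate the "< and >=" remark from the quoted justification: both are false for NaN, so they are not complements.

```python
import math

nan = float("nan")

self_equal = (nan == nan)       # NaN never compares equal, even to itself
self_unequal = (nan != nan)
detected = math.isnan(nan)      # the reliable way to test for NaN

less = (nan < 1.0)
greater_equal = (nan >= 1.0)    # not the complement of `less` for NaN

print(self_equal, self_unequal, detected, less, greater_equal)
```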
 

Steven D'Aprano

Max said:
I think this is referring to IEEE 754's NaN equality test, which
basically states that x==x is false if-and-only-if x.isNaN() is true.

No. He means exactly what he says: when using floats,
it is risky to compare one float to another with
equality, not just NaNs.

This is platform-dependent: I remember the old "Standard
Apple Numerics Environment" (SANE) making the claim
that testing equality on Macintoshes was safe. And I've
just spent a fruitless few minutes playing around with
Python on Solaris trying to find a good example. So it
is quite possible to work with floats for *ages* before
being bitten by this.

In general, the problem occurs like this:

Suppose your floating point numbers have six decimal
digits of accuracy. Then a loop like this may never
terminate:

x = 1.0/3
while x != 1.0:  # three times one third equals one
    print(x)
    x += 1.0/3

It will instead print:
0.333333
0.666666
0.999999
1.333332
1.666665
and keep going.

(You can easily see the result yourself with one of
those cheap-and-nasty calculators with 8 significant
figures. 1/3*3 is not 1.)
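One way to reproduce the six-digit scenario on a modern machine is the standard `decimal` module with the precision set to 6. This is only a simulation, and it keeps six significant digits rather than six decimal places, so the later values differ slightly from the listing above. The loop is capped at five steps so the sketch terminates.

```python
from decimal import Decimal, localcontext

values = []
with localcontext() as ctx:
    ctx.prec = 6                        # six significant decimal digits
    third = Decimal(1) / Decimal(3)     # rounds to 0.333333
    x = third
    for _ in range(5):                  # bounded, unlike the original while loop
        values.append(x)
        x += third

hit_one = any(v == 1 for v in values)   # the running total never lands exactly on 1
print(values, hit_one)
```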

It is a lot harder to find examples on good, modern
systems with lots of significant figures, but it can
happen.

Here is a related problem. Adding two floats together
should never give one of the original numbers unless
the other one is zero, correct? Then try this:

py> x = 1.0
py> y = 1e-16 # small, but not *that* small
py> y == 0.0
False
py> x+y == x
True
py> x-x+y == x+y-x
False

(Again, platform dependent, your mileage may vary.)

Or try the same calculations with x=1e12 and y=1e-6.
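A sketch of why the small term vanishes, assuming IEEE 754 double precision (which is what CPython uses on common platforms). `sys.float_info.epsilon` is the gap between 1.0 and the next representable float; anything well under half that gap is rounded away when added to 1.0.

```python
import sys

eps = sys.float_info.epsilon        # about 2.2e-16 for IEEE 754 doubles

survives = (1.0 + eps != 1.0)       # a full gap is representable
absorbed = (1.0 + eps / 4 == 1.0)   # a quarter of the gap is rounded away

# The examples from the post: y is nonzero, yet x + y == x.
small_case = (1.0 + 1e-16 == 1.0)
large_case = (1e12 + 1e-6 == 1e12)  # the gap between floats near 1e12 is about 1.2e-4

print(eps, survives, absorbed, small_case, large_case)
```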

In general, the work-around to these floating point
issues is to avoid floating point in favour of exact
algebraic calculations whenever possible. If you can't
avoid floats:

- always sum numbers from smallest to largest;

- never compare equality but always test whether some
number is within a small amount of your target;

- try to avoid adding or subtracting numbers of wildly
differing magnitudes; and

- be aware of the risk of errors blowing out and have
strategies in place to manage the size of the error.
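Two of the points above can be sketched with the standard library, again assuming IEEE 754 doubles. `math.isclose` does the "within a small amount" test, and summing small values first (or using `math.fsum`, which tracks the lost low-order bits) avoids having them absorbed by a large running total. The particular numbers are chosen only to make the effect visible.

```python
import math

# Compare with a tolerance instead of ==.
exact_equal = (0.1 + 0.2 == 0.3)
close_enough = math.isclose(0.1 + 0.2, 0.3)   # default relative tolerance 1e-09

# Summation order matters when magnitudes differ wildly.
values = [1e16] + [1.0] * 10
large_first = sum(values)            # each 1.0 is absorbed by the 1e16 total
small_first = sum(sorted(values))    # the ones add up to 10.0 before meeting 1e16
careful = math.fsum(values)          # correctly rounded sum regardless of order

print(exact_equal, close_enough, large_first, small_first, careful)
```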
 
