An oddity in list comparison and element assignment

  • Thread starter michael.f.ellis
  • Start date
M

michael.f.ellis

The following script puzzles me. It creates two nested lists that
compare identically. After identical element assignments, the lists
are different. In one case, a single element is replaced. In the
other, an entire column is replaced.

---------------------------------------------------------------------------------------

'''
An oddity in the behavior of lists of lists. Occurs under
Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)]
on win32.
Not tested on other platforms or builds.
'''
a =[[[1,2],[1,2]],[[1,2],[1,2]]]
b = [[range(1,3)]*2]*2
assert(a==b)
print "Initially, python reports that the lists are equal"
a[1][1]=[5]
b[1][1]=[5]
try:
assert(a==b)
except AssertionError:
print "After identical element assignments, the lists are not equal"
print "a is now ", a
print "b is now ", b
-------------------------------------------------------------------------------------

Here's the output on my system.

------------------------------------------------------------------------------------
Initially, python reports that the lists are equal
After identical element assignments, the lists are not equal
a is now [[[1, 2], [1, 2]], [[1, 2], [5]]]
b is now [[[1, 2], [5]], [[1, 2], [5]]]
------------------------------------------------------------------------------------

This seems contrary to one of my fundamental expectations, namely that
objects which compare equally must remain equal after identical
operations. I think what must be going on is that the 'b' list
contains replicated references instead of copies of [range(1,3)]*2 .
IMO, python's == operator should detect this difference in list
structure since it leads to different behavior under element
assignments.

Mike Ellis
 
A

Alex Martelli

operations. I think what must be going on is that the 'b' list
contains replicated references instead of copies of [range(1,3)]*2 .
Right.

IMO, python's == operator should detect this difference in list
structure since it leads to different behavior under element
assignments.

Wrong; equality does not imply any check on identity. You can consider
the definition of "list A equals list B" as:

-- len(A) == len(B), AND,
-- for each valid index i, A == B

This is an extremely natural definition of "equality" for containers:
"they have EQUAL items" [[in the same order, for containers for which
order is relevant]]. Nowhere in this extremely natural definition does
the IDENTITY of the items come into play.

Therefore, your expectations about the effects of item alterations (for
alterable items) are ill-founded.

Try concisely expressing your "should" -- constructively, as pseudocode
that one could use to check for your "strengthened equality", not in
abstract terms of constraints -- and if (as I strongly suspect) you
cannot find a definition that is as simple, concise and natural as the
two-liner above, this might help convince you that your desired
definition would NOT be the most obvious, natural and fundamental, and
therefore would not be appropriate to pick as part of the language's
core. Indeed, it's an interesting problem to code up, if one wants any
generality (for example, identity of immutable items _whose items or
attributes are in turn immutable_ probably should not matter even for
your "strengthened" equality... but that's pretty hard to express!-).


Alex
 
M

michael.f.ellis

Hi Alex,
With all due respect to your well-deserved standing in the Python
community, I'm not convinced that equality shouldn't imply invariance
under identical operations.

Perhaps the most fundamental notion is mathematics is that the left and
right sides of an equation remain identical after any operation applied
to both sides. Our experience of the physical world is similar. If I
make identical modifications to the engines of two identical
automobiles, I expect the difference in performance to be identical.
If my expectation is met, I would assert that either the two vehicles
were not identical to begin with or that my modifications were not
performed identically.

As to containers, would you say that envelope containing five $100
bills is the same as an envelope containing a single $100 bill and 4
xerox copies of it? If so, I'd like to engage in some envelope
exchanges with you :)

As I see it, reference copying is a very useful performance and memory
optimization. But I don't think it should undermine the validity of
assert(a==b) as a predictor of invariance under identical operations.

Cheers,
Mike Ellis
 
M

michael.f.ellis

oops! last sentence of 2nd paragraph in previous message should read
"If my expectation is NOT met ..."
 
T

Tim Peters

[[email protected]]
...
As I see it, reference copying is a very useful performance and memory
optimization. But I don't think it should undermine the validity of
assert(a==b) as a predictor of invariance under identical operations.

So, as Alex said last time,

Try concisely expressing your "should" -- constructively, as
pseudocode that one could use to check for your "strengthened
equality", not in abstract terms of constraints -- and if (as I
strongly suspect) you cannot find a definition that is as simple,
concise and natural as the two-liner above, this might help
convince you that your desired definition would NOT be the most
obvious, natural and fundamental, and therefore would not be
appropriate to pick as part of the language's core. Indeed,
it's an interesting problem to code up, if one wants any generality
(for example, identity of immutable items _whose items or
attributes are in turn immutable_ probably should not matter even
for your "strengthened" equality... but that's pretty hard to express!-).

So try that. In reality, you can either learn to change your
expectations, or avoid virtually all object-oriented programming
languages. Object identity is generally fundamental to the intended
semantics of such languages, not just an optimization.

Think about a simpler case:

a = [1]
b = a
assert(a == b)
a.remove(1)
b.remove(1)

Oops. The last line dies with an exception, despite that a==b at the
third statement and that ".remove(1)" is applied to both a and b. If
you think a should not equal b at the third statement "because" of
this, you're going to lead a life of increasing but needless despair
;-)
 
T

Tim Chase

As to containers, would you say that envelope containing five $100
bills is the same as an envelope containing a single $100 bill and 4
xerox copies of it? If so, I'd like to engage in some envelope
exchanges with you :)

if len(set([bill.serialnumber for bill in envelope])) !=
len(envelope): refuseMichaelsExchange()

Though the way references work, you would have an envelope
containing only 5 slips of paper that all say "I have a $100
bill"...

:)

-tkc
 
F

Fredrik Lundh

As to containers, would you say that envelope containing five $100
bills is the same as an envelope containing a single $100 bill and 4
xerox copies of it? If so, I'd like to engage in some envelope
exchanges with you :)

if you spent as much time *learning* stuff as you spend making up irrelevant examples,
you might end up learning how assignments, repeat operators, and references work in
Python.

it's only hard to understand if you don't want to understand it.

</F>
 
S

Scott David Daniels

As to containers, would you say that envelope containing five $100
bills is the same as an envelope containing a single $100 bill and 4
xerox copies of it? If so, I'd like to engage in some envelope
exchanges with you :)

Would you say that envelope containing five $100 bills is equal to
an envelope containing five $100 bills with different serial numbers?

--Scott David Daniels
(e-mail address removed)
 
J

Jim Segrave

Hi Alex,
With all due respect to your well-deserved standing in the Python
community, I'm not convinced that equality shouldn't imply invariance
under identical operations.

Perhaps the most fundamental notion is mathematics is that the left and
right sides of an equation remain identical after any operation applied
to both sides. Our experience of the physical world is similar. If I
make identical modifications to the engines of two identical
automobiles, I expect the difference in performance to be identical.
If my expectation is met, I would assert that either the two vehicles
were not identical to begin with or that my modifications were not
performed identically.

As to containers, would you say that envelope containing five $100
bills is the same as an envelope containing a single $100 bill and 4
xerox copies of it? If so, I'd like to engage in some envelope
exchanges with you :)

As I see it, reference copying is a very useful performance and memory
optimization. But I don't think it should undermine the validity of
assert(a==b) as a predictor of invariance under identical operations.

I think you end up with a totally unwieldy definition of '==' for
containers, since you have to check for _identical_ aliasing to
whatever depth it might occur, and, if you insist on equality
modeling the physical world, two lists can only be equal if:

for each corresponding element in the two lists, either the element is
a reference to the same underlying object or the corresponding
elements are references to objects which do not have and never will
have other references bound to them.

For example:

ra = ['a', 'a']
rb = ['b', 'b']

l1 = [ra, rb]
l2 = [ra, rb]

This will be equal by your definition and will preserve equality over
identical operations on l1 and l2

l3 = [ ['a', 'b'], ['a', 'b']]

This list will be identical, under your definition, so long as we
don't have anyone doing anything to the references ra or rb. Your
equality test has to claim that l1 and l3 are not equal, since ra
could be changed and that's not an operation on l1 or l3

This also leaves out the complexity of analysiing nested structures -
if you have a tuple, containing tuples which contain lists, then are
those top level tuples 'equal' if there are aliases in the lists? How
many levels deep should an equality test go?

Does the more common test, to see if the elements of a sequence are
identical at the time of comparision need a new operator or hand
coding, since most of the time programmers aren't interested in future
equality or not of the structures.
 
M

michael.f.ellis

Hi Tim,
In your example, a & b are references to the same object. I agree they
should compare equally. But please note that a==b is True at every
point in your example, even after the ValueError raised by b.remove(1).
That's good consistent behavior.

My original example is a little different. a and b never referred to
the same object. They were containers created by different expressions
with no object in common. The problem arises because the overloaded *
operator makes row level copies of the lists in b. There's nothing
wrong with that, but the fact remains that a and b are different in a
very significant way.

I agree with Alex that checking for this type of inequality is not a
trivial programming exercise. It requires (at least) a parallel
recursion that counts references with the containers being compared .
At the same time, I know that much harder programming problems have
been solved. Where I disagree with Alex is in the labeling of the
existing behavior as 'natural'.

Alternatively, it might make sense to disallow == for containers by
raising a TypeError although that would eliminate a largely useful
feature.

Realistically, I know that Python is unlikely to adopt either
alternative. It would probably break a lot of existing code. My
point in the original post was to raise what I felt was a useful topic
for discussion and to help others avoid a pitfall that cost me a couple
of hours of head-scratching.

By the way, I've been programming professionally for over 25 years and
have used at least 30 different languages. During the past few years,
Python has become my language of choice for almost everything because
it helps me deliver more productivity and value to my clients.

Cheers,
Mike
 
M

michael.f.ellis

Yes. You stated it quite precisely. I believe l1==l2 should always
return True and l1==l3 should always be False. (unless l3 is reassigned
as l3=l1). Your idea of a separate operator for 'all elements have
numerically equal values at the moment of comparision' is a good one.
For want of a better name, it could be called DeepCopyEquality(a,b) and
would be equivalent to a byte-by-byte comparison of two distinct
regions in memory created by a deep copies of a and b.

Cheers,
Mike

Jim said:
Hi Alex,
With all due respect to your well-deserved standing in the Python
community, I'm not convinced that equality shouldn't imply invariance
under identical operations.

Perhaps the most fundamental notion is mathematics is that the left and
right sides of an equation remain identical after any operation applied
to both sides. Our experience of the physical world is similar. If I
make identical modifications to the engines of two identical
automobiles, I expect the difference in performance to be identical.
If my expectation is met, I would assert that either the two vehicles
were not identical to begin with or that my modifications were not
performed identically.

As to containers, would you say that envelope containing five $100
bills is the same as an envelope containing a single $100 bill and 4
xerox copies of it? If so, I'd like to engage in some envelope
exchanges with you :)

As I see it, reference copying is a very useful performance and memory
optimization. But I don't think it should undermine the validity of
assert(a==b) as a predictor of invariance under identical operations.

I think you end up with a totally unwieldy definition of '==' for
containers, since you have to check for _identical_ aliasing to
whatever depth it might occur, and, if you insist on equality
modeling the physical world, two lists can only be equal if:

for each corresponding element in the two lists, either the element is
a reference to the same underlying object or the corresponding
elements are references to objects which do not have and never will
have other references bound to them.

For example:

ra = ['a', 'a']
rb = ['b', 'b']

l1 = [ra, rb]
l2 = [ra, rb]

This will be equal by your definition and will preserve equality over
identical operations on l1 and l2

l3 = [ ['a', 'b'], ['a', 'b']]

This list will be identical, under your definition, so long as we
don't have anyone doing anything to the references ra or rb. Your
equality test has to claim that l1 and l3 are not equal, since ra
could be changed and that's not an operation on l1 or l3

This also leaves out the complexity of analysiing nested structures -
if you have a tuple, containing tuples which contain lists, then are
those top level tuples 'equal' if there are aliases in the lists? How
many levels deep should an equality test go?

Does the more common test, to see if the elements of a sequence are
identical at the time of comparision need a new operator or hand
coding, since most of the time programmers aren't interested in future
equality or not of the structures.
 
K

Kent Johnson

Hi Alex,
With all due respect to your well-deserved standing in the Python
community, I'm not convinced that equality shouldn't imply invariance
under identical operations.

Perhaps the most fundamental notion is mathematics is that the left and
right sides of an equation remain identical after any operation applied
to both sides. Our experience of the physical world is similar. If I
make identical modifications to the engines of two identical
automobiles, I expect the difference in performance to be identical.
If my expectation is met, I would assert that either the two vehicles
were not identical to begin with or that my modifications were not
performed identically.

But programming is not mathematics and assignment is not an equation.
How about this:

In [1]: a=3.0

In [2]: b=3

In [3]: a==b
Out[3]: True

In [4]: a/2 == b/2
Out[4]: False

Kent
 
J

Jim Segrave

Yes. You stated it quite precisely. I believe l1==l2 should always
return True and l1==l3 should always be False. (unless l3 is reassigned
as l3=l1). Your idea of a separate operator for 'all elements have
numerically equal values at the moment of comparision' is a good one.
For want of a better name, it could be called DeepCopyEquality(a,b) and
would be equivalent to a byte-by-byte comparison of two distinct
regions in memory created by a deep copies of a and b.

The operator which works at the moment of comaprision is already there
- that's what == does.
If you really think there's a need for a comparision which includes
dealing with aliasing, then it seems to me a python module with a
set of functions for comparisions would make more sense.
 
M

michael.f.ellis

Considering the number of new programmers who get bit by automatic
coercion, I wish Dennis Ritchie had made some different choices when he
designed C. But then I doubt he ever dreamed it would become so wildly
successful.

Being a curmudgeon purist I'd actually prefer it if Python raised a
TypeError on float vs integer comparisons.

Cheers,
Mike

Kent said:
Hi Alex,
With all due respect to your well-deserved standing in the Python
community, I'm not convinced that equality shouldn't imply invariance
under identical operations.

Perhaps the most fundamental notion is mathematics is that the left and
right sides of an equation remain identical after any operation applied
to both sides. Our experience of the physical world is similar. If I
make identical modifications to the engines of two identical
automobiles, I expect the difference in performance to be identical.
If my expectation is met, I would assert that either the two vehicles
were not identical to begin with or that my modifications were not
performed identically.

But programming is not mathematics and assignment is not an equation.
How about this:

In [1]: a=3.0

In [2]: b=3

In [3]: a==b
Out[3]: True

In [4]: a/2 == b/2
Out[4]: False

Kent
 
M

michael.f.ellis

Yes (unless I was testing the assertion that the second envelope did
not contain counterfeits of the first)
 
E

Erik Max Francis

With all due respect to your well-deserved standing in the Python
community, I'm not convinced that equality shouldn't imply invariance
under identical operations.

Doo you really want

2 == 2.0

to be False?
 
M

Maric Michaud

Le Jeudi 01 Juin 2006 18:00, (e-mail address removed) a écrit :
Perhaps the most fundamental notion is mathematics is that the left and
right sides of an equation remain identical after any operation applied
to both sides.

IMHO, you are not aware that the '=' symbol of mathematics exists in python,
it's the 'is' assertion.

a is b
and then, do what you want with a (or b), a is b remains True.

THIS is the meaning of expr1 = expr2, but in computer science, this is not as
important as it is in pure logic (most languages do not even provide the 'is'
assertion).

--
_____________

Maric Michaud
_____________

Aristote - www.aristote.info
3 place des tapis
69004 Lyon
Tel: +33 426 880 097
 
S

Slawomir Nowaczyk

#> Scott David Daniels wrote:
#> > Would you say that envelope containing five $100 bills is equal to
#> > an envelope containing five $100 bills with different serial numbers?

#> Yes (unless I was testing the assertion that the second envelope did
#> not contain counterfeits of the first)

So, what if Bank of America later decided that bills with serial
numbers containing "7" are no longer valid?

In other word, *if* you assume equality must be preserved by future
modifications, than no two different (modifiable) objects can ever be
really equal.

--
Best wishes,
Slawomir Nowaczyk
( (e-mail address removed) )

I believe that math illiteracy affects 7 out of every 5 people.
 
M

michael.f.ellis

Truthfully, I wouldn't mind it at all. In Python, I frequently write
things like
i == int(f)
or vice versa just to avoid subtle bugs that sometimes creep in when
later modifications to code change the original assumptions.

When working in C, I always set the compiler for maximum warnings and
do my damndest to make them all go away. In the long run, time spent on
rigorous coding always repays itself with interest in time saved
debugging.

Mike
 
M

michael.f.ellis

I believe that 'is' tests equality of reference, such that
False

The 'is' operator tells you whether a and b refer to the same object.
What I've been discussing is whether == should test for "structural"
equality so that a and b remain equivalent under parallel mutations
(and also under single mutations to common references)

Cheers,
Mike
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,297
Messages
2,571,536
Members
48,283
Latest member
SherriP988

Latest Threads

Top