An oddity in list comparison and element assignment

michael.f.ellis · Jun 1, 2006

The following script puzzles me. It creates two nested lists that
compare identically. After identical element assignments, the lists
are different. In one case, a single element is replaced. In the
other, an entire column is replaced.

---------------------------------------------------------------------------------------

'''
An oddity in the behavior of lists of lists. Occurs under
Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)]
on win32.
Not tested on other platforms or builds.
'''
a =[[[1,2],[1,2]],[[1,2],[1,2]]]
b = [[range(1,3)]*2]*2
assert(a==b)
print "Initially, python reports that the lists are equal"
a[1][1]=[5]
b[1][1]=[5]
try:
assert(a==b)
except AssertionError:
print "After identical element assignments, the lists are not equal"
print "a is now ", a
print "b is now ", b
-------------------------------------------------------------------------------------

Here's the output on my system.

------------------------------------------------------------------------------------
Initially, python reports that the lists are equal
After identical element assignments, the lists are not equal
a is now [[[1, 2], [1, 2]], [[1, 2], [5]]]
b is now [[[1, 2], [5]], [[1, 2], [5]]]
------------------------------------------------------------------------------------

This seems contrary to one of my fundamental expectations, namely that
objects which compare equally must remain equal after identical
operations. I think what must be going on is that the 'b' list
contains replicated references instead of copies of [range(1,3)]*2 .
IMO, python's == operator should detect this difference in list
structure since it leads to different behavior under element
assignments.

Mike Ellis

Alex Martelli · Jun 1, 2006

operations. I think what must be going on is that the 'b' list
contains replicated references instead of copies of [range(1,3)]*2 .
Right.

IMO, python's == operator should detect this difference in list
structure since it leads to different behavior under element
assignments.

Wrong; equality does not imply any check on identity. You can consider
the definition of "list A equals list B" as:

-- len(A) == len(B), AND,
-- for each valid index i, A == B

This is an extremely natural definition of "equality" for containers:
"they have EQUAL items" [[in the same order, for containers for which
order is relevant]]. Nowhere in this extremely natural definition does
the IDENTITY of the items come into play.

Therefore, your expectations about the effects of item alterations (for
alterable items) are ill-founded.

Try concisely expressing your "should" -- constructively, as pseudocode
that one could use to check for your "strengthened equality", not in
abstract terms of constraints -- and if (as I strongly suspect) you
cannot find a definition that is as simple, concise and natural as the
two-liner above, this might help convince you that your desired
definition would NOT be the most obvious, natural and fundamental, and
therefore would not be appropriate to pick as part of the language's
core. Indeed, it's an interesting problem to code up, if one wants any
generality (for example, identity of immutable items _whose items or
attributes are in turn immutable_ probably should not matter even for
your "strengthened" equality... but that's pretty hard to express!-).

Alex

michael.f.ellis · Jun 1, 2006

Hi Alex,
With all due respect to your well-deserved standing in the Python
community, I'm not convinced that equality shouldn't imply invariance
under identical operations.

Perhaps the most fundamental notion is mathematics is that the left and
right sides of an equation remain identical after any operation applied
to both sides. Our experience of the physical world is similar. If I
make identical modifications to the engines of two identical
automobiles, I expect the difference in performance to be identical.
If my expectation is met, I would assert that either the two vehicles
were not identical to begin with or that my modifications were not
performed identically.

As to containers, would you say that envelope containing five $100
bills is the same as an envelope containing a single $100 bill and 4
xerox copies of it? If so, I'd like to engage in some envelope
exchanges with you

As I see it, reference copying is a very useful performance and memory
optimization. But I don't think it should undermine the validity of
assert(a==b) as a predictor of invariance under identical operations.

Cheers,
Mike Ellis

michael.f.ellis · Jun 1, 2006

oops! last sentence of 2nd paragraph in previous message should read
"If my expectation is NOT met ..."

Tim Peters · Jun 1, 2006

[[email protected]]

...
As I see it, reference copying is a very useful performance and memory
optimization. But I don't think it should undermine the validity of
assert(a==b) as a predictor of invariance under identical operations.

So, as Alex said last time,

Try concisely expressing your "should" -- constructively, as
pseudocode that one could use to check for your "strengthened
equality", not in abstract terms of constraints -- and if (as I
strongly suspect) you cannot find a definition that is as simple,
concise and natural as the two-liner above, this might help
convince you that your desired definition would NOT be the most
obvious, natural and fundamental, and therefore would not be
appropriate to pick as part of the language's core. Indeed,
it's an interesting problem to code up, if one wants any generality
(for example, identity of immutable items _whose items or
attributes are in turn immutable_ probably should not matter even
for your "strengthened" equality... but that's pretty hard to express!-).

So try that. In reality, you can either learn to change your
expectations, or avoid virtually all object-oriented programming
languages. Object identity is generally fundamental to the intended
semantics of such languages, not just an optimization.

Think about a simpler case:

a = [1]
b = a
assert(a == b)
a.remove(1)
b.remove(1)

Oops. The last line dies with an exception, despite that a==b at the
third statement and that ".remove(1)" is applied to both a and b. If
you think a should not equal b at the third statement "because" of
this, you're going to lead a life of increasing but needless despair
;-)

Tim Chase · Jun 1, 2006

As to containers, would you say that envelope containing five $100

bills is the same as an envelope containing a single $100 bill and 4
xerox copies of it? If so, I'd like to engage in some envelope
exchanges with you

if len(set([bill.serialnumber for bill in envelope])) !=
len(envelope): refuseMichaelsExchange()

Though the way references work, you would have an envelope
containing only 5 slips of paper that all say "I have a $100
bill"...

-tkc

Fredrik Lundh · Jun 1, 2006

As to containers, would you say that envelope containing five $100
bills is the same as an envelope containing a single $100 bill and 4
xerox copies of it? If so, I'd like to engage in some envelope
exchanges with you

if you spent as much time *learning* stuff as you spend making up irrelevant examples,
you might end up learning how assignments, repeat operators, and references work in
Python.

it's only hard to understand if you don't want to understand it.

</F>

Scott David Daniels · Jun 1, 2006

As to containers, would you say that envelope containing five $100
bills is the same as an envelope containing a single $100 bill and 4
xerox copies of it? If so, I'd like to engage in some envelope
exchanges with you

Would you say that envelope containing five $100 bills is equal to
an envelope containing five $100 bills with different serial numbers?

--Scott David Daniels
(e-mail address removed)

Jim Segrave · Jun 1, 2006

Hi Alex,
With all due respect to your well-deserved standing in the Python
community, I'm not convinced that equality shouldn't imply invariance
under identical operations.

Perhaps the most fundamental notion is mathematics is that the left and
right sides of an equation remain identical after any operation applied
to both sides. Our experience of the physical world is similar. If I
make identical modifications to the engines of two identical
automobiles, I expect the difference in performance to be identical.
If my expectation is met, I would assert that either the two vehicles
were not identical to begin with or that my modifications were not
performed identically.

As to containers, would you say that envelope containing five $100
bills is the same as an envelope containing a single $100 bill and 4
xerox copies of it? If so, I'd like to engage in some envelope
exchanges with you

As I see it, reference copying is a very useful performance and memory
optimization. But I don't think it should undermine the validity of
assert(a==b) as a predictor of invariance under identical operations.

I think you end up with a totally unwieldy definition of '==' for
containers, since you have to check for _identical_ aliasing to
whatever depth it might occur, and, if you insist on equality
modeling the physical world, two lists can only be equal if:

for each corresponding element in the two lists, either the element is
a reference to the same underlying object or the corresponding
elements are references to objects which do not have and never will
have other references bound to them.

For example:

ra = ['a', 'a']
rb = ['b', 'b']

l1 = [ra, rb]
l2 = [ra, rb]

This will be equal by your definition and will preserve equality over
identical operations on l1 and l2

l3 = [ ['a', 'b'], ['a', 'b']]

This list will be identical, under your definition, so long as we
don't have anyone doing anything to the references ra or rb. Your
equality test has to claim that l1 and l3 are not equal, since ra
could be changed and that's not an operation on l1 or l3

This also leaves out the complexity of analysiing nested structures -
if you have a tuple, containing tuples which contain lists, then are
those top level tuples 'equal' if there are aliases in the lists? How
many levels deep should an equality test go?

Does the more common test, to see if the elements of a sequence are
identical at the time of comparision need a new operator or hand
coding, since most of the time programmers aren't interested in future
equality or not of the structures.

michael.f.ellis · Jun 1, 2006

Hi Tim,
In your example, a & b are references to the same object. I agree they
should compare equally. But please note that a==b is True at every
point in your example, even after the ValueError raised by b.remove(1).
That's good consistent behavior.

My original example is a little different. a and b never referred to
the same object. They were containers created by different expressions
with no object in common. The problem arises because the overloaded *
operator makes row level copies of the lists in b. There's nothing
wrong with that, but the fact remains that a and b are different in a
very significant way.

I agree with Alex that checking for this type of inequality is not a
trivial programming exercise. It requires (at least) a parallel
recursion that counts references with the containers being compared .
At the same time, I know that much harder programming problems have
been solved. Where I disagree with Alex is in the labeling of the
existing behavior as 'natural'.

Alternatively, it might make sense to disallow == for containers by
raising a TypeError although that would eliminate a largely useful
feature.

Realistically, I know that Python is unlikely to adopt either
alternative. It would probably break a lot of existing code. My
point in the original post was to raise what I felt was a useful topic
for discussion and to help others avoid a pitfall that cost me a couple
of hours of head-scratching.

By the way, I've been programming professionally for over 25 years and
have used at least 30 different languages. During the past few years,
Python has become my language of choice for almost everything because
it helps me deliver more productivity and value to my clients.

Cheers,
Mike

michael.f.ellis · Jun 1, 2006

Yes. You stated it quite precisely. I believe l1==l2 should always
return True and l1==l3 should always be False. (unless l3 is reassigned
as l3=l1). Your idea of a separate operator for 'all elements have
numerically equal values at the moment of comparision' is a good one.
For want of a better name, it could be called DeepCopyEquality(a,b) and
would be equivalent to a byte-by-byte comparison of two distinct
regions in memory created by a deep copies of a and b.

Cheers,
Mike

Jim said:
Hi Alex,
With all due respect to your well-deserved standing in the Python
community, I'm not convinced that equality shouldn't imply invariance
under identical operations.

Perhaps the most fundamental notion is mathematics is that the left and
right sides of an equation remain identical after any operation applied
to both sides. Our experience of the physical world is similar. If I
make identical modifications to the engines of two identical
automobiles, I expect the difference in performance to be identical.
If my expectation is met, I would assert that either the two vehicles
were not identical to begin with or that my modifications were not
performed identically.

As to containers, would you say that envelope containing five $100
bills is the same as an envelope containing a single $100 bill and 4
xerox copies of it? If so, I'd like to engage in some envelope
exchanges with you

As I see it, reference copying is a very useful performance and memory
optimization. But I don't think it should undermine the validity of
assert(a==b) as a predictor of invariance under identical operations.

Click to expand...

I think you end up with a totally unwieldy definition of '==' for
containers, since you have to check for _identical_ aliasing to
whatever depth it might occur, and, if you insist on equality
modeling the physical world, two lists can only be equal if:

for each corresponding element in the two lists, either the element is
a reference to the same underlying object or the corresponding
elements are references to objects which do not have and never will
have other references bound to them.

For example:

ra = ['a', 'a']
rb = ['b', 'b']

l1 = [ra, rb]
l2 = [ra, rb]

This will be equal by your definition and will preserve equality over
identical operations on l1 and l2

l3 = [ ['a', 'b'], ['a', 'b']]

This list will be identical, under your definition, so long as we
don't have anyone doing anything to the references ra or rb. Your
equality test has to claim that l1 and l3 are not equal, since ra
could be changed and that's not an operation on l1 or l3

This also leaves out the complexity of analysiing nested structures -
if you have a tuple, containing tuples which contain lists, then are
those top level tuples 'equal' if there are aliases in the lists? How
many levels deep should an equality test go?

Does the more common test, to see if the elements of a sequence are
identical at the time of comparision need a new operator or hand
coding, since most of the time programmers aren't interested in future
equality or not of the structures.

Kent Johnson · Jun 1, 2006

Hi Alex,
With all due respect to your well-deserved standing in the Python
community, I'm not convinced that equality shouldn't imply invariance
under identical operations.

Perhaps the most fundamental notion is mathematics is that the left and
right sides of an equation remain identical after any operation applied
to both sides. Our experience of the physical world is similar. If I
make identical modifications to the engines of two identical
automobiles, I expect the difference in performance to be identical.
If my expectation is met, I would assert that either the two vehicles
were not identical to begin with or that my modifications were not
performed identically.

But programming is not mathematics and assignment is not an equation.
How about this:

In [1]: a=3.0

In [2]: b=3

In [3]: a==b
Out[3]: True

In [4]: a/2 == b/2
Out[4]: False

Kent

Jim Segrave · Jun 1, 2006

Yes. You stated it quite precisely. I believe l1==l2 should always
return True and l1==l3 should always be False. (unless l3 is reassigned
as l3=l1). Your idea of a separate operator for 'all elements have
numerically equal values at the moment of comparision' is a good one.
For want of a better name, it could be called DeepCopyEquality(a,b) and
would be equivalent to a byte-by-byte comparison of two distinct
regions in memory created by a deep copies of a and b.

The operator which works at the moment of comaprision is already there
- that's what == does.
If you really think there's a need for a comparision which includes
dealing with aliasing, then it seems to me a python module with a
set of functions for comparisions would make more sense.

michael.f.ellis · Jun 1, 2006

Considering the number of new programmers who get bit by automatic
coercion, I wish Dennis Ritchie had made some different choices when he
designed C. But then I doubt he ever dreamed it would become so wildly
successful.

Being a curmudgeon purist I'd actually prefer it if Python raised a
TypeError on float vs integer comparisons.

Cheers,
Mike

Kent said:
Hi Alex,
With all due respect to your well-deserved standing in the Python
community, I'm not convinced that equality shouldn't imply invariance
under identical operations.

Perhaps the most fundamental notion is mathematics is that the left and
right sides of an equation remain identical after any operation applied
to both sides. Our experience of the physical world is similar. If I
make identical modifications to the engines of two identical
automobiles, I expect the difference in performance to be identical.
If my expectation is met, I would assert that either the two vehicles
were not identical to begin with or that my modifications were not
performed identically.

Click to expand...

But programming is not mathematics and assignment is not an equation.
How about this:

In [1]: a=3.0

In [2]: b=3

In [3]: a==b
Out[3]: True

In [4]: a/2 == b/2
Out[4]: False

Kent

michael.f.ellis · Jun 1, 2006

Yes (unless I was testing the assertion that the second envelope did
not contain counterfeits of the first)

Erik Max Francis · Jun 1, 2006

With all due respect to your well-deserved standing in the Python
community, I'm not convinced that equality shouldn't imply invariance
under identical operations.

Doo you really want

2 == 2.0

to be False?

Maric Michaud · Jun 1, 2006

Le Jeudi 01 Juin 2006 18:00, (e-mail address removed) a écrit :

Perhaps the most fundamental notion is mathematics is that the left and
right sides of an equation remain identical after any operation applied
to both sides.

IMHO, you are not aware that the '=' symbol of mathematics exists in python,
it's the 'is' assertion.

a is b
and then, do what you want with a (or b), a is b remains True.

THIS is the meaning of expr1 = expr2, but in computer science, this is not as
important as it is in pure logic (most languages do not even provide the 'is'
assertion).

--
_____________

Maric Michaud
_____________

Aristote - www.aristote.info
3 place des tapis
69004 Lyon
Tel: +33 426 880 097

Slawomir Nowaczyk · Jun 1, 2006

#> Scott David Daniels wrote:
#> > Would you say that envelope containing five $100 bills is equal to
#> > an envelope containing five $100 bills with different serial numbers?

#> Yes (unless I was testing the assertion that the second envelope did
#> not contain counterfeits of the first)

So, what if Bank of America later decided that bills with serial
numbers containing "7" are no longer valid?

In other word, *if* you assume equality must be preserved by future
modifications, than no two different (modifiable) objects can ever be
really equal.

--
Best wishes,
Slawomir Nowaczyk
( (e-mail address removed) )

I believe that math illiteracy affects 7 out of every 5 people.

michael.f.ellis · Jun 1, 2006

Truthfully, I wouldn't mind it at all. In Python, I frequently write
things like
i == int(f)
or vice versa just to avoid subtle bugs that sometimes creep in when
later modifications to code change the original assumptions.

When working in C, I always set the compiler for maximum warnings and
do my damndest to make them all go away. In the long run, time spent on
rigorous coding always repays itself with interest in time saved
debugging.

Mike

michael.f.ellis · Jun 1, 2006

I believe that 'is' tests equality of reference, such that
False

The 'is' operator tells you whether a and b refer to the same object.
What I've been discussing is whether == should test for "structural"
equality so that a and b remain equivalent under parallel mutations
(and also under single mutations to common references)

Cheers,
Mike

Range / empty list issues??	1	Dec 11, 2023
Average of MultiMode of a list of a list	1	Oct 28, 2022
Count each unique element in list of lists	1	Nov 8, 2013
Lowest Value in List	5	Oct 2, 2013
[newbie] confusion concerning fetching an element in a 2d-array	13	Mar 25, 2014
Modifying Element In For List	3	Nov 15, 2010
Tic Tac Toe Game	2	Mar 10, 2024
An empty initializer is invalid for an array with unspecified bound	0	Jul 1, 2020

An oddity in list comparison and element assignment

michael.f.ellis

Alex Martelli

michael.f.ellis

michael.f.ellis

Tim Peters

Tim Chase

Fredrik Lundh

Scott David Daniels

Jim Segrave

michael.f.ellis

michael.f.ellis

Kent Johnson

Jim Segrave

michael.f.ellis

michael.f.ellis

Erik Max Francis

Maric Michaud

Slawomir Nowaczyk

michael.f.ellis

michael.f.ellis

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads