Unexpected Behavior Iterating over a Mutating Object

D

Dave Hansen

OK, first, I don't often have the time to read this group, so
apologies if this is a FAQ, though I couldn't find anything at
python.org.

Second, this isn't my code. I wouldn't do this. But a colleague did,
got an unexpected result, and asked me why. I think I can infer what
is occurring, and I was able to find a simple work-around. But I
thought I'd ask about it anyway.

I've been pushing Python at work for use as a scripting language to
get simple tasks done. My colleague is modifying a parts library for
a schematic capture program with a Python script. The library is
entirely in ASCII text.

He's trying to delete a group of parts from the library. The simple
code below shows what is occurring (taken from an IDLE session, Python
2.4.1#65 for Windoze). The "parts" to be deleted in the example
contain the string 'DEL':

---begin included file---
data = [ 'First', 'Second DEL', 'Third', 'Fourth',
'Fifth DEL', 'DEL Sixth', 'Seventh DEL', 'Eighth DEL',
'Ninth DEL', 'Tenth', 'Eleventh', 'Twelfth']
if item.find('DEL') >= 0:
bfr.append(item)
data.remove(item)

['First', 'Third', 'Fourth', 'DEL Sixth', 'Eighth DEL', 'Tenth',
'Eleventh', 'Twelfth']['Second DEL', 'Fifth DEL', 'Seventh DEL', 'Ninth DEL']
---end included file---

It seems when an item is 'remove'd from data, the rest of the list
'shifts' over one, so what was 'next' now occupies the slot of the
'remove'd item. When the next 'item' is selected from 'data', the
_desired_ 'next' item is skipped. So when 'data' has several
consecutive items to be deleted, only every other one is 'remove'd.

The workaround is very simple:

---begin included file--- if item.find('DEL') >= 0:
bfr.append(item)

data.remove(item)

data ['First', 'Third', 'Fourth', 'Tenth', 'Eleventh', 'Twelfth']
bfr
['Second DEL', 'Fifth DEL', 'DEL Sixth', 'Seventh DEL', 'Eighth DEL',
'Ninth DEL']
---end included file---

Again, iterating over an item that is mutating seems like a Bad
Idea(tm) to me. But I was curious: is this the intended behavior, or
does this fall under what C programmers would call 'undefined
behavior.'

Thanks,

-=Dave
 
L

Larry Bates

You are correct if you remove items from a list that you
are iterating over you get the results you see. You either
need to iterate over the list in reverse order or better yet
create a new list from the old one with the items removed.

In Python 2.4 you can do:

data = [ 'First', 'Second DEL', 'Third', 'Fourth',
'Fifth DEL', 'DEL Sixth', 'Seventh DEL', 'Eighth DEL',
'Ninth DEL', 'Tenth', 'Eleventh', 'Twelfth']

bfr=[x for x in data if 'DEL' not in x]

Note: I'm not sure why 'DEL' is at the end of all the strings
EXCEPT the Sixth one where it is at the beginning, but if the
"DEL" was consistently at the end of the strings I would do
this instead (which would also work in Python 2.2+):

bfr=[x for x in data if not x.endswith('DEL')]

This also is much easier to read (IMHO) than the loops, etc.
of the original code. In Python 2.4 list comprehensions are
also quite efficient.

Larry Bates

Dave said:
OK, first, I don't often have the time to read this group, so
apologies if this is a FAQ, though I couldn't find anything at
python.org.

Second, this isn't my code. I wouldn't do this. But a colleague did,
got an unexpected result, and asked me why. I think I can infer what
is occurring, and I was able to find a simple work-around. But I
thought I'd ask about it anyway.

I've been pushing Python at work for use as a scripting language to
get simple tasks done. My colleague is modifying a parts library for
a schematic capture program with a Python script. The library is
entirely in ASCII text.

He's trying to delete a group of parts from the library. The simple
code below shows what is occurring (taken from an IDLE session, Python
2.4.1#65 for Windoze). The "parts" to be deleted in the example
contain the string 'DEL':

---begin included file---
data = [ 'First', 'Second DEL', 'Third', 'Fourth',

'Fifth DEL', 'DEL Sixth', 'Seventh DEL', 'Eighth DEL',
'Ninth DEL', 'Tenth', 'Eleventh', 'Twelfth']

bfr = []
for item in data:

if item.find('DEL') >= 0:
bfr.append(item)
data.remove(item)



['First', 'Third', 'Fourth', 'DEL Sixth', 'Eighth DEL', 'Tenth',
'Eleventh', 'Twelfth']

['Second DEL', 'Fifth DEL', 'Seventh DEL', 'Ninth DEL']
---end included file---

It seems when an item is 'remove'd from data, the rest of the list
'shifts' over one, so what was 'next' now occupies the slot of the
'remove'd item. When the next 'item' is selected from 'data', the
_desired_ 'next' item is skipped. So when 'data' has several
consecutive items to be deleted, only every other one is 'remove'd.

The workaround is very simple:

---begin included file---

if item.find('DEL') >= 0:
bfr.append(item)


data.remove(item)

['First', 'Third', 'Fourth', 'Tenth', 'Eleventh', 'Twelfth']

['Second DEL', 'Fifth DEL', 'DEL Sixth', 'Seventh DEL', 'Eighth DEL',
'Ninth DEL']
---end included file---

Again, iterating over an item that is mutating seems like a Bad
Idea(tm) to me. But I was curious: is this the intended behavior, or
does this fall under what C programmers would call 'undefined
behavior.'

Thanks,

-=Dave
 
S

Scott David Daniels

Dave said:
...
data = [ 'First', 'Second DEL', 'Third', 'Fourth',
'Fifth DEL', 'DEL Sixth', 'Seventh DEL', 'Eighth DEL',
'Ninth DEL', 'Tenth', 'Eleventh', 'Twelfth']
bfr = []
for item in data:
if item.find('DEL') >= 0:
bfr.append(item)
data.remove(item)

assuming Python 2.4:

kept = []
for position, text in reversed(list(enumerate(data))):
if 'DEL' in text:
removed.append(text)
else:
del data[position]

This uses reversed so positions remain valid as the list is pruned.

--Scott David (e-mail address removed)
(e-mail address removed)
 
R

Raymond Hettinger

[Dave Hansen]
It seems when an item is 'remove'd from data, the rest of the list
'shifts' over one, so what was 'next' now occupies the slot of the
'remove'd item. When the next 'item' is selected from 'data', the
_desired_ 'next' item is skipped. So when 'data' has several
consecutive items to be deleted, only every other one is 'remove'd.

Your analysis is dead-on. Instead of being the "desired" item, the
iterator fetches a[0], a[1], a[2], ... as determined by the state of
the list when the index is retrieved.


Again, iterating over an item that is mutating seems like a Bad
Idea(tm) to me.
Right.


But I was curious: is this the intended behavior, or
does this fall under what C programmers would call 'undefined
behavior.'

It is intended. The behavior is defined as above (a series of indexed
lookups).

BTW, the usual ways to deal with this are to:
1. iterating from the right using: for item in reversed(data)
2. making a copy of the original list before iterating
3. creating a new result list: data = [x for x in data if 'DEL' not in
x]


Raymond
 
F

Fredrik Lundh

Dave said:
Again, iterating over an item that is mutating seems like a Bad
Idea(tm) to me. But I was curious: is this the intended behavior, or
does this fall under what C programmers would call 'undefined
behavior.'

a C programmer wouldn't have much problem with this, of course,
since "for in" behaves like:

for (tmp = 0; var = getitem(seq, tmp); tmp++) {
...
}

</F>
 
B

bruno modulix

Dave Hansen wrote:
(snip code snippets and sensible explanations)
Again, iterating over an item that is mutating seems like a Bad
Idea(tm) to me.

It as *always* been a bad idea to modify a list in place (I mean adding
or removing items) while iterating over it, whatever the language. If
you *really* need to do such a thing (for effeciency reasons - and then
it's usually in low-level C code), you'd better use indexed access, and
adjust the index as needed - but this results in tricky, hard to
maintain code.
But I was curious: is this the intended behavior, or
does this fall under what C programmers would call 'undefined
behavior.'

Not being a Language Lawyer(tm), I can't tell for sure, but I'd think
it's the expected behavior. Anyway it's not a behavior I'd relie upon,
since it would be too much of a dirty trick anyway.

My 2 cents
 
D

Dave Hansen

OK, first, I don't often have the time to read this group, so
apologies if this is a FAQ, though I couldn't find anything at
python.org.

Thanks to everyone who responded. All is clear now. And I know I
need to look deeper into list comprehensions...

Regards,

-=Dave
 
B

Bengt Richter

Dave Hansen wrote:
(snip code snippets and sensible explanations)


It as *always* been a bad idea to modify a list in place (I mean adding
or removing items) while iterating over it, whatever the language. If
you *really* need to do such a thing (for effeciency reasons - and then
it's usually in low-level C code), you'd better use indexed access, and
adjust the index as needed - but this results in tricky, hard to
maintain code.
Do you consider this to be tricky, or likely to be hard to maintain?
(not specifically tested beyond what you see ;-)
>>> data = [ 'First', 'Second DEL', 'Third', 'Fourth',
... 'Fifth DEL', 'DEL Sixth', 'Seventh DEL', 'Eighth DEL',
... 'Ninth DEL', 'Tenth', 'Eleventh', 'Twelfth'] ... if 'DEL' in item: continue
... data[iw] = item
... iw += 1
...
['First', 'Third', 'Fourth', 'Tenth', 'Eleventh', 'Twelfth']

Or that maybe a utility function like the following would be hard to maintain?
... iw = 0
... for item in L:
... if keep(item):
... L[iw] = item
... iw += 1
... del L[iw:]
...
>>> data = [ 'First', 'Second DEL', 'Third', 'Fourth',
... 'Fifth DEL', 'DEL Sixth', 'Seventh DEL', 'Eighth DEL',
... 'Ninth DEL', 'Tenth', 'Eleventh', 'Twelfth'] ['First', 'Second DEL', 'Third', 'Fourth', 'Fifth DEL', 'DEL Sixth', 'Seventh DEL', 'Eighth DEL'
, 'Ninth DEL', 'Tenth', 'Eleventh', 'Twelfth'] ['First', 'Third', 'Fourth', 'Tenth', 'Eleventh', 'Twelfth']

Not being a Language Lawyer(tm), I can't tell for sure, but I'd think
it's the expected behavior. Anyway it's not a behavior I'd relie upon,
since it would be too much of a dirty trick anyway.
I think it depends on the kind of mutation involved. A mutation of the _structure_
of a container being iterated over effectively modifies the initial parameters of
the iteration control, so there you need either to have intimate knowledge of how
the iterator does its thing, or you have to insulate yourself from that need.
OTOH, a mutation of the content can be safe, if it doesn't modify the content
yet to be accessed during the iteration.

In the case of a list, the order can be depended on, so it's not that hard.
Other iterable containers are not so clear, and of course even in a list you
could have cases of access to early items having side effects on upcoming items,
e.g., if the "keep" evaluation had side effects on deep data shared between early
and later list items. But that would be nasty any which way ;-)

Regards,
Bengt Richter
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,982
Messages
2,570,186
Members
46,739
Latest member
Clint8040

Latest Threads

Top