can't delete from a dictionary in a loop

D

Dan Upton

This might be more information than necessary, but it's the best way I
can think of to describe the question without being too vague.

The task:

I have a list of processes (well, strings to execute said processes)
and I want to, roughly, keep some number N running at a time. If one
terminates, I want to start the next one in the list, or otherwise,
just wait.

The attempted solution:

Using subprocess, I Popen the next executable in the list, and store
it in a dictionary, with keyed on the pid:
(outside the loop)
procs_dict={}

(inside a while loop)
process = Popen(benchmark_exstring[num_started], shell=true)
procs_dict[process.pid]=process

Then I sleep for a while, then loop through the dictionary to see
what's terminated. For each one that has terminated, I decrement a
counter so I know how many to start next time, and then try to remove
the record from the dictionary (since there's no reason to keep
polling it since I know it's terminated). Roughly:

for pid in procs_dict:
if procs_dict[pid].poll() != None
# do the counter updates
del procs_dict[pid]

The problem:

RuntimeError: dictionary changed size during iteration

So, the question is: is there a way around this? I know that I can
just /not/ delete from the dictionary and keep polling each time
around, but that seems sloppy and like it could keep lots of memory
around that I don't need, since presumably the dictionary holding a
reference to the Popen object means the garbage collector could never
reclaim it. Is the only reasonable solution to do something like
append all of those pids to a list, and then after I've iterated over
the dictionary, iterate over the list of pids to delete?

(Also, from the implementation side, is there a reason the dictionary
iterator can't deal with that? If I was deleting from in front of the
iterator, maybe, but since I'm deleting from behind it...)
 
H

Hans Nowak

Dan said:
for pid in procs_dict:
if procs_dict[pid].poll() != None
# do the counter updates
del procs_dict[pid]

The problem:

RuntimeError: dictionary changed size during iteration

I don't know if the setup with the pids in a dictionary is the best way to
manage a pool of processes... I'll leave it others, presumably more
knowledgable, to comment on that. :) But I can tell you how to solve the
immediate problem:

for pid in procs_dict.keys():
...

Hope this helps!

--Hans
 
B

bruno.desthuilliers

Dan said:
for pid in procs_dict:
if procs_dict[pid].poll() != None
# do the counter updates
del procs_dict[pid]
The problem:
RuntimeError: dictionary changed size during iteration

I don't know if the setup with the pids in a dictionary is the best way to
manage a pool of processes... I'll leave it others, presumably more
knowledgable, to comment on that. :) But I can tell you how to solve the
immediate problem:

for pid in procs_dict.keys():

I'm afraid this will do the same exact thing. A for loop on a dict
iterates over the dict keys, so both statements are strictly
equivalent from a practical POV.
 
B

bruno.desthuilliers

I'm afraid this will do the same exact thing. A for loop on a dict
iterates over the dict keys, so both statements are strictly
equivalent from a practical POV.

Hem. Forget it. I should think twice before posting - this will
obviously make a big difference here. Sorry for the noise.
 
G

Gary Herron

Dan said:
for pid in procs_dict:
if procs_dict[pid].poll() != None
# do the counter updates
del procs_dict[pid]

The problem:

RuntimeError: dictionary changed size during iteration
I don't know if the setup with the pids in a dictionary is the best way to
manage a pool of processes... I'll leave it others, presumably more
knowledgable, to comment on that. :) But I can tell you how to solve the
immediate problem:

for pid in procs_dict.keys():

No, keys() produces a list (which is what is wanted here).

It's iterkeys() that produces an iterator which would reproduce the OP's
problem.

And then, in Python3, keys() produces something else altogether (call a
view of the dictionary) which would provoke the same problem, so yet
another solution would have to be found then.

Gary Herron
 
H

Hans Nowak

Hem. Forget it. I should think twice before posting - this will
obviously make a big difference here. Sorry for the noise.

:) It appears that you would be right if this was Python 3.0, though:

Python 3.0a5 (r30a5:62856, May 16 2008, 11:43:33)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> d = {1: 2, 3: 4, 5: 6}
>>> for i in d.keys(): del d

....
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration

Maybe 'for i in d' and 'for i in d.keys()' *are* functionally equivalent in 3.0,
as d.keys() returns an object that iterates over d's keys... but I haven't read
enough about it yet to be sure. In any case, the problem goes away when we
force a list:
>>> d = {1: 2, 3: 4, 5: 6}
>>> for i in list(d.keys()): del d ....
>>> d

{}

--Hans
 
C

castironpi

Hem. Forget it. I should think twice before posting - this will
obviously make a big difference here.  Sorry for the noise.

:)  It appears that you would be right if this was Python 3.0, though:

Python 3.0a5 (r30a5:62856, May 16 2008, 11:43:33)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
 >>> d = {1: 2, 3: 4, 5: 6}
 >>> for i in d.keys(): del d
...
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration

Maybe 'for i in d' and 'for i in d.keys()' *are* functionally equivalent in 3.0,
as d.keys() returns an object that iterates over d's keys... but I haven't read
enough about it yet to be sure.  In any case, the problem goes away when we
force a list:

 >>> d = {1: 2, 3: 4, 5: 6}
 >>> for i in list(d.keys()): del d
...
 >>> d
{}

--Hans- Hide quoted text -

- Show quoted text -


You may be searching for:

for i in d.keys()[:]:
del d[ i ]
 
E

Eduardo O. Padoan

Dan Upton wrote:


for pid in procs_dict:
if procs_dict[pid].poll() != None
# do the counter updates
del procs_dict[pid]
The problem:
RuntimeError: dictionary changed size during iteration


I don't know if the setup with the pids in a dictionary is the best way
to
manage a pool of processes... I'll leave it others, presumably more
knowledgable, to comment on that. :) But I can tell you how to solve
the
immediate problem:

for pid in procs_dict.keys():

No, keys() produces a list (which is what is wanted here).
It's iterkeys() that produces an iterator which would reproduce the OP's
problem.

And then, in Python3, keys() produces something else altogether (call a view
of the dictionary) which would provoke the same problem, so yet another
solution would have to be found then.

In Python 3.0, list(procs_dict.keys()) would have the same effect.
 
M

MRAB

This might be more information than necessary, but it's the best way I
can think of to describe the question without being too vague.

The task:

I have a list of processes (well, strings to execute said processes)
and I want to, roughly, keep some number N running at a time. If one
terminates, I want to start the next one in the list, or otherwise,
just wait.

The attempted solution:

Using subprocess, I Popen the next executable in the list, and store
it in a dictionary, with keyed on the pid:
(outside the loop)
procs_dict={}

(inside a while loop)
process = Popen(benchmark_exstring[num_started], shell=true)
procs_dict[process.pid]=process

Then I sleep for a while, then loop through the dictionary to see
what's terminated. For each one that has terminated, I decrement a
counter so I know how many to start next time, and then try to remove
the record from the dictionary (since there's no reason to keep
polling it since I know it's terminated). Roughly:

for pid in procs_dict:
if procs_dict[pid].poll() != None
# do the counter updates
del procs_dict[pid]

The problem:

RuntimeError: dictionary changed size during iteration

So, the question is: is there a way around this? I know that I can
just /not/ delete from the dictionary and keep polling each time
around, but that seems sloppy and like it could keep lots of memory
around that I don't need, since presumably the dictionary holding a
reference to the Popen object means the garbage collector could never
reclaim it. Is the only reasonable solution to do something like
append all of those pids to a list, and then after I've iterated over
the dictionary, iterate over the list of pids to delete?

(Also, from the implementation side, is there a reason the dictionary
iterator can't deal with that? If I was deleting from in front of the
iterator, maybe, but since I'm deleting from behind it...)
Why do you need a counter? len(procs_dict) will tell you how many are
in the dictionary.

You can rebuild the dictionary, excluding those that are no longer
active, with:

procs_dict = dict((id, process) for id, process in
procs_dict.iteritems() if process.poll() != None)

and then start N - len(procs_dict) new processes.
 
G

George Sakkis

This might be more information than necessary, but it's the best way I
can think of to describe the question without being too vague.

The task:

I have a list of processes (well, strings to execute said processes)
and I want to, roughly, keep some number N running at a time.  If one
terminates, I want to start the next one in the list, or otherwise,
just wait.

The attempted solution:

Using subprocess, I Popen the next executable in the list, and store
it in a dictionary, with keyed on the pid:
(outside the loop)
procs_dict={}

(inside a while loop)
process = Popen(benchmark_exstring[num_started], shell=true)
procs_dict[process.pid]=process

Then I sleep for a while, then loop through the dictionary to see
what's terminated.  For each one that has terminated, I decrement a
counter so I know how many to start next time, and then try to remove
the record from the dictionary (since there's no reason to keep
polling it since I know it's terminated).  Roughly:

for pid in procs_dict:
  if procs_dict[pid].poll() != None
   # do the counter updates
   del procs_dict[pid]

Since you don't look up processes by pid, you don't need a dictionary
here. A cleaner and efficient solution is use a deque to pop processes
from one end and push them to the other if still alive, something like
this:

from collections import deque

processes = deque()
# start processes and put them in the queue

while processes:
for i in xrange(len(processes)):
p = processes.pop()
if p.poll() is None: # not finished yet
processes.append_left(p)
time.sleep(5)


HTH,
George
 
C

castironpi

This might be more information than necessary, but it's the best way I
can think of to describe the question without being too vague.
The task:
I have a list of processes (well, strings to execute said processes)
and I want to, roughly, keep some number N running at a time.  If one
terminates, I want to start the next one in the list, or otherwise,
just wait.
The attempted solution:
Using subprocess, I Popen the next executable in the list, and store
it in a dictionary, with keyed on the pid:
(outside the loop)
procs_dict={}
(inside a while loop)
process = Popen(benchmark_exstring[num_started], shell=true)
procs_dict[process.pid]=process
Then I sleep for a while, then loop through the dictionary to see
what's terminated.  For each one that has terminated, I decrement a
counter so I know how many to start next time, and then try to remove
the record from the dictionary (since there's no reason to keep
polling it since I know it's terminated).  Roughly:
for pid in procs_dict:
  if procs_dict[pid].poll() != None
   # do the counter updates
   del procs_dict[pid]

Since you don't look up processes by pid, you don't need a dictionary
here. A cleaner and efficient solution is use a deque to pop processes
from one end and push them to the other if still alive, something like
this:

from collections import deque

processes = deque()
# start processes and put them in the queue

while processes:
    for i in xrange(len(processes)):
        p = processes.pop()
        if p.poll() is None: # not finished yet
            processes.append_left(p)
    time.sleep(5)

HTH,
George- Hide quoted text -

- Show quoted text -

No underscore in appendleft.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,982
Messages
2,570,185
Members
46,738
Latest member
JinaMacvit

Latest Threads

Top