"RuntimeError: dictionary changed size during iteration" ; Good atomiccopy operations?

robert

In very rare cases a program crashes (hard to reproduce) :

* several threads work on an object tree with dicts etc. in it. Items
are added and deleted, .keys() is iterated over, and so on. The threads
are "good" in the sense that this core data structure is changed only by
atomic operations, so that it is always consistent from the
application's point of view. Only the change operations on the dicts and
lists themselves seem to cause problems at the Python level ..

* one thread periodically pickle-dumps the tree to a file:
"RuntimeError: dictionary changed size during iteration" is raised by
..dump ( or a similar "..list changed ..." )

What can I do about this to get a stable pickle-dump without risking
an execution error or, even worse, errors in the pickled file?

Is a copy.deepcopy ( -> "cPickle.dump(copy.deepcopy(obj),f)" ) an
atomic operation with a guarantee not to fail?

Or can I only retry several times in case of RuntimeError? (which would
appear to me as odd gambling; retry how often?)

Robert


PS: Zope dumps thread exposed data structes regularly. How does the ZODB
in Zope handle dict/list changes during its pickling operations?
 
robert

Is a copy.deepcopy ( -> "cPickle.dump(copy.deepcopy(obj),f)" ) an
atomic operation with a guarantee not to fail?

Or can I only retry several times in case of RuntimeError? (which would
appear to me as odd gambling; retry how often?)

For an intermediate solution, I'm playing roulette:

for i in 1,2,3:
    try:
        cPickle.dump(obj, f)
        break
    except RuntimeError,v:
        pass


I hope this works for some million years ...
 
robert

robert said:
For an intermediate solution, I'm playing roulette:

for i in 1,2,3:
    try:
        cPickle.dump(obj, f)
        break
    except RuntimeError,v:
        pass

hmm..

for i in 1,2,3:
    try:
        cPickle.dump(obj, f)
        break
    except RuntimeError,v:
        # discard the partial pickle before the next attempt
        f.seek(0); f.truncate(0)
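
An alternative to rewinding and truncating the file is to serialize into memory first and write the file only once an attempt has succeeded. The following is a minimal sketch under stated assumptions, not a fix for the underlying race: it is written for Python 3 (so `pickle` stands in for `cPickle`), and the name `dump_with_retry` and its `attempts` and `delay` parameters are made up for illustration.

```python
import pickle
import time


def dump_with_retry(obj, path, attempts=5, delay=0.05):
    """Pickle obj, retrying a bounded number of times; write only on success."""
    for _ in range(attempts):
        try:
            # Serialize to bytes first, so a failed attempt never touches the file.
            data = pickle.dumps(obj)
        except RuntimeError:
            # Raised when a container changes size while pickle iterates over it.
            time.sleep(delay)
            continue
        with open(path, "wb") as f:
            f.write(data)
        return True
    return False  # every attempt collided with a concurrent change
```

The retry count is still a gamble, as noted above; the sketch only ensures that a failed attempt cannot leave a half-written pickle on disk.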


Meanwhile I think this is a bug in cPickle.dump: it should use .keys()
internally instead of free iteration when pickling elementary dicts.
I'll file a bug if there is no objection.

Robert
 
Felipe Almeida Lessa

On Sat, 2006-03-11 at 12:49 +0100, robert wrote:
Meanwhile I think this is a bug of cPickle.dump: It should use .keys()
instead of free iteration internally, when pickling elementary dicts.
I'd file a bug if no objection.

AFAICS, it's a problem with your code. You should lock your object while
using it. That's what threading.Lock is for. If you
want to use threads, you have to know in what parts of your code there
should be locks.

Cya,
Felipe.
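
The locking approach suggested here can be sketched roughly as follows. This is an illustration only, written for Python 3: the names `tree`, `tree_lock`, `add_item` and `snapshot_bytes` are hypothetical, and it assumes that every writer goes through the same single lock.

```python
import pickle
import threading

tree_lock = threading.Lock()   # hypothetical: one lock guarding the whole tree
tree = {"items": {}}           # hypothetical shared object tree


def add_item(key, value):
    # Every writer takes the same lock, so a dump never sees a resize.
    with tree_lock:
        tree["items"][key] = value


def snapshot_bytes():
    # Hold the lock only while serializing in memory; file I/O can follow later.
    with tree_lock:
        return pickle.dumps(tree)


def worker(n):
    for i in range(1000):
        add_item((n, i), i)


threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
snapshots = [snapshot_bytes() for _ in range(20)]  # dumps interleave with writers
for t in threads:
    t.join()
```

The cost is that every mutation has to go through the lock, which is exactly the objection raised in the replies below.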

 
EleSSaR^

robert went to great lengths to write all these musings on
comp.lang.python:

[cut]

I don't know what your code is like, but a similar error occurred in some of
my software and it was indeed my fault. I think you should either use a
lock, or implement a deepcopy method of your own.
 
robert

Felipe said:
On Sat, 2006-03-11 at 12:49 +0100, robert wrote:

AFAICS, it's a problem with your code. You should lock your object while
using it. That's what threading.Lock is for. If you
want to use threads, you have to know in what parts of your code there
should be locks.

99.99% no. I would have to use a lock everywhere I add or remove
something in a dict or list of the structure. That's not the purpose of
big thread locks. Such simple operations are already atomic by the
definition of Python, and thanks to the global interpreter lock.
(Otherwise I would leave the Python language, God forbid ... :) )

I'm of course aware of where to use locks for reasons of the application.
But this is an issue at the Python level. And it can be solved gracefully
and simply in Python, I guess:

If cPickle.dump (and maybe also copy/deepcopy?) is corrected to work
atomically on dicts (use .keys() and list copies, or lock Python threads),
the problem is solved gracefully and generally.

Robert
 
robert

EleSSaR^ said:
robert went to great lengths to write all these musings on
comp.lang.python:

[cut]

I don't know what your code is like, but a similar error occurred in some of
my software and it was indeed my fault. I think you should either use a
lock, or implement a deepcopy method of your own.

100s of locks? no (see other message). It should be

own deepcopy: thus, do you already know if the existing deepcopy has the
same problem as cPickle.dump? (As the problem arises rarely, it is
difficult for me to test it out.)

Robert

PS: how does ZODB work with this kind of problem? I thought it uses cPickle?
 
Alex Martelli

robert said:
99.99% no. I would have to use a lock everywhere I add or remove
something in a dict or list of the structure. That's not the purpose of
big thread locks. Such simple operations are already atomic by the
definition of Python, and thanks to the global interpreter lock.
(Otherwise I would leave the Python language, God forbid ... :) )

You have misread the Python Language Reference -- if you can give the
URL on which you have read any such promise of atomicity, I will be glad
to fix the docs to make that unambiguous.

There is no such promise (there may be implementation accidents in some
specific implementation which happen to make some operation atomic, but
NO guarantee even there that the next bugfix won't break that).

Farewell and best of luck in finding other languages which support
threads in a way that is more to your liking than Python -- maybe Ruby
suits you, I don't know for sure though.


Alex
 
EleSSaR^

robert went to great lengths to write all these musings on
comp.lang.python:

own deepcopy: thus, do you already know if the existing deepcopy has the
same problem as cPickle.dump? (As the problem arises rarely, it is
difficult for me to test it out.)

I don't know the exact specs of your object, and I don't know which
operations you are performing on it, or in what way they are atomic.

It seems like you're trying to save periodically the state of such object
while it is being modified (a sort of backup?), and Python complains about
that. A self-implemented deepcopy might raise anomalies (i.e. your dumped
object may be partly a 'before' object and partly an 'after' object ) as
well.

By the way, you could try employing locks from other threads to dump the
object as well... this would prevent additional locking.

PS: how does ZODB work with this kind of problem? I thought it uses cPickle?

I have no idea about this.
 
EleSSaR^

robert went to great lengths to write all these musings on
comp.lang.python:

[cut]

P.S.
I'm very bad at threaded programming. Please verify any of my suggestions
^_^
 
Tim Peters

[robert]
...
PS: how does ZODB work with this kind of problem? I thought it uses cPickle?

It does. Each thread in a ZODB application typically uses its own
connection to a database. As a result, each thread gets its own
consistent view of database objects, which can (and routinely does)
vary across threads. No app-level synchronization is necessary
because no sharing of in-memory objects occurs. When N threads each
load a single persistent object from its own connection, N distinct
in-memory revisions of that object are created (one per connection ==
one per thread). If more than one thread modifies the same persistent
object, the first thread to commit its changes "wins", and later
threads that try to commit changes to the same object may suffer a
ConflictError exception at commit time. Between transaction
boundaries, each thread has an independent view of database state.
Pragmatically, it's much more like programming with multiple processes
than with multiple threads.
 
robert

Alex said:
You have misread the Python Language Reference -- if you can give the
URL on which you have read any such promise of atomicity, I will be glad
to fix the docs to make that unambiguous.

There is no such promise (there may be implementation accidents in some
specific implementation which happen to make some operation atomic, but
NO guarantee even there that the next bugfix won't break that).

What? When I add/del an item to a dict or list, this is not an atomic
thread-safe operation?
E.g.:
One thread does things like d['x']='y'
Another thread reads d['z'] or sets d['z']='w' or dels something.

If those operations are not atomic, then you'd have to use locks all the
time to not get RuntimeErrors and worse !?

In fact I rely on that all the time, and standard Python modules also do
so AFAIK.

The only problem I know of is that on iteration over dicts/lists you get
this type of error, and that is understandable. But usually one solves
these situations with .keys().

I think cPickle does not necessarily have to iterate freely over native
dicts. What do copy/deepcopy/[:] do?

Farewell and best of luck in finding other languages which support
threads in a way that is more to your liking than Python -- maybe Ruby
suits you, I don't know for sure though.

I have looked at Ruby several times, but I stay with Python. Ruby is
feature-rich, but ill designed.

* Ruby code is very very ugly @!{}&%$||endendend ..... egyptology.
Nearly back to Perl.

* try to translate this into Ruby:

def f(): return 1
def g(x): return x()
g(f)

=> Then you'll receive a doctor hat about the OO paradigm and the famous
"Ruby way". But you'll know, why functional programming is a stronger
religion. Translating OO to Python, you'll often not even notice that
Python's OO is attached to funcs and dicts. OO is naturally attached!
The Ruby paradigm is more stilted.

* Ruby doesn't lead to disciplined code. So many names for loops and
everything => you spend twice the time thinking and choosing and receive
double mud. With Python you write happily and without choices, but have
all the power and more.

* Ruby without refcounts provides no deterministic __del__ for
non-circular refs ==> you type finally finally finally .close .close
.close all the time

* Ruby's module and class namespaces are rewritable from everywhere
without any barriers. That's mostly negative for serious apps. 'require'
is as random as C's #include. You scribble and change here, and a
bomb explodes in another module. That kills big projects. Modularization
and consistency of modular code are 3x better in Python with its local
module objects and other shielding stuff.

* Ruby threads are not real OS threads; the Ruby interpreter itself
switches between them AFAIK. That is a pro or a con depending on the
requirements. The Python method is more powerful for bigger apps.

* Ruby so far has no real (simple) generators, but in fact only block
callbacks (also hard to read). In Ruby there is no framework for delayed
execution, only a rudimentary, error-prone 'callcc'. Thus they
don't/can't have real iterators. So they also don't know these kinds
of problems :). Python is more powerful in that, but things like
cPickle.dump and deepcopy should be written with discipline so as not to
break code unnecessarily when Python evolves.

* Ruby code executes 2x..4x slower, (but startup of very small scripts
is 30% faster;)

* etc etc ...

Robert
 
Felipe Almeida Lessa

On Sat, 2006-03-11 at 23:44 +0100, robert wrote:
I looked several times on Ruby, but stay with Python. Ruby is featured,
but ill designed.
[snip]

Oh noes! Another rant of Ruby vs. Python! *Please*, no flamewars!
 
robert

EleSSaR^ said:
robert went to great lengths to write all these musings on
comp.lang.python:

I don't know the exact specs of your object, and I don't know which
operations you are performing on it, or in what way they are atomic.

There is not much to know. Python object trees consist only of dicts and
lists as far as variable non-atomic datastructures are concerned.
(unless you use advanced extension libs like NumPy)

Thus the RuntimeError problem is only about dicts/lists being modified
during iteration in pickle/copy.

It seems like you're trying to save periodically the state of such object
while it is being modified (a sort of backup?), and Python complains about
that. A self-implemented deepcopy might raise anomalies (i.e. your dumped
object may be partly a 'before' object and partly an 'after' object ) as
well.

Yes, a "backup" / autosave while all threads are running. It doesn't
matter if it happens 'before' or 'after' another item has been
added/deleted atomically.

By the way, you could try employing locks from other threads to dump the
object as well... this would prevent additional locking.

Don't understand.
The threads all work simultaneously on the object tree, and atomically
add and detach only valid sub-trees.

Regarding what AM said, I would have to lock _each_ dict/list operation
on the tree, thus almost each change, because even a single attribute
change "subobj.x='y'" is a dictionary operation. That would make
threaded programming very arduous.

AFAIK, in the current Python implementation this RuntimeError is only
thrown during true iteration over a dict/list when the number of items
changes (and not when e.g. a single item is changed). Thus a

def rt_save_dict_copy():
    tod={}
    for k in fromd.keys():        # .keys() builds a list up front
        try: tod[k]=fromd[k]
        except KeyError: pass     # key was deleted in the meantime
    return tod

without true iteration over the original dict would copy without
RuntimeError.

(or maybe equivalent: "dict(fromd.items())" ? )

I don't know if dict.copy() works that way, but I think so, as dict.keys()
and dict.items() have the same footprint.

The algorithm in cPickle.dump does not work that way. I guess it does
something like "for k in fromd: ..."(!) internally. This might be a "90%-bug"?

Will have to see what copy/deepcopy does ...
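
The difference between the two iteration styles can be reproduced without threads. The sketch below is an illustration for Python 3, where `list(source)` plays the role that `.keys()` played in Python 2; the function names are made up, and the in-loop insertion merely simulates another thread adding an item.

```python
def copy_by_live_iteration(source):
    """Iterate over the dict itself while it grows: raises RuntimeError."""
    result = {}
    for key in source:
        result[key] = source[key]
        source[("new", key)] = None   # simulates another thread adding an item
    return result


def copy_by_key_snapshot(source):
    """Iterate over a list of keys taken up front: no RuntimeError."""
    result = {}
    for key in list(source):
        try:
            result[key] = source[key]
        except KeyError:
            pass                      # key was deleted after the snapshot
        source[("new", key)] = None
    return result


try:
    copy_by_live_iteration({1: 1, 2: 2})
    live_failed = False
except RuntimeError:
    live_failed = True

snapshot = copy_by_key_snapshot({1: 1, 2: 2})
```

Avoiding the exception this way says nothing about whether the copied values are mutually consistent; that question is taken up later in the thread.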

Robert
 
robert

Tim said:
[robert]
...
PS: how does ZODB work with this kind of problem? I thought is uses cPickle?


It does. Each thread in a ZODB application typically uses its own
connection to a database. As a result, each thread gets its own
consistent view of database objects, which can (and routinely does)
vary across threads. No app-level synchronization is necessary
because no sharing of in-memory objects occurs. When N threads each
load a single persistent object from its own connection, N distinct
in-memory revisions of that object are created (one per connection ==
one per thread). If more than one thread modifies the same persistent
object, the first thread to commit its changes "wins", and later
threads that try to commit changes to the same object may suffer a
ConflictError exception at commit time. Between transaction
boundaries, each thread has an independent view of database state.
Pragmatically, it's much more like programming with multiple processes
than with multiple threads.

Thanks for those details.
So when committing objects with multithreaded changes on a complex
object into ZODB, it would raise the same RuntimeError on altered
dicts/lists...

---

Looked up copy.py meanwhile:

copy and deepcopy use :

def _copy_dict(x):
    return x.copy()
d[types.DictionaryType] = _copy_dict

.....
def _deepcopy_dict(x, memo):
    y = {}
    memo[id(x)] = y
    for key, value in x.iteritems():
        y[deepcopy(key, memo)] = deepcopy(value, memo)
    return y
d[types.DictionaryType] = _deepcopy_dict


Thus deepcopy (but not copy) seems to also expose itself to this
RuntimeError, as .iteritems() will iterate over the original dict!
(It would maybe be better to use x.items() here, as was perhaps the case
before py2.2.)

It's the same problem as with cPickle.dump. Thus there seems to be no
RuntimeError-safe way in the standard Python lib to get a
"current view" of an object tree in threaded applications.

I guess it would be wiser not to expose deepcopy, cPickle.dump etc. to
this kind of RuntimeError unnecessarily.
The speed gain of the iterator method, if any, is minor compared to
the app crash problems, which are not easy to discover and work around
(because they happen rarely on fast computers).
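
A deep copy that iterates over snapshots of each container, as proposed here, might look like the sketch below. It is an illustration for Python 3, not a patch to `copy.py`: the name `snapshot_deepcopy` is hypothetical, only dicts and lists are copied, and everything else is assumed to be an immutable leaf that can be shared.

```python
def snapshot_deepcopy(obj, memo=None):
    """Deep-copy dicts and lists, iterating over a snapshot of each container."""
    if memo is None:
        memo = {}
    entry = memo.get(id(obj))
    if entry is not None:                     # shared or cyclic reference
        return entry[1]
    if isinstance(obj, dict):
        result = {}
        memo[id(obj)] = (obj, result)         # keeping obj alive keeps its id unique
        for key, value in list(obj.items()):  # snapshot of the items
            result[key] = snapshot_deepcopy(value, memo)
        return result
    if isinstance(obj, list):
        result = []
        memo[id(obj)] = (obj, result)
        for value in list(obj):               # snapshot of the elements
            result.append(snapshot_deepcopy(value, memo))
        return result
    return obj                                # assumption: leaves are immutable
```

Such a copy avoids iterating over a live container, but it is still not atomic: sub-trees are visited at different moments, so the result can mix older and newer states of the tree.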

Robert
 
robert

robert said:
Guess it would be more wise to not expose deepcopy, cPickle.dump etc. to
this kind of RuntimeError unnecessarily.
The speed gain of the iterator-method - if any - is minor, compared to
the app crash problems, which are not easy to discover and work-around
(because they happen rarely on fast computers).


I searched the thread and threading modules for a function for generally
locking/unlocking all other Python threads from execution. I did not find
anything like that.

(That would be very useful in some threading applications to protect
critical sections without forcing the whole application to be populated
with lock objects.
Of course, such function should be used with care (and "finally") - but
it should be there to make thread programming easier...)

Robert
 
Alex Martelli

robert said:
What? When I add/del an item to a dict or list, this is not an atomic
thread-safe operation?

Exactly: there is no such guarantee in the Python language.

E.g.:
One thread does things like d['x']='y'
Another thread reads d['z'] or sets d['z']='w' or dels something.

If those operations are not atomic, then you'd have to use locks all the
time to not get RuntimeErrors and worse !?

If you want to be writing correct Python, yes. A preferred approach is
to simply avoid sharing objects among threads, except for objects
designed to be thread-safe (chiefly Queue.Queue).

In fact I rely on that all the time, and standard Python modules also do
so AFAIK.

You're relying on an accident of a specific, particular implementation;
if any Python-coded standard library module does likewise, and I'm not
aware of any, that's somewhat different (since that module is PART of
the implementation, it may rely on all kinds of implementation details,
correctly if maybe not wisely). The situation is quite different for
C-coded modules in the CPython implementation, Java-coded ones in the
Jython one, C#-coded one in the IronPython one; each of these is subject
to specific constraints that it's perfectly wise to rely on (since each
implementation, as the language specification fully allows it to do,
adopts a different locking strategy at these low levels).

I think cPickle does not necessarily have to iterate freely over native dicts.

It's not forced to by language specification, but neither is it
forbidden. It would be an absurd implementation strategy to waste time
and space to extract a dict's keys() first, as it would NOT buy
"atomicity" anyway -- what if some other thread deletes keys while
you're looping, or calls any nonatomic method on the very value you're
in the process of serializing?!

In some Python implementations, a C-coded module may count on some
atomicity as long as it doesn't explicitly allow other threads nor ever
call back into ANY python-coded part, but obviously cPickle cannot avoid
doing that, so even in those implementations it will never be atomic.

What do copy/deepcopy/[:] do?

Roughly the same situation.


If as you indicate you want to stick with a Python-like language but do
not want to change your style to make it correct Python, you could
perhaps fork the implementation into an "AtomicPython" in which you
somehow fix all nonatomicities (not sure how that would even be possible
if pickling, deepcopying or anything else ever needs to fork into Python
coded parts, but perhaps you might make the GIL into a reentrant lock
and somehow hack it to work, with some constraints). Or perhaps you
might be able to write an extension containing atomicset, atomicget,
atomicpickle, and other operations you feel you need to be atomic (again
barring the difficulties due to possible callbacks into Python) and use
those instead of bare Python primitives.


Alex
 
EleSSaR^

robert went to great lengths to write all these musings on
comp.lang.python:

Yes, a "backup" / autosave while all threads are running. It doesn't
matter if it happens 'before' or 'after' another item has been
added/deleted atomically.

But it does matter if the autosave happens *while* an item is being
updated, I suppose. E.g. if a single 'atomic' operation would change two
dictionaries, and an autosave triggers after the first has been changed and
the second hasn't, this would be an unwanted autosave, right?

Don't understand.
The threads all work simultaneously on the object tree, and atomically
add and detach only valid sub-trees.

You're never using any lock, then? Isn't it possible that two threads try
changing the very same dict/list at the same time? Just one more question:
are you running your software on a single-cpu machine?

change "subobj.x='y'" is a dictionary operation. That would make
threaded programming very arduous.

Well... threaded programming usually is a hard task. No surprise so many
people prefer async programming nowadays. It makes many things simpler.

def rt_save_dict_copy():
    tod={}
    for k in fromd.keys():
        try: tod[k]=fromd[k]
        except: pass
    return tod

without true iteration over the original dict would copy without
RuntimeError.

But with no guarantee of data consistency. It will keep newly added keys
out of the copy, but if a value in the dict is modified during iteration,
the copy may end up in a state that never existed:

import random
random.seed()
fromd = {1:1, 2:2, 3:3, 4:4, 5:5}

print "dict before iteration:", fromd
def rt_save_dict_copy():
    tod={}
    for k in fromd.keys():
        try:
            tod[k]=fromd[k]
        except:
            pass
        # simulate another thread changing a value in the middle of the copy
        fromd[random.choice(xrange(1,6))] = random.choice(xrange(1,10))
    return tod

print "copied dict:", rt_save_dict_copy()
print "dict after copy:", fromd
 
robert

Alex said:
What? When I add/del an item to a dict or list, this is not an atomic
thread-safe operation?

Exactly: there is no such guarantee in the Python language.
E.g.:
One thread does things like d['x']='y'
Another thread reads d['z'] or sets d['z']='w' or dels something.

If those operations are not atomic, then you'd have to use locks all the
time to not get RuntimeErrors and worse !?

If you want to be writing correct Python, yes. A preferred approach is
to simply avoid sharing objects among threads, except for objects
designed to be thread-safe (chiefly Queue.Queue).

I don't know the Python language (non?-)definition about this. But the
day that becomes a requirement in the Python implementation, I'll put
the last Python in a safe :) ( ..and rethink my bad opinion about Ruby )

For example, hundreds of things like sre._cache and tens of thousands of
common global variables are shared "thread safe" in the standard lib
without locks.

;-) They will never change any Python implementation and do the work of
putting millions of lock.acquire()'s into the standard lib...

You're relying on an accident of a specific, particular implementation;
if any Python-coded standard library module does likewise, and I'm not
aware of any, that's somewhat different (since that module is PART of
the implementation, it may rely on all kinds of implementation details,
correctly if maybe not wisely). The situation is quite different for
C-coded modules in the CPython implementation, Java-coded ones in the
Jython one, C#-coded one in the IronPython one; each of these is subject
to specific constraints that it's perfectly wise to rely on (since each
implementation, as the language specification fully allows it to do,
adopts a different locking strategy at these low levels).

The other implementations would also have a hard time rewriting the
standard lib.
Python byte code is kind of "defined" and always interpreted similarly, and..

...     print d['a']
...     print d.keys()
...     d['b']=2
...
  2           0 LOAD_GLOBAL              0 (d)
              3 LOAD_CONST               1 ('a')
              6 BINARY_SUBSCR
              7 PRINT_ITEM
              8 PRINT_NEWLINE

  3           9 LOAD_GLOBAL              0 (d)
             12 LOAD_ATTR                1 (keys)
             15 CALL_FUNCTION            0
             18 PRINT_ITEM
             19 PRINT_NEWLINE

  4          20 LOAD_CONST               2 (2)
             23 LOAD_GLOBAL              0 (d)
             26 LOAD_CONST               3 ('b')
             29 STORE_SUBSCR
             30 LOAD_CONST               0 (None)
             33 RETURN_VALUE

...things like LOAD_CONST / STORE_SUBSCR will be atomic as long as there
is a GIL or at least a GIL during execution of one byte code. No
threaded script language can reasonably afford to have thread-switching
open down to native microprocessor execution atoms.
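
How many bytecode steps a statement takes on a given interpreter can be checked with the dis module. The sketch below is written for Python 3, and the function names `store`, `increment` and `opnames` are made up for illustration. The exact instructions differ between CPython versions, so it only compares their counts: an in-place update such as `d["n"] += 1` compiles to more steps than a plain subscript store, which suggests that a thread switch can fall between its read and its write.

```python
import dis


def store(d):
    d["b"] = 2       # a plain subscript store


def increment(d):
    d["n"] += 1      # read, add, store: several separate steps


def opnames(func):
    """Return the names of the bytecode instructions compiled for func."""
    return [ins.opname for ins in dis.get_instructions(func)]


store_ops = opnames(store)
increment_ops = opnames(increment)
```

Printing the two lists on a particular interpreter shows which individual steps are involved there.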

It's not forced to by language specification, but neither is it
forbidden. It would be an absurd implementation strategy to waste time
and space to extract a dict's keys() first, as it would NOT buy
"atomicity" anyway -- what if some other thread deletes keys while
you're looping, or calls any nonatomic method on the very value you're
in the process of serializing?!

First I look at the practical requirement: I have the threads and the
fast object tree and the need to autosave in this process. And don't
want to lock everywhere any dict-access to the tree just because of the
autosave (for app-reasons I need only a handful of locks so far). And I
don't want to use a slow & overkill method like ZODB.

One clean method (A) in order to stay practical would be possible if a
global lock for Python threading were offered by the thread lib, as
described in my earlier message in this thread.

The other method (B), (that I believed to use so far) is to have a
little discipline, but no need for massive trivial locking:
* add complex objects to the hot tree only if the objects are
complete/consistent.
* don't do nasty things on removed complex objects; just forget them as
usual
* most attribute changes are already app-atomic; the overall majority of
operations is read-only access anyway - both regarding the number of
executions and the amount of source code (that is 99% the reason for
using this "disciplined" method in threaded apps)
* ..in case of non-atomic attribute changes, create a new appropriate
top object with compound changes (same logic as in fact you have, when
you use ZODB) and replace the whole top object in one step. Or use locks
in rare cases.

Now a practical "autosave" method in my app could fit to such practical
disciplined method ( or not :-( ).

And there is a reason why the Python standard lib should at least offer
one "disciplined" method to sample-copy such an object tree, despite
threads. There is no need to get an "arbitrary-Python-atomic" copy of the
whole tree (which would require (A)). The app itself respects the
discipline for "app-atomicity" if I use things in this way.

It's a true practical standard requirement: just a method with no
"RuntimeError".

Thus it is useful to have a deepcopy and/or cPickle.dump which does not
break.

Is the current dict.copy() exposed to this RuntimeError? AFAIK: no.
In that case copy.copy() respects "the discipline", but deepcopy & dump
do not, because of their use of .iteritems(). And most probably it worked
before py2.2, as I got no such errors with that app in those times.

As for the argument about the cost of dict.keys(): I guess it hardly
pays off in terms of speed. Things could better stay disciplined in
those few critical locations like dump and deepcopy, which are
supposed to duplicate much bigger and more expensive things anyway. It's
a weak O(1) issue.

So far, it's simple: I'd have to write my own dump or deepcopy after
Python changes, because of new "fun" with RuntimeErrors!

At least an alternate RuntimeError-safe version of deepcopy/dump would fit
into a practical Python, if the defaults cannot be kept flat.

In some Python implementations, a C-coded module may count on some
atomicity as long as it doesn't explicitly allow other threads nor ever
call back into ANY python-coded part, but obviously cpickle cannot avoid
doing that, so even in those implementations it will never be atomic.

The degree of atomicity defines the degree of usability for programming
ideas. And that should not be lowered as it makes thread programming in
VHL script languages so practical, when you just can do most things like
d['a']='b' without thread worries.

There is no theoretical threshold in a practical (=connected) world:

Each OS and ASM/C level relies on CPU-time and memory atoms. In fact,
otherwise, without such atoms, digital "platonic" computers could not
interfere with the "threaded" reality at all. That is a
_natural_requirement_ of "ideas" to _definitely_ be "atomic".
The CPU bit x clocktick is defined now by ~1.8eV/kT in the real world =>
no computing error within 10^20 years.

( Only something like biological neuro-brains or analog computers
can offer more of "free threading". But believe me, even the neuro- and
quantum-threading respects the Planck h quantum: the time resolution of
"thread interaction" is limited by energy restrictions and light speed )

If Python really has not yet defined its time-atoms, that should go on
the To-Do list ASAP. At worst, its atoms are those of ASM - I hope it's
better...

Whats does copy/deepcopy/[:] ?

Roughly the same situation.

If as you indicate you want to stick with a Python-like language but do
not want to change your style to make it correct Python, you could
perhaps fork the implementation into an "AtomicPython" in which you
somehow fix all nonatomicities (not sure how that would even be possible
if pickling, deepcopying or anything else ever needs to fork into Python
coded parts, but perhaps you might make the GIL into a reentrant lock
and somehow hack it to work, with some constraints). Or perhaps you
might be able to write an extension containing atomicset, atomicget,
atomicpickle, and other operations you feel you need to be atomic (again
barring the difficulties due to possible callbacks into Python) and use
those instead of bare Python primitives.

Each Python _is_ an AtomicPython. And that sets its VHL value. Maybe it
just doesn't know it itself so far?

I asked maybe for a _big_ practical atom, and now have to make it
myself, because Python became smaller :)

The practical problem is: you must rely on and deeply know the real
atoms of Python and the lib in order to know how many atoms/locks you
have to ensure on your own.
At a certain level, Python would lose its value. This
disciplined-threading/dump/deepcopy issue here is of course somewhat
debatable in the upper region of VHL. (I certainly like that direction)

But what you sketch here on the far other side about
take-care-about-all-dicts-in-future-Python-threads is the door to the
underworld and is certainly a plan for a Non-Python.

"Take care of everything" is in fact an excuse for heading towards low
level in the future. (Ruby for example has in many ways more "greed"
towards the low level and towards "arbitrary fun" - too much for me - and
without offering more real power)

Python evolution was mostly ok and careful. But it should be taken care
that not too much negative ghosts break VHL atoms:

One such ghost went into _deepcopy_dict/dump (for little LL speed
issues). Other ghosts broke rexec ( a nice "Atom" which i miss very much
now ), ...

A function for locking Python threading globally in the thread module
would be a kind of practical "hammer" to maintain VHL code in deliberate
key cases. (Similar to 'cli' in device driver code, but for a
Python process.)
For example: "I know as the programmer whether I can afford to lock app
threads during a small multi-attribute change or even during .deepcopy
etc.; the cost of this is less than the need to spread individual locks
massively over the whole threaded app."
For example, socket.getaddrinfo does this unforeseeably for all
app threads for minutes (!) in bad cases, internally at the OS level anyway.

Python should name its Atoms.


Robert
 
Raymond Hettinger

[robert]
In very rare cases a program crashes (hard to reproduce) :

* several threads work on an object tree with dict's etc. in it. Items
are added, deleted, iteration over .keys() ... ). The threads are "good"
in such terms, that this core data structure is changed only by atomic
operations, so that the data structure is always consistent regarding
the application. Only the change-operations on the dicts and lists
itself seem to cause problems on a Python level ..

* one thread periodically pickle-dumps the tree to a file:

"RuntimeError: dictionary changed size during iteration" is raised by
.dump ( or a similar "..list changed ..." )

What can I do about this to get a stable pickle-dump without risking
an execution error or, even worse, errors in the pickled file?

Is a copy.deepcopy ( -> "cPickle.dump(copy.deepcopy(obj),f)" ) an
atomic operation with a guarantee not to fail?

No. It is non-atomic.

It seems that your application design intrinsically incorporates a race
condition -- even if deepcopying and pickling were atomic, there would
be no guarantee whether the pickle dump occurs before or after another
thread modifies the structure. While that design smells of a rat, it
may be that your apps can accept a dump of any consistent state and
that possibly concurrent transactions may be randomly included or
excluded without affecting the result.

Python's traditional recommendation is to put all access to a resource
in one thread and to have other threads communicate their transaction
requests via the Queue module. Getting results back was either done
through other Queues or by passing data through a memory location
unique to each thread. The latter approach has become trivially simple
with the advent of Py2.4's thread-local variables.
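
The traditional recommendation described above can be sketched as follows. This is a minimal illustration for Python 3 (where the Queue module is named `queue`), not code from any poster: the command names and the helpers `owner`, `set_item` and `dump` are hypothetical. One thread owns the tree; all other threads send requests through a queue, so a dump can never overlap a modification.

```python
import pickle
import queue
import threading

requests = queue.Queue()


def owner():
    """The only thread that ever touches the tree."""
    tree = {}
    while True:
        command, payload, reply = requests.get()
        if command == "set":
            key, value = payload
            tree[key] = value
        elif command == "dump":
            reply.put(pickle.dumps(tree))  # nothing can modify tree meanwhile
        elif command == "stop":
            break


def set_item(key, value):
    requests.put(("set", (key, value), None))


def dump():
    reply = queue.Queue(maxsize=1)         # private channel for the result
    requests.put(("dump", None, reply))
    return reply.get()


def worker(n):
    for i in range(100):
        set_item((n, i), i)


owner_thread = threading.Thread(target=owner)
owner_thread.start()
workers = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
data = dump()
requests.put(("stop", None, None))
owner_thread.join()
```

Reads would need a request of their own in this design, which is the price paid for never sharing the tree between threads.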

Thinking about future directions for Python threading, I wonder if
there is a way to expose the GIL (or simply impose a temporary
moratorium on thread switches) so that it becomes easy to introduce
atomicity when needed:

gil.acquire(BLOCK=True)
try:
    #do some transaction that needs to be atomic
finally:
    gil.release()


Or can I only retry several times in case of RuntimeError? (which would
appear to me as odd gambling; retry how often?)

Since the app doesn't seem to care when the dump occurs, it might be
natural to put it in a while-loop that continuously retries until it
succeeds; however, you still run the risk that other threads may never
leave the object alone long enough to dump completely.


Raymond
 
