AssertionError in pickle's memoize function

Michael Hohn

Hi,

Under Python 2.2, the pickle/unpickle sequence incorrectly restores
a larger data structure I have.

Under Python 2.3, these structures now raise an explicit exception from
Pickler.memoize():

    assert id(obj) not in self.memo

I'm shrinking the offending data structure down to find the problem
and provide an easily reproducible example,
but maybe someone on the list could tell me under what general
conditions this assertion is expected to fail.

Thanks,
Michael
 
Tim Peters

[Michael Hohn]
> Under Python 2.2, the pickle/unpickle sequence incorrectly restores
> a larger data structure I have.
>
> Under Python 2.3, these structures now raise an explicit exception from
> Pickler.memoize():
>
>     assert id(obj) not in self.memo
>
> I'm shrinking the offending data structure down to find the problem
> and provide an easily reproducible example,
> but maybe someone on the list could tell me under what general
> conditions this assertion is expected to fail.

Assertions are never expected to fail, so "something impossible
happened" when they do fail.

See whether your Python has Lib/pickletools.py. There's an enormous
amount of info about pickles in that (for example, it will tell you
what "memo" means).

May help to try cPickle instead of pickle. Since they're distinct
implementations, they have different bugs. cPickle can be much faster
than pickle, but it's a lot easier to understand pickle.py.
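
As an illustration of the pickletools suggestion, here is a minimal sketch of watching the memo at work. It assumes a modern Python 3 interpreter rather than the 2.3 discussed in this thread, and the variable names are illustrative only. Disassembling the pickle of a list that is referenced twice should show a PUT where the list is first stored in the memo and a GET where it is referenced again.

```python
import io
import pickle
import pickletools

# One list referenced twice: the pickler stores it in the memo once
# and refers back to it the second time.
shared = [1, 2]
data = [shared, shared]

# Protocol 0 is the text protocol; its memo opcodes are PUT and GET.
payload = pickle.dumps(data, protocol=0)

# pickletools.dis writes an opcode-by-opcode listing to the given stream.
listing = io.StringIO()
pickletools.dis(payload, out=listing)
print(listing.getvalue())

# The memo is what lets sharing survive the round trip.
restored = pickle.loads(payload)
print(restored[0] is restored[1])
```

The failing assertion in this thread guards the PUT side: an object is about to be stored in the memo although its id is already there.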
 
Michael Hohn

Here is a code sample that shows the problem I ran into:

test.py:
=================================
import pickle

class aList(list):
    def __init__(self, arg):
        # w/o this call, pickle works...
        list.__init__(self, arg)
        pass

A = aList([1,2])
B = aList([A, 3])

the_data = {'a': A, 'b': B}
A._stored_by = the_data

pickle.dumps([the_data, B]) # ok
pickle.dumps([B, the_data]) # fails

=================================


Outputs under:

Python 2.3 (#1, Sep 13 2003, 00:49:11)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin


9 scarlet::~:0> python test.py
Traceback (most recent call last):
  File "test.py", line 16, in ?
    pickle.dumps([B, the_data]) # fails
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/pickle.py", line 1386, in dumps
    Pickler(file, protocol, bin).dump(obj)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/pickle.py", line 231, in dump
    self.save(obj)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/pickle.py", line 293, in save
    f(self, obj) # Call unbound method with explicit self
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/pickle.py", line 614, in save_list
    self._batch_appends(iter(obj))
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/pickle.py", line 629, in _batch_appends
    save(x)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/pickle.py", line 338, in save
    self.save_reduce(obj=obj, *rv)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/pickle.py", line 419, in save_reduce
    self.memoize(obj)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/pickle.py", line 251, in memoize
    assert id(obj) not in self.memo
AssertionError

The same problem occurs with Python 2.3 on Linux:

Python 2.3 (#1, Jul 31 2003, 14:19:24)
[GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-113)] on linux2


Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/tmp/python-286703ll", line 1, in ?
    pickle.dumps([B, the_data]) # fails
  File "/usr/local_cci/Python-2.3/lib/python2.3/pickle.py", line 1386, in dumps
    Pickler(file, protocol, bin).dump(obj)
  File "/usr/local_cci/Python-2.3/lib/python2.3/pickle.py", line 231, in dump
    self.save(obj)
  File "/usr/local_cci/Python-2.3/lib/python2.3/pickle.py", line 293, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local_cci/Python-2.3/lib/python2.3/pickle.py", line 614, in save_list
    self._batch_appends(iter(obj))
  File "/usr/local_cci/Python-2.3/lib/python2.3/pickle.py", line 629, in _batch_appends
    save(x)
  File "/usr/local_cci/Python-2.3/lib/python2.3/pickle.py", line 338, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/local_cci/Python-2.3/lib/python2.3/pickle.py", line 419, in save_reduce
    self.memoize(obj)
  File "/usr/local_cci/Python-2.3/lib/python2.3/pickle.py", line 251, in memoize
    assert id(obj) not in self.memo
AssertionError
 
Dima Dorfman

[Followups to python-dev, please.]

[Michael Hohn]
> Here is a code sample that shows the problem I ran into:

Summary for the OP: This is a bug in Python. Using cPickle won't help,
but if you don't subclass builtin container types other than list,
dict, and tuple, using pickle protocol 2 should work. The rest of this
message is for python-dev.


The simplest breaking case is:

    t = type('t', (list,), {})
    obj = t()
    obj.append(obj)
    pickle.dumps(obj)
    [infinite recursion]

The subclass causes save_reduce to be used instead of save_list. For
proto < 2, copy_reg._reduce_ex returns (_reconstructor, list(a)), and
the args--list(a)--cycle back through obj. Initially it looks like this
should be okay, but the args are saved before obj is memoized, and obj
can't be memoized until REDUCE can be executed with the args--and
there's the cycle. It is even more obviously impossible from the
unpickler's perspective because it has to call _reconstructor([obj]) to
create obj!
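
The two reduce paths can be inspected directly without pickling anything. The sketch below assumes a modern Python 3 interpreter, where copy_reg is spelled copyreg; the class name T and the variable names are illustrative only. It looks at what __reduce_ex__ hands to the pickler for a protocol below 2, and shows the protocol 2 value for contrast.

```python
import copyreg

# A list subclass holding a reference to itself, as in the breaking case.
T = type('T', (list,), {})
obj = T()
obj.append(obj)

# Protocols below 2: the contents travel inside the constructor args,
# so the args lead straight back to obj.
old = obj.__reduce_ex__(1)
print(old[0] is copyreg._reconstructor)
cls, base, state = old[1]
print(base is list, state[0] is obj)

# Protocol 2: the args hold only the class; the contents arrive
# separately through the listitems iterator (the fourth element).
new = obj.__reduce_ex__(2)
print(new[1])
items = list(new[3])
print(items[0] is obj)
```

In the first case the pickler has to save a list containing obj before obj can be memoized; in the second, obj can be created and memoized before its items are saved.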

There are two separate problems:

1. Any __reduce__ implementation that returns args that cycle back
through the object it tried to reduce hasn't done its job. As
described above, _reduce_ex is one such implementation. reduce_2
avoids this by using the listitems and dictitems parameters. Since
that's a pickler-side feature it can be used in _reduce_ex too. The
basetype(obj) hook (documented in PEP 307) would remain for
immutable bases; it doesn't work for containers, but user-defined
containers already have to implement their own reduce functions.
POC patch: http://www.trit.org/~dima/home/reduce_ex.diff

At least the set and deque types also have this problem.

2. The pickle implementations don't detect reduction cycles. Pickling
an instance of this obviously broken class causes an infinite
recursion:

    class evil(object):
        def __reduce__(self):
            return evil, (self,)

It's easy to detect this case. POC patch for the pickle module:
http://www.trit.org/~dima/home/redcycle.diff
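
To make the idea of a reduction cycle concrete, here is a rough sketch of a standalone check. It is not the patch linked above, whose contents are not shown in this thread. The helper name args_cycle_back and its parameters are hypothetical, and it assumes a modern Python 3 interpreter. It follows only the constructor args of each reduce value, on the reasoning given above that those are what must be saved before the object can be memoized; it is approximate and does not handle every kind of object (functions, for example).

```python
_ATOMS = (type(None), bool, int, float, complex, str, bytes)
_CONTAINERS = (tuple, list, set, frozenset)


def args_cycle_back(obj, protocol=2, _active=None, _seen=None):
    """Return True if some object's reduce args lead back to that object.

    Hypothetical helper: only constructor args are followed, because state,
    listitems and dictitems are pickled after the object is memoized.
    """
    active = set() if _active is None else _active  # ids of objects mid-reduction
    seen = {} if _seen is None else _seen           # builtin containers already walked

    if id(obj) in active:
        return True
    if isinstance(obj, _ATOMS) or isinstance(obj, type):
        return False

    if type(obj) in _CONTAINERS or type(obj) is dict:
        if id(obj) in seen:
            return False
        seen[id(obj)] = obj  # keep the object alive so its id stays unique
        if type(obj) is dict:
            children = list(obj.keys()) + list(obj.values())
        else:
            children = list(obj)
        return any(args_cycle_back(c, protocol, active, seen) for c in children)

    rv = obj.__reduce_ex__(protocol)
    if isinstance(rv, str):  # a reduce value may also be a global name
        return False
    active.add(id(obj))
    try:
        return any(args_cycle_back(a, protocol, active, seen) for a in rv[1])
    finally:
        active.discard(id(obj))


class Evil:
    def __reduce__(self):
        return Evil, (self,)


class Plain:
    pass


T = type('T', (list,), {})
looped = T()
looped.append(looped)

print(args_cycle_back(Evil()))
print(args_cycle_back(Plain()))
print(args_cycle_back(looped, protocol=1))
print(args_cycle_back(looped, protocol=2))
```

A check inside the pickler itself would track the same "currently being reduced" set while saving args, and raise an error instead of recursing.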

BTW, the failed assert the OP is seeing happens when the cycle goes
through another object:

    t = type('t', (list,), {})
    obj = t()
    d = {'obj': obj}
    obj.append(d)
    pickle.dumps(obj)
    [AssertionError]

cPickle has the same problem, but it lacks the assert, so it writes
garbage instead:

    new = cPickle.loads(cPickle.dumps(obj))
    new[0]['obj'] is new -> False # wrong
    obj[0]['obj'] is obj -> True # right

This makes the reduction cycle check (#2 above) more than just cosmetic
since if cPickle had that assert (it should) it would've been a crash.
Right now it's garbage output instead, which is arguably worse.
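
Because silently wrong output is hard to notice, a round-trip identity check is a cheap safeguard. The sketch below assumes a modern Python 3 interpreter, where the C implementation is used automatically by the pickle module, and tests only protocol 2, the one the summary above says should work. The helper name cycle_survives is hypothetical.

```python
import pickle

# The same shape as above: a list subclass whose cycle goes through a dict.
T = type('T', (list,), {})
obj = T()
d = {'obj': obj}
obj.append(d)


def cycle_survives(original, protocol):
    """Pickle and unpickle, then check that the cycle points at the copy itself."""
    copy = pickle.loads(pickle.dumps(original, protocol))
    return copy[0]['obj'] is copy


print(cycle_survives(obj, 2))
```

A check like this in a test suite turns the "garbage output" failure mode into a visible failure.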

Formally complete versions of the above patches will be on SF tomorrow
unless someone suggests better alternatives.

Dima.
 
