Hello list,
I've found the following strange behavior of cPickle. Do you think
it's a bug, or is it by design?
Best regards,
Victor.
from pickle import dumps
from cPickle import dumps as cdumps
print dumps('1001799')==dumps(str(1001799))
print cdumps('1001799')==cdumps(str(1001799))
outputs
True
False
vicbook:~ victor$ python
Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()
vicbook:~ victor$ uname -a
Darwin vicbook 8.9.1 Darwin Kernel Version 8.9.1: Thu Feb 22 20:55:00
PST 2007; root:xnu-792.18.15~1/RELEASE_I386 i386 i386
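
One way to see where two pickles diverge is to list their opcodes instead of comparing the raw strings. The following is a minimal sketch, not part of the original report. It assumes a current Python 3 interpreter, where cPickle no longer exists as a separate module, so it shows the inspection technique rather than reproducing the True/False result above. The helpers opcode_names and put_indices are hypothetical names made up for this illustration.

```python
import pickle
import pickletools


def opcode_names(data):
    """Return the opcode names found in a pickle, in order (hypothetical helper)."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]


def put_indices(data):
    """Return the memo indices written by PUT-style opcodes (hypothetical helper)."""
    return [arg for opcode, arg, pos in pickletools.genops(data)
            if opcode.name in ("PUT", "BINPUT", "LONG_BINPUT")]


# Protocol 0 is the text protocol that dumps() used by default in Python 2.
literal = pickle.dumps('1001799', protocol=0)
computed = pickle.dumps(str(1001799), protocol=0)

print(opcode_names(literal))   # which opcodes were emitted
print(put_indices(literal))    # which memo keys were written
print(literal == computed)     # whether the two byte strings match
```

If two pickles of equal values differ, comparing the two opcode lists and the PUT indices shows whether the difference is in the data or only in the memo bookkeeping.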
I might have found the culprit: see
http://svn.python.org/projects/python/trunk/Modules/cPickle.c
The function static int put2(...) contains the following code block:
---------cPickle.c-----------
    int p;
    ....
    if ((p = PyDict_Size(self->memo)) < 0) goto finally;

    /* Make sure memo keys are positive! */
    /* XXX Why?
     * XXX And does "positive" really mean non-negative?
     * XXX pickle.py starts with PUT index 0, not 1.  This makes for
     * XXX gratuitous differences between the pickling modules.
     */
    p++;
-------------------------------
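The p++ will cause the difference: it makes cPickle's first memo key 1,
while pickle.py's first memo key is 0. Judging by the XXX comments, the
developers are not quite sure why it's there, or whether memo keys may
start at 0 or have to start at 1.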
Here is the corresponding section of the Python version (pickle.py),
taken from Python 2.5:
---------pickle.py----------
    def memoize(self, obj):
        """Store an object in the memo."""

        # The Pickler memo is a dictionary mapping object ids to 2-tuples
        # that contain the Unpickler memo key and the object being memoized.
        # The memo key is written to the pickle and will become
        # the key in the Unpickler's memo.  The object is stored in the
        # Pickler memo so that transient objects are kept alive during
        # pickling.

        # The use of the Unpickler memo length as the memo key is just a
        # convention.  The only requirement is that the memo values be unique.
        # But there appears no advantage to any other scheme, and this
        # scheme allows the Unpickler memo to be implemented as a plain (but
        # growable) array, indexed by memo key.
        if self.fast:
            return
        assert id(obj) not in self.memo
        memo_len = len(self.memo)
        self.write(self.put(memo_len))
        self.memo[id(obj)] = memo_len, obj

    # Return a PUT (BINPUT, LONG_BINPUT) opcode string, with argument i.
    def put(self, i, pack=struct.pack):
        if self.bin:
            if i < 256:
                return BINPUT + chr(i)
            else:
                return LONG_BINPUT + pack("<i", i)

        return PUT + repr(i) + '\n'
------------------------------------------
In memoize, memo_len plays the role of 'int p' in the C version. In
pickle.py the first key is the memo size, 0, and it is used as is. In the
C version the size is also 0 initially, but it is then incremented by p++,
so the first key written is 1.
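
The effect of the starting index can be sketched by hand. The three byte strings below are assumed protocol 0 pickles of the same string: one with PUT index 0 (as pickle.py would number it), one with PUT index 1 (as the C version would after p++), and one with no PUT at all. They are written out by hand for illustration, not captured from Python 2.5, and the check runs on a current Python 3 pickle module.

```python
import pickle

# Hand-written protocol 0 pickles (assumed for illustration, not captured output).
put_index_0 = b"S'1001799'\np0\n."   # memo key starts at 0, as in pickle.py
put_index_1 = b"S'1001799'\np1\n."   # memo key starts at 1, as after p++
no_put = b"S'1001799'\n."            # the string is not memoized at all

candidates = [put_index_0, put_index_1, no_put]

# The three byte strings all differ from one another...
print(len(set(candidates)))

# ...yet each one unpickles to the same value.
values = [pickle.loads(data) for data in candidates]
print(values)
```

So two dumps() results that compare unequal can still load back to equal objects; the memo key only has to be unique within one pickle.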
Do any developers know more about this?
-Nick Vatamaniuc