Python 3.1.2 and marshal

raj · Jul 17, 2010

Hi,

I am using 64-bit Python on an x86_64 platform (Fedora 13). I have
some code that uses the Python marshal module to serialize some
objects to files. However, in moving the code to Python 3 I have come
across a situation where, if more than one object has been serialized
to a file, only the first object can be de-serialized. Trying to
de-serialize the second object raises an EOFError. De-serialization
of multiple objects works fine in Python 2.x. I went through the
Python 3 documentation to see if marshal functionality has been
changed, but haven't found anything to that effect. Does anyone else
see this problem? Here is some example code:

bash-4.1$ cat marshaltest.py
import marshal

numlines = 1
numwords = 25

stream = open('fails.mar','wb')
marshal.dump(numlines, stream)
marshal.dump(numwords, stream)
stream.close()

tmpstream = open('fails.mar', 'rb')
value1 = marshal.load(tmpstream)
value2 = marshal.load(tmpstream)

print(value1 == numlines)
print(value2 == numwords)

Here are the results of running this code:

bash-4.1$ python2.7 marshaltest.py
True
True

bash-4.1$ python3.1 marshaltest.py
Traceback (most recent call last):
  File "marshaltest.py", line 13, in <module>
    value2 = marshal.load(tmpstream)
EOFError: EOF read where object expected

Interestingly, the file created by Python 3.1 is readable by both
Python 2.7 and Python 2.6, and both objects are read successfully.

Cheers,
raj

Thomas Jollans · Jul 17, 2010

Interesting. I modified your script a bit:

ts/2:/tmp% cat marshtest.py
from __future__ import print_function
import marshal
import sys
if sys.version_info[0] == 3:
    bytehex = lambda i: '%02X ' % i
else:
    bytehex = lambda c: '%02X ' % ord(c)

numlines = 1
numwords = 25

stream = open('fails.mar','wb')
marshal.dump(numlines, stream)
marshal.dump(numwords, stream)
stream.close()

tmpstream = open('fails.mar', 'rb')

for byte in tmpstream.read():
    sys.stdout.write(bytehex(byte))

sys.stdout.write('\n')
tmpstream.seek(0)

print('pos:', tmpstream.tell())
value1 = marshal.load(tmpstream)
print('val:', value1)
print('pos:', tmpstream.tell())
value2 = marshal.load(tmpstream)
print('val:', value2)
print('pos:', tmpstream.tell())

print(value1 == numlines)
print(value2 == numwords)

ts/2:/tmp% python2.6 marshtest.py
69 01 00 00 00 69 19 00 00 00
pos: 0
val: 1
pos: 5
val: 25
pos: 10
True
True

ts/2:/tmp% python3.1 marshtest.py
69 01 00 00 00 69 19 00 00 00
pos: 0
val: 1
pos: 10
Traceback (most recent call last):
  File "marshtest.py", line 29, in <module>
    value2 = marshal.load(tmpstream)
EOFError: EOF read where object expected

So the contents of the file are identical, but Python 3 reads the whole
file on the first load() (the position jumps from 0 to 10), while
Python 2 reads only the data it uses (the position moves from 0 to 5).

This looks like a simple optimisation: read the whole file at once,
instead of byte by byte, to improve performance when reading large
objects (such as Python modules).

The question is: was storing multiple objects in sequence an intended
use of the marshal module? I doubt it. You can always wrap your data in
tuples or use pickle.
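
The tuple suggestion can be sketched roughly as follows. This is a minimal illustration rather than code from the thread: the scratch file name `values.mar` and the use of `tempfile` are assumptions made so the example is self-contained.

```python
import marshal
import os
import tempfile

numlines = 1
numwords = 25

# Hypothetical scratch file; the thread wrote 'fails.mar' in the current directory.
path = os.path.join(tempfile.mkdtemp(), 'values.mar')

# Serialize both values as one tuple, so the file holds a single marshal object.
with open(path, 'wb') as stream:
    marshal.dump((numlines, numwords), stream)

# A single load() returns the whole tuple, so it does not matter how much
# of the file that one call consumes.
with open(path, 'rb') as stream:
    value1, value2 = marshal.load(stream)

print(value1 == numlines)
print(value2 == numwords)
```

Because only one object is ever loaded, this layout should behave the same way under Python 2 and Python 3.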

raj · Jul 18, 2010

On Jul 17, Thomas Jollans said:
> So the contents of the file are identical, but Python 3 reads the whole
> file, Python 2 reads only the data it uses.
>
> This looks like a simple optimisation: read the whole file at once,
> instead of byte-by-byte, to improve performance when reading large
> objects. (such as Python modules...)

Good analysis and a nice catch. Thanks. It is likely that the intent
is to optimize performance.

> The question is: was storing multiple objects in sequence an intended
> use of the marshal module?

The documentation (http://docs.python.org/py3k/library/marshal.html)
for marshal itself states (emphasis added by me),

marshal.load(file)

    Read *one value* from the open file and return it. If no valid
    value is read (e.g. because the data has a different Python version’s
    incompatible marshal format), raise EOFError, ValueError or TypeError.
    The file must be an open file object opened in binary mode ('rb' or
    'r+b').

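If load() reads exactly one value, a file can hold several values that
are read back by successive calls, which suggests that support for
reading multiple values is intended.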
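
If several values do need to live in one marshal-based file, one option that does not depend on how much of the file load() consumes is to frame each value yourself. The following is only a sketch of that idea and is not taken from the marshal documentation: `dump_framed` and `load_framed` are hypothetical helper names, and the 4-byte little-endian length prefix is an assumed format.

```python
import io
import marshal
import struct


def dump_framed(value, stream):
    """Write one value as a 4-byte length prefix followed by its marshal bytes."""
    payload = marshal.dumps(value)
    stream.write(struct.pack('<I', len(payload)))
    stream.write(payload)


def load_framed(stream):
    """Read back one value written by dump_framed(); raise EOFError at end of data."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError('no more framed values')
    (size,) = struct.unpack('<I', header)
    payload = stream.read(size)
    if len(payload) < size:
        raise EOFError('truncated framed value')
    # loads() only ever sees the bytes belonging to this one value.
    return marshal.loads(payload)


# An in-memory buffer stands in for a file opened in binary mode.
buf = io.BytesIO()
dump_framed(1, buf)
dump_framed(25, buf)

buf.seek(0)
first = load_framed(buf)
second = load_framed(buf)
print(first, second)
```

Note that files written this way are no longer plain marshal streams, so existing files produced by repeated marshal.dump() calls would not be readable with these helpers.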

> I doubt it. You can always wrap your data in
> tuples or use pickle.

The code that I am moving to 3.x dates back to the Python 1.5 days,
when marshal was significantly faster than pickle and Zope was
evolutionarily at the Bobo stage. I have switched the current code
to pickle, which makes more sense. The pickle files are a bit larger
and loading them is a tad slower, but nothing that makes even a
noticeable difference for my use case. Thanks.

raj
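
For reference, the switch to pickle described above might look roughly like this. It is a minimal sketch, not the poster's actual code: the scratch file name `values.pkl` and the use of `tempfile` are assumptions.

```python
import os
import pickle
import tempfile

numlines = 1
numwords = 25

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'values.pkl')

# Write two objects back to back into the same file.
with open(path, 'wb') as stream:
    pickle.dump(numlines, stream)
    pickle.dump(numwords, stream)

# Each pickle.load() call reads one object and leaves the stream
# positioned at the start of the next one.
with open(path, 'rb') as stream:
    value1 = pickle.load(stream)
    value2 = pickle.load(stream)

print(value1 == numlines)
print(value2 == numwords)
```

This keeps the original one-call-per-object structure of the marshal script, which is what makes pickle a close replacement here.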
