Ok, I've attached the proto PEP below.
Comments on the proto PEP and the implementation are appreciated.
Sw.
Title: Secure, standard serialization of simple Python types.
Abstract
This PEP suggests the addition of a module to the standard library,
which provides a serialization class for simple Python types.
Copyright
This document is placed in the public domain.
Motivation
The standard library currently provides two modules for object
serialization, pickle and marshal. Pickle is insecure by its very
nature, since loading a pickle can execute arbitrary code, and the
marshal module is clearly marked as not secure in its documentation.
The marshal module also does not guarantee compatibility between
Python versions. The proposed module will serialize only simple
built-in Python types, and will provide compatibility across Python
versions.
See RFE 467384 (on SourceForge) for more discussion on the above
issues.
Specification
The proposed module should use the same API as the marshal module.
    dump(value, file)
        # serialize value, and write to an open file object
    load(file)
        # read data from a file object, unserialize and return an object
    dumps(value)
        # return the string that would be written to the file by dump
    loads(value)
        # unserialize value and return an object
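Because the proposed API mirrors marshal, the intended call pattern
can be sketched with marshal itself standing in for the proposed
module. This is only an illustration of the four functions, written
for present-day Python 3 (where the serialized form is bytes rather
than str); it does not show the reference implementation.

```python
import io
import marshal  # stand-in only: the proposed module would expose the same four names

value = {"name": "spam", "count": 3, "tags": ["a", "b"], "ratio": 0.5}

# dumps/loads: serialize to an in-memory byte string and back
data = marshal.dumps(value)
roundtrip = marshal.loads(data)

# dump/load: the same round trip, but through an open binary file object
buf = io.BytesIO()
marshal.dump(value, buf)
buf.seek(0)
from_file = marshal.load(buf)
```

Swapping the import for the proposed module should, if the API really
is identical, leave the rest of such code unchanged.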
Reference Implementation
http://metaplay.dyndns.org:82/~simon/gherkin.py.txt
Rationale
The marshal documentation explicitly states that it is unsuitable
for unmarshalling untrusted data. It also explicitly states that
the format is not compatible across Python versions.
Pickle is compatible across versions, but is also unsafe for loading
untrusted data. Exploits demonstrating pickle's vulnerability exist.
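The kind of exploit referred to above can be sketched harmlessly. The
class name Payload is made up for this illustration, and the example
is written for present-day Python 3. It relies on the documented
__reduce__ hook, which lets a pickled object name any callable to be
invoked at load time.

```python
import pickle

class Payload:
    # __reduce__ returns (callable, args); pickle.loads calls callable(*args).
    # A harmless expression is used here, but any importable callable would do.
    def __reduce__(self):
        return (eval, ("6 * 7",))

data = pickle.dumps(Payload())

# Loading does not rebuild a Payload; it runs eval("6 * 7") instead.
result = pickle.loads(data)
```

Here result is the integer 42 rather than a Payload instance, which
shows that whoever controls the byte string controls what gets
executed on the loading side.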
xmlrpclib provides serialization functions, but is unsuitable for
serializing large data structures, or when high performance is a
requirement. If performance is an issue, a C-based accelerator
module can be installed. If size is an issue, gzip can be used.
Compression costs time, however, so size and performance have to be
traded off against each other.
Other existing formats, such as JSON and Bencode (BitTorrent), either
cannot represent some marginally complex Python structures or do not
cover all of the simple Python types.
Time and space efficiency and security do not have to be mutually
exclusive features of a serializer. The Python standard library does
not provide a serializer that is both safe to use with untrusted data
and efficient in time and space. The proposed gherkin module goes
some way towards achieving this. The format is simple enough that
interoperable implementations can easily be written on other
platforms.
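To illustrate the general idea of a serializer that only accepts
simple built-in types, here is a minimal sketch in present-day
Python 3. The one-byte type tags and length prefixes are invented for
this example and are NOT the gherkin wire format; the sketch also
ignores concerns such as nesting depth limits. The point is only that
the loader never imports or calls anything named by the data.

```python
import struct

# Hypothetical tags, for illustration only (not the gherkin format).
def dumps(value):
    if value is None:
        return b"N"
    if value is True:
        return b"T"
    if value is False:
        return b"F"
    if isinstance(value, int):
        text = str(value).encode("ascii")
        return b"i" + struct.pack(">I", len(text)) + text
    if isinstance(value, float):
        return b"f" + struct.pack(">d", value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return b"s" + struct.pack(">I", len(raw)) + raw
    if isinstance(value, (list, tuple)):
        tag = b"l" if isinstance(value, list) else b"t"
        body = b"".join(dumps(item) for item in value)
        return tag + struct.pack(">I", len(value)) + body
    if isinstance(value, dict):
        body = b"".join(dumps(k) + dumps(v) for k, v in value.items())
        return b"d" + struct.pack(">I", len(value)) + body
    # Anything else (instances, functions, modules) is refused outright.
    raise TypeError("unsupported type: %s" % type(value).__name__)

def _load(data, pos):
    # Returns (value, next position); only builds built-in objects.
    tag = data[pos:pos + 1]
    pos += 1
    if tag == b"N":
        return None, pos
    if tag == b"T":
        return True, pos
    if tag == b"F":
        return False, pos
    if tag == b"f":
        return struct.unpack_from(">d", data, pos)[0], pos + 8
    if tag in (b"i", b"s", b"l", b"t", b"d"):
        (count,) = struct.unpack_from(">I", data, pos)
        pos += 4
        if tag in (b"i", b"s"):
            raw = data[pos:pos + count]
            if len(raw) != count:
                raise ValueError("truncated data")
            text = raw.decode("ascii" if tag == b"i" else "utf-8")
            return (int(text) if tag == b"i" else text), pos + count
        if tag == b"d":
            result = {}
            for _ in range(count):
                key, pos = _load(data, pos)
                val, pos = _load(data, pos)
                result[key] = val
            return result, pos
        items = []
        for _ in range(count):
            item, pos = _load(data, pos)
            items.append(item)
        return (items if tag == b"l" else tuple(items)), pos
    raise ValueError("unknown type tag: %r" % tag)

def loads(data):
    value, pos = _load(data, 0)
    if pos != len(data):
        raise ValueError("trailing data")
    return value
```

A round trip such as loads(dumps({"a": [1, 2.5, None], "b": ("x", -7)}))
returns an equal structure, while dumps(object()) raises TypeError and
malformed input raises an exception instead of running code.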