Secure Pickle-like module

jiba

Hi all,

I'm currently working on a secure Pickle-like module, Cerealizer,
http://home.gna.org/oomadness/en/cerealizer/index.html
Cerealizer has a pickle-like interface (load, dump, __getstate__,
__setstate__, ...); however, it requires you to register each class you
want to "cerealize" by calling cerealizer.register(YourClass).
Cerealizer doesn't import other modules (unlike pickle), and the
only methods it may call are YourClass.__new__, YourClass.__getstate__
and YourClass.__setstate__. Cerealizer keeps its own reference to these
three methods, so an assignment such as
YourClass.__setstate__ = cracked_method is harmless.
Thus, as long as __new__, __getstate__ and __setstate__ are not
dangerous, Cerealizer should be secure.
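
To make the registration and early-binding idea concrete, here is a minimal toy sketch. It is not Cerealizer's code or API: the names register, to_state, from_state and Point are hypothetical, and the "serialized" form is just a (class name, state) pair. It only illustrates that references captured at registration time are unaffected by later patching of the class.

```python
# Toy illustration only: this is NOT Cerealizer's implementation or API.
# All names below (register, to_state, from_state, Point) are hypothetical.

_handlers = {}  # class name -> (class, __new__, __getstate__, __setstate__)


def register(cls):
    """Whitelist cls and capture its three methods at registration time."""
    _handlers[cls.__name__] = (
        cls, cls.__new__, cls.__getstate__, cls.__setstate__)


def to_state(obj):
    """Turn an instance of a registered class into (class name, state)."""
    name = type(obj).__name__
    if name not in _handlers:
        raise ValueError("class not registered: %s" % name)
    _cls, _new, getstate, _setstate = _handlers[name]
    return name, getstate(obj)


def from_state(name, state):
    """Rebuild an object, using only the references captured by register()."""
    if name not in _handlers:
        raise ValueError("class not registered: %s" % name)
    cls, new, _getstate, setstate = _handlers[name]
    obj = new(cls)        # no __init__ call, no module import
    setstate(obj, state)  # the early-bound method, not the current attribute
    return obj


class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getstate__(self):
        return {"x": self.x, "y": self.y}

    def __setstate__(self, state):
        self.__dict__.update(state)


register(Point)


def cracked_method(self, state):
    raise RuntimeError("attacker code ran")


# Patching the class after registration does not affect the captured method.
Point.__setstate__ = cracked_method
restored = from_state(*to_state(Point(1, 2)))
```

In this sketch an unregistered class name is rejected with ValueError, and cracked_method is never called because from_state uses the reference saved by register.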

Performance is quite good and, with Psyco, it is about as fast as
cPickle. Yet Cerealizer is written in fewer than 300 lines of
pure-Python code.

I would appreciate any comments, especially if there are some security
gurus here :)

Jiba
 

jiba

There are a couple factual inaccuracies on the site that I'd like to clear up first:
Trivial benchmarks put cerealizer and banana/jelly on the same level as far as performance goes:
$ python -m timeit -s 'from cereal import dumps; L = ["Hello", " ", ("w", "o", "r", "l", "d", ".")]' 'dumps(L)'
10000 loops, best of 3: 84.1 usec per loop
$ python -m timeit -s 'from twisted.spread import banana, jelly; dumps = lambda o: banana.encode(jelly.jelly(o)); L = ["Hello", " ", ("w", "o", "r", "l", "d", ".")]' 'dumps(L)'
10000 loops, best of 3: 89.7 usec per loop

This is with cBanana though, which has to be explicitly enabled and, of course, is written in C. So Cerealizer looks like it has the potential to do pretty well, performance-wise.

My personal benchmark was different; it used a list of 2000
objects defined as follows:

class O(object):
    def __init__(self):
        self.x = 1
        self.s = "jiba"
        self.o = None

with self.o referring to another O object. I think my benchmark,
although still very limited, is more representative since it involves
objects, strings, numbers and lists.

See it there:
http://svn.gna.org/viewcvs/*checkou.../test/test1.py?content-type=text/plain&rev=31

The results are (using Psyco):
With old-style classes:
cerealizer
dumps in 0.0619530677795 s, 114914 bytes length
loads in 0.0313038825989 s

cPickle
dumps in 0.0301840305328 s, 116356 bytes length
loads in 0.023097038269 s

jelly + banana
dumps in 0.168012142181 s, 169729 bytes length
loads in 1.82081913948 s

jelly + cBanana
dumps in 0.082946062088 s, 169729 bytes length
loads in 0.156159877777 s

With new-style classes:
cerealizer
dumps in 0.0575239658356 s, 114914 bytes length
loads in 0.028165102005 s

cPickle
dumps in 0.07634806633 s, 116428 bytes length
loads in 0.0278959274292 s

jelly + banana
dumps in 0.156242132187 s, 169729 bytes length
loads: TypeError (I haven't investigated this problem yet, although it
is surely solvable)

jelly + cBanana
dumps in 0.10772895813 s, 169729 bytes length
loads: TypeError (I haven't investigated this problem yet, although it
is surely solvable)

As you can see, cPickle is about 2 times faster than cerealizer at
dumping old-style classes (and somewhat faster at loading them), but
cerealizer dumps new-style classes faster than cPickle and loads them
at about the same speed (which makes sense since I have optimized it
for new-style classes). Jelly, however, is far behind, even using
cBanana, especially for loading.

You talked about _Tuple and _Dereference on the website as well. These are internal implementation details. jelly also supports extension types, by way of setUnjellyableForClass and similar functions.

The problem arises only when the extension type expects an attribute of
a specific class, e.g. (in Pyrex):

cdef class MyClass:
cdef MyClass other

The other attribute of MyClass can only contain a reference to an
instance of MyClass (or None). Thus it cannot be set to an instance of
_Dereference or _Tuple, even temporarily; doing other =
_Dereference(...) raises an exception.

I solve this problem in Cerealizer by doing a 2-pass object creation:
step 1, create all the objects; step 2, set all objects' states.
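
The two-pass idea can be sketched roughly as follows. This is only an illustration, not Cerealizer's actual code: Node, dump_nodes, load_nodes and the index-list format are hypothetical, and a pure-Python property stands in for the typed Pyrex attribute. Because every object exists before any state is set, no placeholder object is ever assigned to the typed attribute, even for cyclic references.

```python
# Sketch only: names and data format are hypothetical, not Cerealizer's.

class Node(object):
    """Stand-in for a typed attribute: 'other' must be a Node or None."""

    def __init__(self):
        self._other = None

    @property
    def other(self):
        return self._other

    @other.setter
    def other(self, value):
        if value is not None and not isinstance(value, Node):
            raise TypeError("other must be a Node or None")
        self._other = value


def dump_nodes(nodes):
    """Describe each node by the list index of its 'other' (or None).

    Assumes every referenced node is itself in the list.
    """
    index = dict((id(n), i) for i, n in enumerate(nodes))
    return [None if n.other is None else index[id(n.other)] for n in nodes]


def load_nodes(links):
    # Pass 1: create all the objects, without any state.
    nodes = [Node.__new__(Node) for _ in links]
    # Pass 2: set the states; every referenced object already exists,
    # so the setter only ever receives real Node instances or None.
    for node, link in zip(nodes, links):
        node.other = None if link is None else nodes[link]
    return nodes


# Two nodes referring to each other (a cycle), plus one with no reference.
a, b, c = Node(), Node(), Node()
a.other = b
b.other = a
links = dump_nodes([a, b, c])
restored = load_nodes(links)
```

Here links is a plain list of indices, and the restored nodes keep the same cyclic structure as the originals.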
As far as security goes, no obvious problems jump out at me, either
from the API or from skimming the code. I think early-binding
__new__, __getstate__, and __setstate__ may be going further than
is necessary. If someone can find code to set attributes on classes
in your process space, they can probably already do anything they
want to your program and don't need to exploit security problems in
your serializer.

I agree with that; however, I prefer to be "over-secure" rather than
"just as secure as necessary" :)

Thank you for your opinion!
I'm going to update my website.
Jiba
 
