Is there a safe marshaler?

Irmen de Jong

Pickle and marshal are not safe. They can do harmful
things if fed maliciously constructed data.
That is a pity, because marshal is fast.
I need a fast and safe (secure) marshaler.
Is xdrlib the only option?
I would expect that it is fast and safe because
it (the xdr spec) has been around for so long.

Or are there better options (perhaps 3rd party libraries)?

Thanks

Irmen.
 
Pierre Barbier de Reuille

Irmen de Jong wrote:
Pickle and marshal are not safe. They can do harmful
things if fed maliciously constructed data.
That is a pity, because marshal is fast.
I need a fast and safe (secure) marshaler.
Is xdrlib the only option?
I would expect that it is fast and safe because
it (the xdr spec) has been around for so long.

Or are there better options (perhaps 3rd party libraries)?

Thanks

Irmen.

What exactly do you mean by "safe"? Do you want to ensure your objects
cannot receive corrupted data? Do you want to ensure no code will be
evaluated during the unmarshalling?

Please, be more precise,

Pierre
 
guido

Irmen said:
Pickle and marshal are not safe. They can do harmful
things if fed maliciously constructed data.
That is a pity, because marshal is fast.

I think marshal could be fixed; the only unsafety I'm aware of is that
it doesn't always act rationally when confronted with incorrect input
like bad type codes or truncated input. It only receives instances of
the built-in types and it never executes user code as a result of
unmarshalling.
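
One way to check how a given interpreter copes with the two failure modes named above (truncated input and bad type codes) is to feed it both. This is only a sketch: `try_unmarshal` is a hypothetical helper, and the exceptions caught are an assumption based on current CPython behaviour. A clean exception here does not make marshal safe for untrusted data; the module documentation still warns against that.

```python
import marshal

def try_unmarshal(data):
    """Hypothetical helper: report whether marshal accepted or rejected data."""
    try:
        return ("ok", marshal.loads(data))
    except (EOFError, ValueError, TypeError) as exc:
        # Assumption: these are the errors CPython raises for bad marshal data.
        return ("rejected", type(exc).__name__)

good = marshal.dumps({"answer": [1, 2, 3]})

print(try_unmarshal(good))        # well-formed stream
print(try_unmarshal(good[:-2]))   # truncated stream
print(try_unmarshal(b"?junk"))    # unknown type code
```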

Perhaps someone would be interested in submitting a patch to the
unmarshalling code? Since this is a security fix we'd even accept a fix
for 2.3.

> I need a fast and safe (secure) marshaler.
> Is xdrlib the only option?
> I would expect that it is fast and safe because
> it (the xdr spec) has been around for so long.

I don't expect that to be particularly fast, since it mostly operates
at Python speed. I think it could be safe but I would still do a
thorough code review if I were you -- the code is older than my
awareness of the vulnerabilities inherent in this kind of remote data
transfer.

--Guido
 
Irmen de Jong

Pierre said:
> What exactly do you mean by "safe"? Do you want to ensure your objects
> cannot receive corrupted data? Do you want to ensure no code will be
> evaluated during the unmarshalling?

"safe (secure)"
But to be more precise, let's look at the security warning that
is in the marshal documentation:
"The marshal module is not intended to be secure against erroneous or
maliciously constructed data. Never unmarshal data received from an
untrusted or unauthenticated source."

So essentially I want the opposite of that ;-)

I want a marshaler that is okay to use where the data it processes
comes from unknown, external sources (untrusted). It should not crash
on corrupt data and it should not execute arbitrary code when
unmarshaling, so that it is safe against hacking attempts.

Oh, preferably, it should be fast :)
Some XML-ish thing may be secure but is unlikely to be fast.

Ideally it should be able to transfer user defined Python types,
but if it is like marshal (can only marshal builtin types) that's
okay too.
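
To make the "arbitrary code" concern concrete, here is a minimal sketch of why pickle fails this requirement. The class name `Sneaky` is hypothetical, and the callable is deliberately a harmless builtin; the point is only that the pickle stream, not the receiving program, chooses what gets called on load.

```python
import pickle

class Sneaky:
    """Hypothetical payload: tells pickle which callable to invoke on load."""
    def __reduce__(self):
        # Unpickling will call len("abc"). A hostile stream could name
        # any importable callable here instead of a harmless builtin.
        return (len, ("abc",))

payload = pickle.dumps(Sneaky())
result = pickle.loads(payload)   # no Sneaky instance comes back

print(result)
```

The receiver gets back whatever the named callable returned, which is why unpickling data from an untrusted source amounts to letting that source run code.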

--Irmen
 
Irmen de Jong

Hello Guido

I think marshal could be fixed; the only unsafety I'm aware of is that
it doesn't always act rationally when confronted with incorrect input
like bad type codes or truncated input. It only receives instances of
the built-in types and it never executes user code as a result of
unmarshalling.

So it is not vulnerable in the way that pickle is? That's a start.
The security warning in the marshal doc then makes it sound worse than
it is...

> Perhaps someone would be interested in submitting a patch to the
> unmarshalling code? Since this is a security fix we'd even accept a fix
> for 2.3.

That would be nice indeed :)

I don't expect that to be particularly fast, since it mostly operates
at Python speed.

Ah, I wasn't aware that xdrlib was implemented in Python :)
I thought it used a (standard?) C-implementation.
But I now see that it's a Python module (utilizing struct).

> I think it could be safe but I would still do a
> thorough code review if I were you -- the code is older than my
> awareness of the vulnerabilities inherent in this kind of remote data
> transfer.

Thanks for the warning.

--Irmen de Jong
 
Alan Kennedy

[Irmen de Jong]
Pickle and marshal are not safe. They can do harmful
things if fed maliciously constructed data.
That is a pity, because marshal is fast.
I need a fast and safe (secure) marshaler.

Hi Irmen,

I'm not necessarily proposing a solution to your problem, but am
interested in your requirement. Is this for pyro?

In the light of pyro, would something like JSON be suitable for your needs? I
only came across it a week ago (when someone else posted about it here
on c.l.py), and am intrigued by it.

http://json.org

What I find particularly intriguing is the JSON-RPC protocol, which
looks like a nice lightweight alternative to XML-RPC.

http://oss.metaparadigm.com/jsonrpc/

Also interesting is the browser embeddable JSON-RPC client written in
javascript, for which you can see a demo here

http://oss.metaparadigm.com/jsonrpc/demos.html

I thought you might be interested.

regards,
 
Alan Kennedy

[Alan Kennedy]
What I find particularly intriguing is the JSON-RPC protocol, which
looks like a nice lightweight alternative to XML-RPC.

http://oss.metaparadigm.com/jsonrpc/

Also interesting is the browser embeddable JSON-RPC client written in
javascript, for which you can see a demo here

http://oss.metaparadigm.com/jsonrpc/demos.html

I should have mentioned as well that there is a python JSON-RPC server
implementation, which includes a complete JSON<-->python-objects codec.

http://www.json-rpc.org/pyjsonrpc/index.xhtml

regards,
 
Irmen de Jong

PA said:
XDR? Like Sun's "XDR: External Data Representation standard"?

http://www.faqs.org/rfcs/rfc1014.html
http://www.faqs.org/rfcs/rfc1832.html

Not "like", but "the".
Or at least, a subset. (the xdrlib module documentation says
"It supports most of the data types described in the RFC").

How does XDR cope with Unicode these days?

Not directly, it seems that you have to encode
your unicode strings yourself first.

Alternatively, perhaps there is an ASN.1 DER library in python?

http://asn1.elibel.tm.fr/en/standards/index.htm


I don't know. Is there?


PS the xdr format is not self-describing in the way that
marshal and pickle streams are. That is a big limitation
for what I need it for so xdr seems to drop off my radar.
Is an ASN.1 stream self-describing?
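
A small sketch of what "not self-describing" means in practice. It uses `struct` with big-endian 4-byte fields as a stand-in for XDR's int and float encodings (an assumption made for illustration, not a full XDR codec): the bytes carry no type tag, so the reader must already know the schema. A marshal stream, by contrast, records the type of each value.

```python
import marshal
import struct

# Four big-endian bytes, laid out like an XDR signed integer.
raw = struct.pack(">i", 1065353216)

as_int = struct.unpack(">i", raw)[0]     # reader assumes "int"
as_float = struct.unpack(">f", raw)[0]   # reader assumes "float": same bytes

# marshal writes a type code first, so the reader needs no schema.
from_marshal_int = marshal.loads(marshal.dumps(1065353216))
from_marshal_float = marshal.loads(marshal.dumps(1.0))

print(as_int, as_float)
print(type(from_marshal_int).__name__, type(from_marshal_float).__name__)
```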

--Irmen
 
Irmen de Jong

Alan said:
[Irmen de Jong]
Pickle and marshal are not safe. They can do harmful
things if fed maliciously constructed data.
That is a pity, because marshal is fast.
I need a fast and safe (secure) marshaler.


Hi Irmen,

I'm not necessarily proposing a solution to your problem, but am
interested in your requirement. Is this for pyro?

Yes and No.
Yes, I'm investigating possible marshaling alternatives
(others than pickle which Pyro uses right now).
No, I'm not changing Pyro yet. It's just that I want to
investigate possible *secure* alternatives to the current
implementation.
(Note that a secure version would also mean that Pyro's
advanced features such as mobile code should go the way
of the dodo, and I don't want to do this yet).

> In the light of pyro, would something like JSON be suitable for your needs? I
> only came across it a week ago (when someone else posted about it here
> on c.l.py), and am intrigued by it.

http://json.org

Looks very interesting indeed, but in what way would this be
more secure than say, pickle or marshal?
A quick glance at some docs reveals that they are using eval
to process the data... ouch.

I thought you might be interested.

I certainly am but for different reasons.

--Irmen
 
PA

PS the xdr format is not self-describing in the way that
marshal and pickle streams are. That is a big limitation
for what I need it for so xdr seems to drop off my radar.
Is an ASN.1 stream self-describing?

Not sure how much "self-describing" you want it to be, but, yes it can
be as formal as you want it to be...

"... Abstract Syntax Notation One (ASN.1) is a formal language for
abstractly describing messages... "

Sorry if this is off-topic, I didn't follow the thread from the very
beginning, but wouldn't something like YAML work for you perhaps?

http://yaml.org/

Or even something more, er, exotic:

https://alt.textdrive.com/pl/

Cheers
 
Irmen de Jong

PA said:
Sorry if this is off-topic, I didn't follow the thread from the very
beginning, but wouldn't something like YAML work for you perhaps?

http://yaml.org/

Perhaps, but the spec makes my skin crawl.
Also, it seems ill-fit for efficient machine-to-machine
communication (yaml seems to be designed to be easily (?) read/edited
by humans, a thing which I don't require at all).

Naah.

--Irmen
 
PA

Also, it seems ill-fit for efficient machine-to-machine
communication...

Well, then, if you are looking for industrial strength quality, ASN.1
is the way to go. After all, a good chunk of the telecom infrastructure
is using it.

Cheers
 
Alan Kennedy

[Irmen de Jong]
[Alan Kennedy]
[Irmen de Jong]
> Looks very interesting indeed, but in what way would this be
> more secure than say, pickle or marshal?
> A quick glance at some docs reveals that they are using eval
> to process the data... ouch.

Well, the python JSON codec provided appears to use eval, which might
make it *seem* insecure.

http://www.json-rpc.org/pyjsonrpc/index.xhtml

But a more detailed examination of the code indicates, to this reader at
least, that it can be made completely secure very easily. The designer
of the code could very easily have not used eval, and possibly didn't do
so simply because he wasn't thinking in security terms.

The codec uses tokenize.generate_tokens to split up the JSON string into
tokens to be interpreted as python objects. tokenize.generate_tokens
generates a series of textual name/value pairs, so nothing insecure
there: the content of the token/strings is not executed.

Each of the tokens is then passed to a "parseValue" function, which is
defined thusly:

#===================

def parseValue(self, tkns):
    (ttype, tstr, ps, pe, lne) = tkns.next()
    if ttype in [token.STRING, token.NUMBER]:
        return eval(tstr)
    elif ttype == token.NAME:
        return self.parseName(tstr)
    elif ttype == token.OP:
        if tstr == "-":
            return - self.parseValue(tkns)
        elif tstr == "[":
            return self.parseArray(tkns)
        elif tstr == "{":
            return self.parseObj(tkns)
        elif tstr in ["}", "]"]:
            return EndOfSeq
        elif tstr == ",":
            return SeqSep
        else:
            raise "expected '[' or '{' but found: '%s'" % tstr
    else:
        return EmptyValue

#===================

As you can see, eval is *only* called when the next token in the stream
is either a string or a number, so it's really just a very simple code
shortcut to get a value from a string or number.

If one defined the function like this (not tested!), to remove the eval,
I think it should be safe.

#===================

default_number_type = float
#default_number_type = int

def parseValue(self, tkns):
    (ttype, tstr, ps, pe, lne) = tkns.next()
    if ttype in [token.STRING]:
        return tstr
    if ttype in [token.NUMBER]:
        return default_number_type(tstr)
    elif ttype == token.NAME:
        return self.parseName(tstr)
    elif ttype == token.OP:
        if tstr == "-":
            return - self.parseValue(tkns)
        elif tstr == "[":
            return self.parseArray(tkns)
        elif tstr == "{":
            return self.parseObj(tkns)
        elif tstr in ["}", "]"]:
            return EndOfSeq
        elif tstr == ",":
            return SeqSep
        else:
            raise "expected '[' or '{' but found: '%s'" % tstr
    else:
        return EmptyValue

#===================
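
One caveat with returning the token text directly: a STRING token still carries its surrounding quotes and any backslash escapes, so it is not yet the decoded value. A possible alternative, sketched here for later Python versions (this is an assumption, not part of the codec under discussion), is `ast.literal_eval`, which only accepts literals and so does not run code. The helper name `literal_tokens` is hypothetical.

```python
import ast
import io
import token
import tokenize

def literal_tokens(text):
    """Hypothetical helper: yield values of the STRING and NUMBER tokens in text."""
    readline = io.StringIO(text).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type in (token.STRING, token.NUMBER):
            # literal_eval decodes quotes and escapes but refuses anything
            # that is not a plain literal, unlike eval.
            yield ast.literal_eval(tok.string)

values = list(literal_tokens('["a\\n", 1, 2.5]'))
print(values)
```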

The only other use of eval is also only for string types, i.e. in the
parseObj function:

#===================
def parseObj(self, tkns):
    obj = {}
    nme = ""
    try:
        while 1:
            (ttype, tstr, ps, pe, lne) = tkns.next()
            if ttype == token.STRING:
                nme = eval(tstr)
                (ttype, tstr, ps, pe, lne) = tkns.next()
                if tstr == ":":
                    v = self.parseValue(tkns)
# Remainder of this function elided
#===================

Which could similarly be replaced with direct use of the string itself,
rather than eval'ing it. (Although one might want to look at encoding
issues: I haven't looked at JSON-RPC enough to know how it proposes to
handle string encodings.)

So I don't think there are any serious security issues here: the
"simplicity" of the JSON grammar is what attracted me to it in the first
place, especially since there are robust and efficient lexers
and parsers already available built-in to python and javascript (and
javascript interpreters are getting pretty ubiquitous these days).

And it's certainly the case that if the only available python impl of
JSON/RPC is not secure, it is possible to write one that is both
efficient and secure.
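
For readers on later Python versions: a `json` module was added to the standard library after this thread, and its decoder parses the text without eval. The sketch below assumes that module; `decode_untrusted` is a hypothetical wrapper. It only shows that malformed or hostile text is rejected with an exception, not that every resource limit (for example very deep nesting) is handled.

```python
import json

def decode_untrusted(text):
    """Hypothetical wrapper: (True, value) for valid JSON, else (False, message)."""
    try:
        return True, json.loads(text)
    except ValueError as exc:   # json.JSONDecodeError subclasses ValueError
        return False, str(exc)

print(decode_untrusted('{"method": "echo", "params": [1, 2.5, "x"]}'))
print(decode_untrusted('__import__("os").system("echo pwned")'))
```

The decoded values are limited to dicts, lists, strings, numbers, booleans and None, which matches the "builtin types only" fallback mentioned earlier in the thread.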

Hopefully there isn't some glaring security hole that I've missed:
doubtless I'll find out real soon ;-) Gotta love full disclosure.

regards,
 
Irmen de Jong

Hi Alan

Alan said:
Well, the python JSON codec provided appears to use eval, which might
make it *seem* unsecure.

http://www.json-rpc.org/pyjsonrpc/index.xhtml

But a more detailed examination of the code indicates, to this reader at
least, that it can be made completely secure very easily. The designer
of the code could very easily have not used eval, and possibly didn't do
so simply because he wasn't thinking in security terms.
[...]

Very interesting indeed.

> So I don't think there are any serious security issues here: the
> "simplicity" of the JSON grammar is what attracted me to it in the first
> place, especially since there are robust and efficient lexers
> and parsers already available built-in to python and javascript (and
> javascript interpreters are getting pretty ubiquitous these days).

The cross-platform/language aspect is quite nice indeed.

> And it's certainly the case that if the only available python impl of
> JSON/RPC is not secure, it is possible to write one that is both
> efficient and secure.

I think we (?) should do this then, and send it to the author
of the original version so that he can make an improved version
available? I think there are more people interested in a secure
marshaling implementation than just me :)


I'll still have to look at Twisted's Jelly.


Thanks for your analysis,
--Irmen
 
Alan Kennedy

[Alan Kennedy]
[Irmen de Jong]
> I think we (?) should do this then, and send it to the author
> of the original version so that he can make an improved version
> available? I think there are more people interested in a secure
> marshaling implementation than just me :)

I should learn to keep my mouth zipped :-L

OK, I really don't have time for a detailed examination of either the
JSON spec or the python impl of same. And I *definitely* don't have time
for a detailed security audit, much though I'd love to.

But I'll try to help: the code changes are really very simple. So I've
edited the single affected file, json.py, and here's a patch. But be
warned that I haven't even run this code!

Index: json.py
===================================================================
--- json.py (revision 2)
+++ json.py (working copy)
@@ -66,8 +66,10 @@

     def parseValue(self, tkns):
         (ttype, tstr, ps, pe, lne) = tkns.next()
-        if ttype in [token.STRING, token.NUMBER]:
-            return eval(tstr)
+        if ttype == token.STRING:
+            return unicode(tstr)
+        if ttype == token.NUMBER:
+            return float(tstr)
         elif ttype == token.NAME:
             return self.parseName(tstr)
         elif ttype == token.OP:
@@ -110,7 +112,12 @@
             while 1:
                 (ttype, tstr, ps, pe, lne) = tkns.next()
                 if ttype == token.STRING:
-                    nme = eval(tstr)
+                    possible_ident = unicode(tstr)
+                    try:
+                        # Python identifiers have to be ascii
+                        nme = possible_ident.encode('ascii')
+                    except UnicodeEncodeError:
+                        raise "Non-ascii identifier"
                     (ttype, tstr, ps, pe, lne) = tkns.next()
                     if tstr == ":":
                         v = self.parseValue(tkns)

I'll leave contacting the author to you, if you wish.

> I'll still have to look at Twisted's Jelly.

Hmmm, s-expressions, interesting. But you'd have to write your own
s-expression parser and jelly RPC client to get up and running in other
languages.

regards,
 
cmkl

Irmen de Jong said:
"safe (secure)"
But to be more precise, let's look at the security warning that
is in the marshal documentation:
"The marshal module is not intended to be secure against erroneous or
maliciously constructed data. Never unmarshal data received from an
untrusted or unauthenticated source."

So essentially I want the opposite of that ;-)

I want a marshaler that is okay to use where the data it processes
comes from unknown, external sources (untrusted). It should not crash
on corrupt data and it should not execute arbitrary code when
unmarshaling, so that it is safe against hacking attempts.

Oh, preferably, it should be fast :)
Some XML-ish thing may be secure but is unlikely to be fast.

Ideally it should be able to transfer user defined Python types,
but if it is like marshal (can only marshal builtin types) that's
okay too.

--Irmen

I'm just curious,

but can't effbot's fast cElementTree be used for Pyro's XML_PICKLE
and would it be safe and fast enough?

Carl
 
Irmen de Jong

cmkl said:
but can't effbot's fast cElementTree be used for Pyro's XML_PICKLE
and would it be safe and fast enough?

ElementTree's not a marshaler.
Or does it include object (de)serialization?

--Irmen
 
Skip Montanaro

Carl> but can't effbot's fast cElementTree be used for Pyro's XML_PICKLE
Carl> and would it be safe and fast enough?

It's not clear to me how XML could be safe if marshal is unsafe. In
this context they are both just serializations of basic Python data
structures.

Skip
 
