is there a safe marshaler?

Aahz · Feb 11, 2005

Carl> but can't effbot's fast cElementTree be used for Pyro's XML_PICKLE
Carl> and would it be safe and fast enough?

It's not clear to me how XML could be safe if marshal is unsafe. In
this context they are both just serializations of basic Python data
structures.

The difference is that parsing XML -- even badly malformed -- won't
crash Python.
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR

Fredrik Lundh · Feb 11, 2005

Irmen said:
ElementTree's not a marshaler.
Or has it object (de)serialization included?

nope. building a serialization layer on top of it is pretty trivial, and the result
is pretty fast, but nowhere close to C speed.
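
A minimal sketch of what such a layer could look like, written against the standard library's xml.etree.ElementTree in current Python 3 syntax. The tag names ("int", "str", "entry", ...) and the functions to_element, from_element, dumps and loads are made up for illustration and are not part of ElementTree or Pyro. It only handles None, bool, int, float, str, list, tuple and dict, and it never instantiates arbitrary classes.

```python
import xml.etree.ElementTree as ET


def to_element(value):
    """Turn a basic Python value into an Element (tag names are invented here)."""
    if value is None:
        return ET.Element("none")
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        elem = ET.Element("bool")
        elem.text = "1" if value else "0"
    elif isinstance(value, int):
        elem = ET.Element("int")
        elem.text = str(value)
    elif isinstance(value, float):
        elem = ET.Element("float")
        elem.text = repr(value)
    elif isinstance(value, str):
        elem = ET.Element("str")
        elem.text = value
    elif isinstance(value, (list, tuple)):
        elem = ET.Element("list" if isinstance(value, list) else "tuple")
        for item in value:
            elem.append(to_element(item))
    elif isinstance(value, dict):
        elem = ET.Element("dict")
        for key, item in value.items():
            entry = ET.SubElement(elem, "entry")
            entry.append(to_element(key))
            entry.append(to_element(item))
    else:
        raise TypeError("unsupported type: %s" % type(value).__name__)
    return elem


def from_element(elem):
    """Rebuild the Python value; unknown tags are rejected, never executed."""
    tag = elem.tag
    if tag == "none":
        return None
    if tag == "bool":
        return elem.text == "1"
    if tag == "int":
        return int(elem.text)
    if tag == "float":
        return float(elem.text)
    if tag == "str":
        return elem.text or ""  # an empty element parses with text=None
    if tag == "list":
        return [from_element(child) for child in elem]
    if tag == "tuple":
        return tuple(from_element(child) for child in elem)
    if tag == "dict":
        result = {}
        for entry in elem:
            key_elem, value_elem = list(entry)
            result[from_element(key_elem)] = from_element(value_elem)
        return result
    raise ValueError("unknown tag: %r" % tag)


def dumps(value):
    return ET.tostring(to_element(value))


def loads(data):
    return from_element(ET.fromstring(data))
```

The type is carried by the tag name, so the reader side is a fixed dispatch over a closed set of tags. That is what keeps it from running code, although the XML parser itself still has to cope with hostile input.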

</F>

Fredrik Lundh · Feb 11, 2005

Aahz said:
The difference is that parsing XML -- even badly malformed -- won't
crash Python.
optimist.

(have patience. have lots of patience.)

</F>

Fredrik Lundh · Feb 11, 2005

(repost; gmane seems to have eaten my original post)

So it is not vulnerable in the way that pickle is? That's a start.
The security warning in the marshal doc then makes it sound worse than
it is...

the problem is that the following may or may not reach the "done!" statement,
somewhat depending on python version, memory allocator, and what data you
pass to dumps.

import marshal

data = marshal.dumps((1, 2, 3, "hello", 4, 5, 6))

# feed ever-shorter prefixes of the marshalled data to loads()
for i in range(len(data), -1, -1):
    try:
        print marshal.loads(data[:i])
    except EOFError:
        print "EOFError"
    except ValueError:
        print "ValueError"

print "done!"

(try different data combinations, to see how far you get on your platform...)

fixing this should be relatively easy, and should result in a safe unmarshaller (your
application will still have to limit the amount of data fed into load/loads, of course).
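
A rough sketch of the "limit the amount of data" part, in current Python 3 syntax. The names checked_loads, UnmarshalError and MAX_PAYLOAD are hypothetical, and the 64 KiB cap is an arbitrary assumption. This does not make marshal safe against hostile input; it only bounds the input size and turns the documented failure modes into a single exception type.

```python
import marshal

MAX_PAYLOAD = 64 * 1024  # arbitrary cap chosen for this sketch


class UnmarshalError(Exception):
    """Raised for oversized, truncated or otherwise malformed payloads."""


def checked_loads(data, max_bytes=MAX_PAYLOAD):
    # Refuse to even look at payloads above the cap.
    if len(data) > max_bytes:
        raise UnmarshalError("payload of %d bytes exceeds limit" % len(data))
    try:
        return marshal.loads(data)
    except (EOFError, ValueError, TypeError) as exc:
        # These are the exceptions the marshal docs list for bad data.
        raise UnmarshalError("malformed marshal data") from exc
```

The caller then has one exception to handle, and the size check runs before any parsing happens.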

</F>

Irmen de Jong · Feb 11, 2005

Fredrik said:
(have patience. have lots of patience.)

Hehe, the XML killer file "BillionLaughs"... correct?
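
For readers who have not met it: "billion laughs" is a small XML document whose nested entity definitions expand to an enormous string. The sketch below only builds such a document and computes how large the expansion would be; it deliberately does not parse it. The function names and the default of nine levels with ten references each are assumptions chosen to match the usual form of the example.

```python
def build_payload(levels=9, fanout=10):
    """Return the text of a nested-entity document (it is not parsed here)."""
    lines = ['<?xml version="1.0"?>', "<!DOCTYPE lolz [", ' <!ENTITY lol "lol">']
    previous = "lol"
    for n in range(1, levels + 1):
        name = "lol%d" % n
        # each entity refers to the previous one `fanout` times
        refs = ("&%s;" % previous) * fanout
        lines.append(' <!ENTITY %s "%s">' % (name, refs))
        previous = name
    lines.append("]>")
    lines.append("<lolz>&%s;</lolz>" % previous)
    return "\n".join(lines)


def expanded_size(levels=9, fanout=10, base="lol"):
    """Number of characters the top-level entity expands to."""
    return len(base) * fanout ** levels


print(len(build_payload()), "characters in,", expanded_size(), "characters out")
```

With the defaults the input is well under 2 KB while the fully expanded text is three billion characters, which is where the patience comes in.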

--Irmen

Irmen de Jong · Feb 11, 2005

Alan said:
I should learn to keep my mouth zipped :-L
:-D

OK, I really don't have time for a detailed examination of either the
JSON spec or the python impl of same. And I *definitely* don't have time
for a detailed security audit, much though I'd love to.

No problem. The patch you wrote is a very good start, I think!!

Interestingly enough, I just ran across "Flatten":
http://sourceforge.net/project/showfiles.php?group_id=82591&package_id=91311

"...which aids in serializing/unserializing networked data securely,
without having to fear execution of code or the like."

Sounds promising!

--Irmen

Irmen de Jong · Feb 11, 2005

Fredrik said:
the problem is that the following may or may not reach the "done!" statement,
somewhat depending on python version, memory allocator, and what data you
pass to dumps.

import marshal

data = marshal.dumps((1, 2, 3, "hello", 4, 5, 6))

# feed ever-shorter prefixes of the marshalled data to loads()
for i in range(len(data), -1, -1):
    try:
        print marshal.loads(data[:i])
    except EOFError:
        print "EOFError"
    except ValueError:
        print "ValueError"

print "done!"

(try different data combinations, to see how far you get on your platform...)

Python 2.4 on my Windows box crashes with:
Fatal Python error: PyString_InternInPlace: strings only please!

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
c:\> _

So indeed it seems that marshal is not safe yet :-|

fixing this should be relatively easy, and should result in a safe unmarshaller (your
application will still have to limit the amount of data fed into load/loads, of course).

Okay.

--Irmen

Alan Kennedy · Feb 12, 2005

[Irmen de Jong]

Interestingly enough, I just ran across "Flatten":
http://sourceforge.net/project/showfiles.php?group_id=82591&package_id=91311

"...which aids in serializing/unserializing networked data securely,
without having to fear execution of code or the like."

Sounds promising!

Well, I'm always dubious of OSS projects that don't even have any bugs
reported, let alone fixed: no patches submitted, etc, etc.

http://sourceforge.net/tracker/?group_id=82591

Though maybe I'm missing something obvious?

Irmen de Jong · Feb 12, 2005

Alan said:
[Irmen de Jong]

Interestingly enough, I just ran across "Flatten":
http://sourceforge.net/project/showfiles.php?group_id=82591&package_id=91311

"...which aids in serializing/unserializing networked data securely,
without having to fear execution of code or the like."

Sounds promising!

Well, I'm always dubious of OSS projects that don't even have any bugs
reported, let alone fixed: no patches submitted, etc, etc.

http://sourceforge.net/tracker/?group_id=82591

Though maybe I'm missing something obvious?

Perhaps the SF trackers are simply not used for that project?
Consider my own project:
http://sourceforge.net/tracker/?group_id=18837
I can assure you that I have fixed a large number of bugs and
applied many patches during the lifetime of the project.
They are just not entered in the trackers, except for a few.

--Irmen

Paul Rubin · Feb 14, 2005

[email protected] said:
I think marshal could be fixed; the only unsafety I'm aware of is that
it doesn't always act rationally when confronted with incorrect input
like bad type codes or truncated input. It only receives instances of
the built-in types and it never executes user code as a result of
unmarshalling.

There's another issue with marshal that makes it unsuitable for Pyro,
which is that its data format is (for legitimate reasons) not
guaranteed to be the same across different Python releases. That
means that if the two ends of the Pyro application aren't using the
same Python version, they might not be able to interoperate.

I don't remember if marshal strings contain a version number. If they
do, then the non-interoperating versions can notice the
incompatibility and raise an appropriate error. If they don't, then
undefined behavior and possible security holes could result, unless
Pyro takes special measures to notice the possibility.
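
One way an application could "notice the possibility" itself is to frame each marshal payload with an explicit header. This is a sketch in current Python 3 syntax, not something Pyro does: the MRSH magic value, the header layout and the pack/unpack names are all invented here. It relies only on marshal.version and sys.version_info, both of which exist in the standard library.

```python
import marshal
import struct
import sys

MAGIC = b"MRSH"                    # hypothetical frame marker
HEADER = struct.Struct("!4sBBB")   # magic, Python major, Python minor, marshal format


def pack(value):
    payload = marshal.dumps(value, marshal.version)
    header = HEADER.pack(MAGIC, sys.version_info[0], sys.version_info[1],
                         marshal.version)
    return header + payload


def unpack(data):
    if len(data) < HEADER.size:
        raise ValueError("frame too short")
    magic, major, minor, fmt = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("not a framed marshal message")
    # Be strict: only accept data written by the same interpreter version
    # and the same marshal format version.
    if (major, minor) != tuple(sys.version_info[:2]) or fmt != marshal.version:
        raise ValueError("sender used Python %d.%d, marshal format %d"
                         % (major, minor, fmt))
    return marshal.loads(data[HEADER.size:])
```

Requiring an exact version match is the conservative choice; a real protocol might accept a known-compatible range instead.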

See SF bugs #467384 and #471893 for some further discussion.

Irmen de Jong · Feb 14, 2005

Paul said:
There's another issue with marshal that makes it unsuitable for Pyro,
which is that its data format is (for legitimate reasons) not
guaranteed to be the same across different Python releases. That
means that if the two ends of the Pyro application aren't using the
same Python version, they might not be able to interoperate.

Paul, the default serialization protocol that Pyro uses is pickle
(with the highest available protocol number). So there is a risk
already that it doesn't interoperate with older Python versions,
unless you configure the max pickle protocol or switch to using
one of the supported XML serializations.
For mobile code, Pyro relies on the transfer of the actual
bytecode and this won't work at all no matter what if you use
different Python versions. Unless the bytecode happens to be
the same (consider yourself lucky).

--Irmen

Paul Rubin · Feb 14, 2005

Irmen de Jong said:
Paul, the default serialization protocol that Pyro uses is pickle
(with the highest available protocol number). So there is a risk
already that it doesn't interoperate with older Python versions,
unless you configure the max pickle protocol or switch to using
one of the supported XML serializations.

Yes, however, you can at least set the protocol level. Marshal doesn't
give you that option.
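
For reference, pinning the pickle protocol looks roughly like this in current Python 3 syntax. WIRE_PROTOCOL and dumps_for_wire are hypothetical names, and protocol 2 is just an example value. Note that this only addresses interoperability between versions; it does nothing for the security question.

```python
import pickle

WIRE_PROTOCOL = 2  # example: the newest protocol every peer is known to support


def dumps_for_wire(obj):
    # Never exceed what this interpreter itself supports.
    protocol = min(WIRE_PROTOCOL, pickle.HIGHEST_PROTOCOL)
    return pickle.dumps(obj, protocol=protocol)


data = dumps_for_wire({"answer": 42})
# Pickles written with protocol 2 or later start with the PROTO opcode (0x80)
# followed by the protocol number, so a receiver can inspect it.
print(data[:2])
```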

What do you do about the security issue if you're using pickle? Do
you have to trust the other end to not send you malicious pickles?

Irmen de Jong · Feb 14, 2005

Paul said:
Yes, however, you can at least set the protocol level. Marshal doesn't
give you that option.

That's right. So good for Pyro then

It works most of the time, even across different Python versions,
unless using mobile code.

What do you do about the security issue if you're using pickle? Do
you have to trust the other end to not send you malicious pickles?

I do nothing about it.
Yes, you have to trust the other end.
So you have to use your own -or Pyro's- authentication/authorization
logic to make sure that the other end can be trusted.
You could use SSL with certificates for instance.

In fact, this is the reason why I started this thread.
I wanted to discover some possibilities to replace pickle
by another thing, so that Pyro becomes 'safe' at the wire
protocol level.
But further discussion on the Pyro mailing list sort of
made it clear that this is not desirable.

--Irmen

Paul Rubin · Feb 14, 2005

Irmen de Jong said:
I do nothing about it.
Yes, you have to trust the other end.
So you have to use your own -or Pyro's- authentication/authorization
logic to make sure that the other end can be trusted.
You could use SSL with certificates for instance.

Well, ok, if you trust the other end then I think it's enough to just
authenticate all the pickles (say using hmac.py) without needing
something as heavyweight as SSL. If you use SSL you need something
like m2crypto since the SSL option in the socket module doesn't check
certificates, IIRC.

In fact, this is the reason why I started this thread.
I wanted to discover some possibilities to replace pickle
by another thing, so that Pyro becomes 'safe' at the wire
protocol level.
But further discussion on the Pyro mailing list sort of
made it clear that this is not desirable.

Why do you say it's not desirable? Don't competing protocols like RMI
try to stay safe from malicious peers? Why should I not want to
expose a Pyro service to the internet? It's a natural thing to want
to do.

Irmen de Jong · Feb 14, 2005

Well, ok, if you trust the other end then I think it's enough to just
authenticate all the pickles (say using hmac.py) without needing
something as heavyweight as SSL.

An interesting idea that hadn't crossed my mind yet.
Pyro *does* already have connection authentication that uses md5
(and hmac since 3.5beta) with a shared secret, but after that,
the communication is done in plaintext so to speak.

If you use SSL you need something
like m2crypto since the SSL option in the socket module doesn't check
certificates, IIRC.

I'm using m2crypto for this kind of SSL, yes.
(sadly it has a bug in its API that is triggered by the current
Pyro version on some platforms like Linux).

Why do you say it's not desirable? Don't competing protocols like RMI
try to stay safe from malicious peers? Why should I not want to
expose a Pyro service to the internet? It's a natural thing to want
to do.

You should not want to expose a Pyro service to the internet because
Python doesn't have Java's security model and sandboxing, that are
used with RMI. Pyro has a few features that are very powerful
but also require the use of intrinsically insecure Python code (namely,
pickle, and marshal).
Just look at the recent security advisory about the XMLRPC server
that comes with Python.... it's much more primitive than Pyro is,
but even that one was insecure.

I wouldn't put a Java RMI server or xyz CORBA server or whatever
kind of unrestricted API open on the internet anyway.
Am I rational or paranoid?

--Irmen

Paul Rubin · Feb 14, 2005

Irmen de Jong said:
An interesting idea that hadn't crossed my mind yet. Pyro *does*
already have connection authentication that uses md5 (and hmac since
3.5beta) with a shared secret, but after that, the communication is
done in plaintext so to speak.

Yes, that's what I meant, using hmac to authenticate using a shared secret,
sending the rest in the clear. Note you should also put sequence numbers
in the messages, to stop the attacker from fooling you by selectively
deleting or replaying messages.
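
A sketch of that scheme using the standard hmac module, in current Python 3 syntax. The message layout (tag, then an 8-byte sequence number, then the payload) and the names seal and Receiver are assumptions made for illustration. SHA-256 is used here rather than the md5 mentioned elsewhere in this thread. The payload is sent in the clear and would only be unpickled after the check passes.

```python
import hashlib
import hmac
import struct

SEQ = struct.Struct("!Q")                   # 8-byte big-endian sequence number
TAG_SIZE = hashlib.sha256().digest_size     # length of the HMAC tag in bytes


def seal(key, seq, payload):
    """Return tag + sequence number + payload; the tag covers both."""
    body = SEQ.pack(seq) + payload
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body


class Receiver:
    def __init__(self, key):
        self.key = key
        self.expected_seq = 0

    def open(self, message):
        if len(message) < TAG_SIZE + SEQ.size:
            raise ValueError("message too short")
        tag, body = message[:TAG_SIZE], message[TAG_SIZE:]
        expected = hmac.new(self.key, body, hashlib.sha256).digest()
        # constant-time comparison, so timing does not leak the tag
        if not hmac.compare_digest(tag, expected):
            raise ValueError("bad MAC")
        (seq,) = SEQ.unpack_from(body)
        # rejects replayed, reordered and (on the next message) deleted messages
        if seq != self.expected_seq:
            raise ValueError("unexpected sequence number %d" % seq)
        self.expected_seq += 1
        return body[SEQ.size:]
```

Because the sequence number sits inside the authenticated body, an attacker cannot renumber a captured message without invalidating its tag.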

You should not want to expose a Pyro service to the internet because
Python doesn't have Java's security model and sandboxing, that are
used with RMI. Pyro has a few features that are very powerful
but also require the use of intrinsically insecure Python code (namely,
pickle, and marshal).

Can you say some more about this? Does RMI really rely on sandboxes,
if you don't send code around, but just expose operations on server
side objects?

I don't think marshal is inherently insecure, since the unmarshaller
doesn't itself execute any marshalled code. It apparently has some
bugs that can confuse it if you send it a malformed marshalled string,
but those can be fixed. Pickle is inherently insecure because of how
it calls class constructors.
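
A small, harmless illustration of the pickle half of that claim, in current Python 3 syntax. The mechanism shown is __reduce__, which lets a pickle name any importable callable and its arguments; the class name Innocent is made up, and eval on an arithmetic string stands in for whatever an attacker would really call.

```python
import pickle


class Innocent:
    def __reduce__(self):
        # "To rebuild me, call eval('6 * 7')". The unpickler will do exactly
        # that; an attacker could name os.system or anything else importable.
        return (eval, ("6 * 7",))


payload = pickle.dumps(Innocent())

# The receiving side needs no access to the Innocent class at all:
result = pickle.loads(payload)
print(result)  # 42, computed during unpickling
```

The receiver gets back whatever the named callable returned, and the call has already happened by the time loads returns.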

Just look at the recent security advisory about the XMLRPC server
that comes with Python.... it's much more primitive than Pyro is,
but even that one was insecure.

I haven't looked at that bug carefully yet but yes, anything exposed
to the internet has to be done very carefully, and XMLRPC missed something.

I wouldn't put a Java RMI server or xyz CORBA server or whatever
kind of unrestricted API open on the internet anyway.
Am I rational or paranoid?

I haven't used Java enough to advise you on this, but I thought they
were supposed to be ok to expose to the internet. Certainly the whole
idea of .NET is to let you securely provide RPC services (excuse me
for a moment while I try to stop laughing for mentioning security and
Microsoft in the same sentence). And lots of people use things like
SOAP for that.

Irmen de Jong · Feb 15, 2005

Paul said:
Yes, that's what I meant, using hmac to authenticate using a shared secret,
sending the rest in the clear. Note you should also put sequence numbers
in the messages, to stop the attacker from fooling you by selectively
deleting or replaying messages.

Thanks for the tip. I'll think about this.

Can you say some more about this? Does RMI really rely on sandboxes,
if you don't send code around, but just expose operations on server
side objects?

Well, my experience with RMI is very limited (and from a few years ago)
but I remember that you are required to set a security manager on your
RMI objects. I always used Java's default rmi security manager but I
honestly don't know what it actually does :-D

Other than that, it would be interesting to know if the JRMP or IIOP
protocols have any problems with malicious packets. I don't know
them well enough to say anything about this.

I don't think marshal is inherently insecure, since the unmarshaller
doesn't itself execute any marshalled code. It apparently has some
bugs that can confuse it if you send it a malformed marshalled string,
but those can be fixed. Pickle is inherently insecure because of how
it calls class constructors.

Yep, that's what I now know too from the other replies in this thread.

I haven't looked at that bug carefully yet but yes, anything exposed
to the internet has to be done very carefully, and XMLRPC missed something.

What I know of it is that you had the possibility to arbitrarily follow
attribute paths, including attributes that should rather be kept hidden.

I haven't used Java enough to advise you on this, but I thought they
were supposed to be ok to expose to the internet. Certainly the whole
idea of .NET is to let you securely provide RPC services (excuse me
for a moment while I try to stop laughing for mentioning security and
Microsoft in the same sentence). And lots of people use things like
SOAP for that.

I think of things like SOAP and XML-RPC quite differently from RMI or
Pyro, because they are much more "distant" from the actual programming
language and environment beneath them. I don't know if this is good
thinking or not, but the fact that RMI and Pyro expose language features
directly, and SOAP does not, makes me reason about them differently.

Then again, Pyro allows you to use two forms of XML serialization
on the wire (instead of pickle), which may or may not move it much closer
to SOAP and the likes. But there are other reasons for not wanting
a Pyro server exposed on the internet. Such as the lack of a good
security analysis of Pyro. Perhaps it suffers from similar holes
as XMLRPC until recently...

Furthermore there are practical issues such as having to
open a bunch of new ports in your firewall. In my experience
this is very hard to get done, sadly, in contrast to just
exposing a "web-service" (in whatever form) on port 80 HTTP.

--Irmen

Fredrik Lundh · Feb 15, 2005

Irmen said:
What I know of it is that you had the possibility to arbitrarily follow
attribute paths, including attributes that should rather be kept hidden.

the bug had nothing to do with the XML-RPC protocol itself; it was a
weakness in the SimpleXMLRPCServer framework which used reflection
to automatically publish instance methods (if you use getattr repeatedly on
an instance, you can access a lot more than just attributes and methods...)
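
To make the parenthetical concrete, here is a sketch (current Python 3 syntax) of a naive dotted-name resolver of the kind described. It is not the SimpleXMLRPCServer code; Service and resolve_dotted are invented names, and the attribute path shown uses Python 3 attribute names.

```python
class Service:
    def ping(self):
        return "pong"


def resolve_dotted(obj, dotted_name):
    # Naive resolver: follow every dot with getattr, no questions asked.
    for part in dotted_name.split("."):
        obj = getattr(obj, part)
    return obj


service = Service()

# The intended use: look up a published method.
print(resolve_dotted(service, "ping")())

# The unintended use: walk from the method to the module globals.
leaked = resolve_dotted(service, "ping.__func__.__globals__")
print(type(leaked))
```

Once the caller controls the dotted name, every object reachable by attribute access from the published instance is effectively published too.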

how do you publish "RPC endpoints" in Pyro?

</F>

Irmen de Jong · Feb 15, 2005

Fredrik said:
the bug had nothing to do with the XML-RPC protocol itself;

True, sorry for the confusion. I should have written it more precisely.

it was a
weakness in the SimpleXMLRPCServer framework which used reflection
to automatically publish instance methods (if you use getattr repeatedly on
an instance, you can access a lot more than just attributes and methods...)

how do you publish "RPC endpoints" in Pyro?

By reflection:

    return getattr(self, method)(*args, **keywords)

But Pyro currently treats attribute lookups differently.
It either ignores them completely (you have to enable remote-attribute
access explicitly) or returns attributes as 'local' objects.
What I mean is that you can access a remote attribute of a Pyro object,
but only one level deep. There is no repeated (nested) remote attribute
lookup. It's quite difficult to explain, if you want more details please
read the relevant section in the Pyro manual:
http://pyro.sourceforge.net/manual/7-features.html#nestedattrs
As far as I can see, Pyro is safe from the XMLRPCServer weakness.
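
A sketch of what a single-level lookup with a couple of extra checks might look like, in current Python 3 syntax. This is an illustration of the idea, not Pyro's actual dispatch code; call_exposed and Calculator are hypothetical names, and rejecting names that start with an underscore is an added assumption.

```python
def call_exposed(target, method_name, *args, **keywords):
    # One level only: the name is never split on dots, and private or
    # special names are refused outright.
    if "." in method_name or method_name.startswith("_"):
        raise AttributeError("refusing to resolve %r" % method_name)
    method = getattr(target, method_name)
    if not callable(method):
        raise TypeError("%r is not callable" % method_name)
    return method(*args, **keywords)


class Calculator:
    def add(self, a, b):
        return a + b


print(call_exposed(Calculator(), "add", 2, 3))
```

Since getattr is applied exactly once to the published object, a caller cannot chain lookups to reach the function or module internals.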

Interestingly, I have been thinking for a long time to add nested
remote attribute lookup to Pyro. I now know that this is perhaps
not a really good idea.

--Irmen
