[Irmen de Jong]
[Alan Kennedy]
[Irmen de Jong]
> Looks very interesting indeed, but in what way would this be
> more secure than, say, pickle or marshal?
> A quick glance at some docs reveals that they are using eval
> to process the data... ouch.
Well, the Python JSON codec provided appears to use eval, which might
make it *seem* insecure.
http://www.json-rpc.org/pyjsonrpc/index.xhtml
But a more detailed examination of the code indicates, to this reader at
least, that it can very easily be made completely secure. The author of
the code could just as easily have avoided eval, and possibly used it
simply because he wasn't thinking in security terms.
The codec uses tokenize.generate_tokens to split the JSON string into
tokens, which are then interpreted as Python objects.
tokenize.generate_tokens yields a series of tuples (token type, token
text, start position, end position, source line), so nothing insecure
there: the token text is never executed.
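To see what the tokenizer actually hands back, here is a small sketch in Python 3 (the codec itself is Python 2 code, and `json_tokens` is a hypothetical helper, not part of it). It lists each token's type name and text for a JSON string; the input is only split up, never run.

```python
import io
import token
import tokenize


def json_tokens(text):
    """Yield (token type name, token text) pairs for a JSON-like string.

    tokenize.generate_tokens only splits the text into tokens; it never
    executes anything.
    """
    readline = io.StringIO(text).readline
    for tok in tokenize.generate_tokens(readline):
        # Each tok is a 5-tuple: (type, string, start, end, line).
        if tok.type in (token.NEWLINE, token.NL, token.ENDMARKER):
            continue  # skip line/end bookkeeping tokens
        yield token.tok_name[tok.type], tok.string


if __name__ == "__main__":
    for name, text in json_tokens('{"a": [1, -2.5, "x"]}'):
        print(name, repr(text))
```

Note that `-2.5` comes out as an OP token `-` followed by a NUMBER token `2.5`, which is why parseValue needs its own case for a leading minus sign.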
Each of the tokens is then passed to a "parseValue" function, which is
defined thusly:
#===================
def parseValue(self, tkns):
    (ttype, tstr, ps, pe, lne) = tkns.next()
    if ttype in [token.STRING, token.NUMBER]:
        return eval(tstr)
    elif ttype == token.NAME:
        return self.parseName(tstr)
    elif ttype == token.OP:
        if tstr == "-":
            return - self.parseValue(tkns)
        elif tstr == "[":
            return self.parseArray(tkns)
        elif tstr == "{":
            return self.parseObj(tkns)
        elif tstr in ["}", "]"]:
            return EndOfSeq
        elif tstr == ",":
            return SeqSep
        else:
            raise "expected '[' or '{' but found: '%s'" % tstr
    else:
        return EmptyValue
#===================
As you can see, eval is *only* called when the next token in the stream
is a string or a number, so it's really just a simple shortcut for
turning a string or number literal into the corresponding Python value.
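If one wanted to keep the convenience of eval for those two token types while refusing anything that isn't a plain literal, the standard library's ast.literal_eval (Python 2.6 and later, so not something this codec uses) is one option. A minimal sketch in Python 3, where `literal_token_value` is a hypothetical wrapper name:

```python
import ast


def literal_token_value(tstr):
    """Convert a STRING or NUMBER token's text into a Python value.

    ast.literal_eval only accepts literal syntax (strings, numbers, lists,
    dicts, ...), so an expression such as a function call is rejected with
    ValueError instead of being executed.
    """
    return ast.literal_eval(tstr)


if __name__ == "__main__":
    print(repr(literal_token_value('"line\\nbreak"')))  # escape decoded
    print(repr(literal_token_value("2.5")))
    try:
        literal_token_value("__import__('os').getcwd()")
    except ValueError as exc:
        print("rejected:", exc)
```

Python's and JSON's escape rules overlap but are not identical (Python doesn't recognise JSON's `\/`, for example), so this is a safety net rather than a complete JSON string decoder.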
If the function were defined like this instead (not tested!), with the
eval removed, I think it would be safe:
#===================
import token

default_number_type = float   # float also accepts integer literals like "10"
#default_number_type = int    # int would reject literals like "1.5"

def parseValue(self, tkns):
    (ttype, tstr, ps, pe, lne) = tkns.next()
    if ttype == token.STRING:
        # Strip the surrounding quotes instead of eval'ing the literal;
        # escape sequences such as \n are NOT decoded here.
        return tstr[1:-1]
    elif ttype == token.NUMBER:
        return default_number_type(tstr)
    elif ttype == token.NAME:
        return self.parseName(tstr)
    elif ttype == token.OP:
        if tstr == "-":
            return - self.parseValue(tkns)
        elif tstr == "[":
            return self.parseArray(tkns)
        elif tstr == "{":
            return self.parseObj(tkns)
        elif tstr in ["}", "]"]:
            return EndOfSeq
        elif tstr == ",":
            return SeqSep
        else:
            raise ValueError("expected '[' or '{' but found: '%s'" % tstr)
    else:
        return EmptyValue
#===================
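To check that the eval-free idea holds together end to end, here is a self-contained sketch in Python 3. It follows the same shape (tokenize, then dispatch on the token type) but it is not the pyjsonrpc code: `TokenParser` and `parse` are hypothetical names, arrays and objects use a one-token lookahead instead of the EndOfSeq/SeqSep sentinels, numbers try int before float, and string escapes are not decoded.

```python
import io
import token
import tokenize

# Hypothetical sketch, not the pyjsonrpc codec: parse JSON text using only
# tokenize output, never eval.
SKIP = (token.NEWLINE, token.NL, token.ENDMARKER)
NAMES = {"true": True, "false": False, "null": None}


class TokenParser:
    def __init__(self, text):
        readline = io.StringIO(text).readline
        # Keep only significant tokens; drop line/end bookkeeping tokens.
        self.toks = [t for t in tokenize.generate_tokens(readline)
                     if t.type not in SKIP]
        self.pos = 0

    def peek(self):
        # Text of the next token, or None at end of input.
        if self.pos < len(self.toks):
            return self.toks[self.pos].string
        return None

    def take(self, expected=None):
        if self.pos >= len(self.toks):
            raise ValueError("unexpected end of input")
        tok = self.toks[self.pos]
        self.pos += 1
        if expected is not None and tok.string != expected:
            raise ValueError("expected %r, found %r" % (expected, tok.string))
        return tok

    def string(self, tok):
        # JSON strings are double-quoted; reject Python-only forms like 'x'.
        s = tok.string
        if len(s) < 2 or s[0] != '"' or s[-1] != '"':
            raise ValueError("not a JSON string: %r" % s)
        return s[1:-1]  # quotes stripped, escape sequences NOT decoded

    def value(self):
        tok = self.take()
        if tok.type == token.STRING:
            return self.string(tok)
        if tok.type == token.NUMBER:
            try:
                return int(tok.string)    # "10" -> 10
            except ValueError:
                return float(tok.string)  # "2.5", "1e3" -> float
        if tok.type == token.NAME and tok.string in NAMES:
            return NAMES[tok.string]
        if tok.string == "-":
            return -self.value()
        if tok.string == "[":
            return self.sequence("]", self.value)
        if tok.string == "{":
            return dict(self.sequence("}", self.pair))
        raise ValueError("unexpected token %r" % tok.string)

    def pair(self):
        key = self.take()
        if key.type != token.STRING:
            raise ValueError("object keys must be strings")
        name = self.string(key)
        self.take(":")
        return name, self.value()

    def sequence(self, close, item):
        # Comma-separated items up to the closing bracket; handles [] and {}.
        items = []
        if self.peek() == close:
            self.take()
            return items
        while True:
            items.append(item())
            sep = self.take().string
            if sep == close:
                return items
            if sep != ",":
                raise ValueError("expected ',' or %r, found %r" % (close, sep))


def parse(text):
    parser = TokenParser(text)
    result = parser.value()
    if parser.pos != len(parser.toks):
        raise ValueError("trailing data after value")
    return result
```

For example, `parse('{"a": [1, -2.5, "x"], "b": null}')` returns `{'a': [1, -2.5, 'x'], 'b': None}`, while `parse('__import__("os")')` raises ValueError: a bare name other than true/false/null is simply an unexpected token, never something to evaluate. It is still more permissive than JSON in places (Python-style number literals such as `0x10` get through), so treat it as a sketch rather than a validator.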
The only other use of eval, in the parseObj function, is likewise
applied only to string tokens:
#===================
def parseObj(self, tkns):
    obj = {}
    nme = ""
    try:
        while 1:
            (ttype, tstr, ps, pe, lne) = tkns.next()
            if ttype == token.STRING:
                nme = eval(tstr)
                (ttype, tstr, ps, pe, lne) = tkns.next()
                if tstr == ":":
                    v = self.parseValue(tkns)
                    # Remainder of this function elided
#===================
That call could similarly be replaced by using the string itself (minus
its surrounding quotes) rather than eval'ing it. One might still want to
look at encoding issues, though: I haven't looked at JSON-RPC enough to
know how it proposes to handle string encodings.
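On the encoding point: JSON strings have their own escape sequences (`\n`, `\"`, `\uXXXX` and so on), so using the token text directly hands those backslash sequences back undecoded. A minimal hand-rolled decoder in Python 3 is sketched here; `decode_json_string` is a hypothetical name, it assumes the token is a double-quoted JSON string, and it does not combine surrogate pairs.

```python
import string

# Hypothetical helper: decode a JSON string token by hand instead of eval.
ESCAPES = {'"': '"', "\\": "\\", "/": "/", "b": "\b",
           "f": "\f", "n": "\n", "r": "\r", "t": "\t"}
HEX = set(string.hexdigits)


def decode_json_string(tstr):
    r"""Turn a double-quoted STRING token into a str without eval.

    Handles the JSON escapes \" \\ \/ \b \f \n \r \t and \uXXXX. A surrogate
    pair written as two \u escapes is left as two separate code points.
    """
    if len(tstr) < 2 or tstr[0] != '"' or tstr[-1] != '"':
        raise ValueError("not a double-quoted string: %r" % tstr)
    body = tstr[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)  # ordinary character, copied as-is
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        esc = body[i + 1]
        if esc == "u":
            digits = body[i + 2:i + 6]
            if len(digits) != 4 or not set(digits) <= HEX:
                raise ValueError("bad \\u escape: %r" % digits)
            out.append(chr(int(digits, 16)))
            i += 6
        elif esc in ESCAPES:
            out.append(ESCAPES[esc])
            i += 2
        else:
            raise ValueError("invalid escape: \\%s" % esc)
    return "".join(out)
```

Since Python 3 strings are Unicode, the result is already decoded text; choosing a byte encoding only comes up when the value is written back out.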
So I don't think there are any serious security issues here. The
"simplicity" of the JSON grammar is what attracted me to it in the first
place, especially since robust and efficient lexers and parsers that can
handle it are already built into Python and JavaScript (and JavaScript
interpreters are getting pretty ubiquitous these days).
And even if the only available Python implementation of JSON-RPC is not
secure, it is certainly possible to write one that is both efficient and
secure.
Hopefully there isn't some glaring security hole that I've missed:
doubtless I'll find out real soon ;-) Gotta love full disclosure.
regards,