tho_mica_l
I liked this quiz because it made me look into Treetop, Ragel and some
other libraries I had wanted to examine a little more closely for quite
a while. Anyway, for my (official) solution I took the easy road and
relied on Ruby to do the actual work.
My solution is ruby19 only, since I used the opportunity to explore some
of the new regexp features. Since it uses eval(), there is a possibility
of Ruby code injection, like the ultimately relieving
"#{`sudo rm -rf /`}". I think my solution catches such attacks, though.
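To make the injection point concrete, here is a minimal sketch of the escaping idea, using a harmless interpolation instead of a shell command. The helper name escape_interpolation is made up for this illustration; it only mirrors the gsub call that the parser applies to string tokens.

```ruby
# Hypothetical helper (not part of the posted parser): escape every '#'
# so that eval() treats "#{...}" inside a string literal as plain text.
def escape_interpolation(literal)
  literal.gsub(/#/, '\\\\#')
end

# A JSON string token that happens to contain Ruby interpolation syntax.
hostile = '"#{1 + 1}"'

# Evaluated as-is, Ruby runs the embedded expression.
unescaped = eval(hostile)

# Evaluated after escaping, the text comes back unchanged.
escaped = eval(escape_interpolation(hostile))

p unescaped
p escaped
```

Since every '#' is escaped, the "#$var" and "#@var" shorthand forms should be covered by the same replacement.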
BTW, what do you all think should be the canonical output of the
following JSON snippet:
json1 = <<JSON
{"a":2,"b":3.141,"TIME":"2007-03-14T11:52:40","c":"c","d":[1,"b",3.14],
"COUNT":666,"e":{"foo":"bar"},"foo":"B\\u00e4r",
"g":"\\u677e\\u672c\\u884c\\u5f18","h":1000.0,
"bar":"\\u00a9 \\u2260 \\u20ac!","i":0.001,"j":"\\ud840\\udc01"}
JSON
I get conflicting results between various versions of my solution and
the official ruby19 parser with respect to these utf characters. This
snippet is taken (IIRC) from the ruby-json parser.
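For comparison, here is a small sketch of what the json library bundled with current Ruby versions returns for a few of these keys. It assumes that library is available and is only one data point, not a definition of the canonical output. RFC 4627 describes characters outside the Basic Multilingual Plane as a pair of \u escapes forming a UTF-16 surrogate pair, so the "j" value should come back as a single code point.

```ruby
require 'json'

# Single-quoted, so the backslashes reach the JSON parser unchanged.
sample = '{"foo":"B\u00e4r","bar":"\u00a9 \u2260 \u20ac!","j":"\ud840\udc01"}'
data = JSON.parse(sample)

data.each do |key, value|
  # Print each decoded string next to its code points in hex.
  hex = value.codepoints.map { |c| 'U+%04X' % c }.join(' ')
  puts "#{key}: #{value.inspect} (#{hex})"
end
```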
Regards,
Thomas.
#!/usr/bin/env ruby19
# Author:: Thomas Link (micathom AT gmail com)
# Created:: 2008-02-01.
# The string (in JSON format) is tokenized and pre-validated. Minor
# replacements are made in order to transform the JSON into valid Ruby
# input. The transformed string is then evaluated by Ruby, which will
# throw an exception on syntactic errors.
#
# PROBLEMS:
# - The "parser" doesn't per se detect something like {"foo": 1,} or
#   [1,2,] since this is valid in Ruby. I'm not sure about JSON. Anyway,
#   I included another "invalid" clause in order to catch these cases of
#   which I'm not sure how they are handled properly. If you want the
#   parser to be more permissive, remove the first "invalid" clause.
#
# REFERENCES:
# http://json.org
# http://www.ietf.org/rfc/rfc4627.txt
class JSONParser
  # Tokenizer: each alternative matches one JSON token. The named groups
  # mark the tokens that parse() rewrites or rejects.
  RXE = /
    \[|\]|
    \{|\}|
    (?<name_sep>:)|
    (?<invalid>,\s*[}\]])|
    ,|
    (?<string>"([^"\\]++|\\(u[0-9a-fA-F]{4}|[bfnrt"\/\\]))*")|
    -?(0|[1-9]\d*+)(\.\d++)?([Ee][+-]?\d++)?(?=\D|$)|
    true|
    false|
    (?<null>null)|
    [[:space:][:cntrl:]]++|
    (?<invalid>.++)
  /xmu
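  def parse(json)
    # Rewrite each token into its Ruby equivalent. Tokens flagged as
    # invalid raise immediately; '#' inside strings is escaped so that
    # eval() cannot interpolate "#{...}".
    ruby = json.gsub(RXE) do |t|
      m = $~
      if m['invalid'] then invalid(m['invalid'])
      elsif m['null'] then 'nil'
      elsif m['name_sep'] then '=>'
      elsif m['string'] then m['string'].gsub(/#/, '\\\\#')
      else
        t
      end
    end
    begin
      return eval(ruby)
    rescue Exception => e
      # SyntaxError is not a StandardError, hence the broad rescue.
      invalid(json)
    end
  end

  def invalid(string)
    raise RuntimeError, 'Invalid JSON: %s' % string
  end
end

if __FILE__ == $0
  a = ARGV.join
  p a
  p JSONParser.new.parse(a)
end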
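On the trailing-comma question from the PROBLEMS note: as far as I can tell, the RFC 4627 grammar only allows a comma between two values, so [1,2,] and {"foo": 1,} are not valid JSON. The sketch below shows the difference that motivates the first "invalid" clause. It assumes the json library bundled with current Ruby versions and its default options.

```ruby
require 'json'

# Ruby itself tolerates a trailing comma in array and hash literals.
ruby_array = eval('[1,2,]')
ruby_hash  = eval('{"foo" => 1,}')

# The bundled json library, with default options, rejects the same input.
json_rejected =
  begin
    JSON.parse('[1,2,]')
    false
  rescue JSON::ParserError
    true
  end

p ruby_array
p ruby_hash
p json_rejected
```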