[Note: parts of this message were removed to make it a legal post.]
In honor of that, this week's Ruby Quiz is to write a parser for JSON.
Here is another solution of mine:
http://pastie.caboo.se/147505
In this one, I just made a fast hand-built recursive-descent/LL(1) parser.
This is the kind of parser that I'm trying to get my 'grammar' package to
approach (using lots of optimizations). It uses no Regexp or ruby eval
(both of which have compiled C to help speed). And yet, it is the fastest
pure-ruby JSON parser we've seen (see the recursive descent line below):
ch/s author/gem
---- ----------
- Pawel Radecki (RE, mismatch)
3214 Justin Ethier (RE lexer + ruby eval, fixed number parsing)
4054 Eric Mahurin (Grammar0, no lexer, no parser generation)
4078 Eric I (Treetop, unicode broken)
6534 oksteev (Treetop, mismatches in benchmark)
8313 Clifford Heath (Treetop, had to remove handling of "\/")
17320 Alexander Stedile (RE)
54586 Eric Mahurin (Grammar, no lexer, v0.5)
137989 Paolo Bonzini (RE)
166041 Thomas Link (RE lexer + ruby eval, ruby 1.9 results)
220289 json
223486 Eric Mahurin (Grammar, no lexer, unreleased)
224823 fjson (uses C extensions)
333368 Thomas Link & Paolo Bonzini (RE + eval, unicode broken)
388670 Eric Mahurin (hand-built recursive descent)
553081 Eric Mahurin (Grammar, no lexer, unreleased, w/ ruby2cext)
1522250 json (w/ C extensions)
#
# JSON hand-built recursive descent/LL(1) parser, by Eric Mahurin
#
require 'stringio'
class JSONParser
def parse(s)
@next = (@io=StringIO.new(s)).getc
ws
value(out=[])
ws
raise("EOF expected") if @next
raise(out.inspect) unless out.length==1
out[0]
end
def error(expected, found)
raise("expected #{expected}, found #{found ? ("'"<<found<<?\') :
'EOF'}")
end
def value(out)
if ?\[.equal?(@next)
# array
@
[email protected]
ws
a = []
unless ?\].equal?(@next)
value(a)
ws
until ?\].equal?(@next)
?\,.equal?(@next) ? (@
[email protected]) : error("','",
@next)
ws
value(a)
ws
end
end
@next = @io.getc
out << a
elsif ?\{.equal?(@next)
# object
@
[email protected]
ws
h = {}
unless ?\}.equal?(@next)
?\".equal?(@next) ? string(kv=[]) : error("a string", @next)
ws
?\:.equal?(@next) ? (@
[email protected]) : error("':'", @next)
ws
value(kv)
ws
h[kv[0]] = kv[1]
until ?\}.equal?(@next)
?,.equal?(@next) ? (@
[email protected]) : error("','",
@next)
ws
?\".equal?(@next) ? string(kv.clear) : error("a string",
@next)
ws
?\:.equal?(@next) ? (@
[email protected]) : error("':'",
@next)
ws
value(kv)
ws
h[kv[0]] = kv[1]
end
end
@next = @io.getc
out << h
elsif (?a..?z)===(@next)
# boolean
(s="")<<@next
@next = @io.getc
while (?a..?z)===(@next)
s<<@next;@
[email protected]
end
out << case s
when "true" then true
when "false" then false
when "null" then nil
else error("'true' or 'false' or 'null'", s)
end
elsif ?\".equal?(@next)
string(out)
else
# number
n = ""
(n<<@next;@
[email protected]) if ?-.equal?(@next)
?0.equal?(@next) ? (n<<@next;@
[email protected]) : digits(n)
(?..equal?(@next) ?
(n<<@next;@
[email protected];digits(n);exp(n);true) :
exp(n)) ?
(out << n.to_f) :
(out << n.to_i)
end
end
# Flattening any of the methods below will improve performance further
def ws
@next = @io.getc while (case @next;when ?\s,?\t,?\n,?\r;true;end)
end
def digits(out)
(?0..?9)===@next ? (out<<@next;@
[email protected]) : error("a digit",
@next)
while (?0..?9)===@next; (out<<@next;@
[email protected]); end
true
end
def exp(out)
(case @next;when ?e,?E;true;end) ? (out<<@next;@
[email protected]) :
return
(out<<@next;@
[email protected]) if (case @next;when ?-,?+;true;end)
digits(out)
end
def string(out)
# we've already verified the starting "
@
[email protected]
s = ""
until ?\".equal?(@next)
if ?\\.equal?(@next)
@next = @io.getc
case @next
when ?\",?\\,?\/ then (s<<@next;@
[email protected])
when ?b then (s<<?\b;@
[email protected])
when ?f then (s<<?\f;@
[email protected])
when ?n then (s<<?\n;@
[email protected])
when ?r then (s<<?\r;@
[email protected])
when ?t then (s<<?\t;@
[email protected])
when ?u
@next = @io.getc
u = ""
4.times {
case @next
when ?0..?9, ?a..?f, ?A..?F
u<<@next;@
[email protected]
else
error("a hex character", @next)
end
}
s << u.to_i(16)
else
error("a valid escape", @next)
end
else
error("a character", @next) unless @next
s<<@next;@
[email protected]
end
end
@next = @io.getc
out << s
end
end