Marc Hoeppner
Hi (again, sort of)
I am still on my quest to write a program that parses a large XML file.
After having tried to do it in tree mode, I had to realize that the
performance was simply abysmal. So back to the drawing board. But, and
here is the thing... I could not find a good, straightforward tutorial on how
to write a stream parser using REXML. The official tutorial is pretty
much silent on that part and the only other example I found (or rather was
pointed to -
http://www.janvereecken.com/2007/4/11/event-driven-xml-parser-in-ruby)
was way too complex for someone like me who is still pretty much a
beginner in Ruby.
So, what I am looking for is either a brief description of how to write
an event driven parser or else a link to a good and simple tutorial.
For the former, this is what the parser should do:
Find the element "Gene-ref", allow me to access its children and then
close and repeat for the next "Gene-ref" entry. In XML code, that would
look like
<something here>
<Gene-ref>
<name>...</name>
<start>...</start>
<end>...</end>
</Gene-ref>
<Gene-ref>
...
I understand that I need a Listener class, like
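require 'rexml/streamlistener'

class Listener
  # Supplies empty defaults for the callbacks not defined here
  include REXML::StreamListener

  def tag_start(name, attrs)
  end

  def text(text)
  end

  def tag_end(name)
  end
end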
But I haven't really worked with classes all that much and maybe someone
could just put down the basics for the script from where I can start
experimenting? Would be very much appreciated. Let's say for each
element "Gene-ref" I want to puts the name, start and end in one line,
or something along those lines.
Cheers,
Marc
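
For anyone landing here with the same question, below is a minimal sketch of the kind of listener described above, not a definitive implementation. It assumes each "Gene-ref" has plain-text children called name, start and end, as in the example. The class name GeneRefListener, the FIELDS constant, the wrapping <genes> element and the sample values are all made up for illustration. The listener remembers whether it is inside a "Gene-ref", collects the text of the three children, and hands the finished entry to a block when the closing tag arrives, so only one entry is held in memory at a time.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: collects name/start/end for each Gene-ref element.
class GeneRefListener
  include REXML::StreamListener # empty defaults for callbacks we don't define

  FIELDS = %w[name start end].freeze # child elements we care about (assumed)

  # The block is called once per completed Gene-ref with a Hash of its fields.
  def initialize(&on_gene)
    @on_gene = on_gene
    @current = nil # Hash for the Gene-ref being read, nil when outside one
    @field   = nil # name of the child element whose text we are collecting
  end

  def tag_start(name, _attrs)
    if name == 'Gene-ref'
      @current = {}
    elsif @current && FIELDS.include?(name)
      @field = name
      @current[name] = String.new
    end
  end

  # Text may arrive in several pieces, so append instead of assigning.
  def text(text)
    @current[@field] << text if @current && @field
  end

  def tag_end(name)
    if name == 'Gene-ref'
      @on_gene.call(@current) if @on_gene
      @current = nil
    elsif name == @field
      @field = nil
    end
  end
end

# Made-up sample data; the <genes> wrapper is an assumption.
xml = <<~XML
  <genes>
    <Gene-ref>
      <name>abc</name>
      <start>10</start>
      <end>20</end>
    </Gene-ref>
    <Gene-ref>
      <name>xyz</name>
      <start>30</start>
      <end>40</end>
    </Gene-ref>
  </genes>
XML

lines = []
listener = GeneRefListener.new do |gene|
  lines << gene.values_at(*GeneRefListener::FIELDS).join("\t")
end

REXML::Document.parse_stream(xml, listener)
puts lines
```

For a large file, an open File object can be passed in place of the string, e.g. File.open(path) { |f| REXML::Document.parse_stream(f, listener) }, where path is whatever your file is called. In that case it is probably better to puts directly inside the block rather than collect the lines in an array.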