parsing XML files with SAX

mike henkins

Hi,

I've been looking through the various XML parser APIs available and have
decided to use SAX. It's probably not the best choice, but I think
it can do the job. What is the best way to parse an XML file with a SAX
parser? I have seen examples that store each element in a JavaBean
class. I am not sure that is a good approach for my XML file, which looks like
this:

<parent>
  <node1>
    <child1>AAA</child1>
    <grandchild1>BBB</grandchild1>
    <grandchild2>
      <anything>CCC</anything>
    </grandchild2>
    <child2>DDD</child2>
    <child3>DDD</child3>
  </node1>
  <node2>
    <child1>AAA</child1>
    <grandchild1>BBB</grandchild1>
    <grandchild2>
      <anything>CCC</anything>
    </grandchild2>
    <child2>DDD</child2>
    <child3>DDD</child3>
  </node2>
</parent>

I have to get the value of the "anything" element in node1, node2, etc., store
the value of child3 in a database, and so on.

Does anyone have experience with, or advice on, the fastest way to do
that using SAX (or any other parser)?

Thanks !
 
William Park


Try
http://home.eol.ca/~parkw/index.html#expat
which is a shell interface to the Expat XML parser.

--
William Park, Toronto, Canada
ThinFlash: Linux thin-client on USB key (flash) drive
http://home.eol.ca/~parkw/thinflash.html
BashDiff: Super Bash shell
http://freshmeat.net/projects/bashdiff/
 
Mukul Gandhi

I would personally prefer DOM parsing in this case. DOM gives us a neat
object-oriented way to read elements and attributes, as well as to
modify the document tree.

With the SAX approach, I'd have to set up the whole parser callback
infrastructure in my application just to read a single element node,
which somehow doesn't appeal to me!

Another advantage of DOM is that I can easily store element and
attribute properties in JavaBeans or other kinds of container
objects. With SAX I'd find that more difficult to do.

I'd prefer SAX if I had to read the whole (or nearly the whole)
document serially in one pass.

Regards,
Mukul
 
Jürgen Kahrs

Mukul said:
With the SAX approach, I'd have to set up the whole parser callback
infrastructure in my application just to read a single element node,
which somehow doesn't appeal to me!

Is it really so complicated to set up "the whole parser callback
infrastructure"? Even in Java this should not take much more code
than a comparable DOM solution.
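
To give an idea of the size involved, here is a minimal sketch of such a callback setup in Java, using the JDK's javax.xml.parsers and org.xml.sax classes. The names SaxSketch and Collector are invented for this example. It assumes the element names from the sample file in the question and only collects the "anything" and "child3" values into lists; the database step is left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxSketch {
    /** Hypothetical handler: collects <anything> and <child3> text in document order. */
    static class Collector extends DefaultHandler {
        final List<String> anything = new ArrayList<>();
        final List<String> child3 = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // start a fresh buffer for each element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver one element's text in several chunks.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("anything")) {
                anything.add(text.toString().trim());
            } else if (qName.equals("child3")) {
                child3.add(text.toString().trim());
            }
            text.setLength(0);
        }
    }

    /** Runs one serial pass over the XML and returns the filled handler. */
    static Collector parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        Collector handler = new Collector();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<parent><node1><grandchild2><anything>CCC</anything></grandchild2>"
                + "<child3>DDD</child3></node1></parent>";
        Collector c = parse(xml);
        System.out.println("anything=" + c.anything + " child3=" + c.child3);
    }
}
```

Nothing but the current text buffer and the collected values is kept in memory, which is why this style scales to large files.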

Besides Java, there are scripting languages based upon
the SAX approach. In these languages, reading a single element node
can be done with a one-line script. The larger your file is,
the greater the speed advantage of a SAX-based script.
 
Mukul Gandhi

Thanks for telling me more about SAX. Which scripting languages have SAX
bindings? Could you please provide some references?

Regards,
Mukul
 
Jürgen Kahrs

Mukul said:
Thanks for telling more about SAX. Which scripting languages have SAX
bindings? Can you please provide some references?

GNU Awk and bash have XML extensions which are not
yet merged into the official source code:

http://home.vrweb.de/~juergen.kahrs/gawk/XML/
http://home.eol.ca/~parkw/index.html#expat

Perl is probably the scripting language with
the longest tradition of supporting XML files.
Python, Ruby, etc. also have some kind of XML
support. Recently, there has been an ECMA proposal
(ECMAScript for XML, E4X) for extending JavaScript with
functions for processing XML data. Use Google to find out more.
 
