Why is SAX faster than DOM?

Ramon F Herrera

When I first started learning XML, after giving a cursory look to SAX,
I decided that I need it like I need a hole in the head, and have
been using DOM since. The event-based architecture scared me.

My initial impression was that SAX is good for huge files (i.e., those
that do not fit in RAM).

Am I correct?

The reason I ask is because I would like to speed up my application
and was wondering whether what it needs is some SAX appeal. :)

-Ramon
 

Bjoern Hoehrmann

* Ramon F Herrera wrote in comp.text.xml:
> When I first started learning XML, after giving a cursory look to SAX,
> I decided that I need it like I need a hole in the head, and have
> been using DOM since. The event-based architecture scared me.
>
> My initial impression was that SAX is good for huge files (i.e., those
> that do not fit in RAM).
>
> Am I correct?
>
> The reason I ask is because I would like to speed up my application
> and was wondering whether what it needs is some SAX appeal. :)

Usually very large files are composed of relatively small records.
Wikipedia, for instance, offers database dumps in XML form that contain
every revision of every article, and even when compressed they tend to
be many GB in size. But the data for each article, or for each revision,
tends to be very small. Processing everything with a SAX-style interface
would require you to code up a lot of logic to maintain information
about the document structure, so people have come up with combinations
of "SAX" and "DOM" style interfaces; "Reader" interfaces are one
example. With a typical "Reader" interface you might navigate to an
article or a revision in a Wikipedia dump, and then read all of it into
some structure where you can access the subtree in a DOM-like way.
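
As a rough illustration of the hybrid style described above, here is a minimal Python sketch using the standard library's xml.etree.ElementTree.iterparse. It is not a "Reader" API in the strict sense, but it follows the same idea: stream through the file, and hand each small record to the caller as a complete subtree. The sample data, the element names (page, title, revision, text) and the helper names are assumptions made up for this example; they are not the real Wikipedia dump schema.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature of a dump: many small <page> records in one file.
SAMPLE = b"""<dump>
  <page><title>SAX</title><revision><text>event based</text></revision></page>
  <page><title>DOM</title><revision><text>tree based</text></revision></page>
</dump>"""

def iter_pages(stream):
    """Yield one fully built <page> subtree at a time."""
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "page":
            yield elem
            # Once the consumer is done with the record, release its
            # children and text so finished records do not pile up.
            elem.clear()

def page_titles(stream):
    # Inside each record we navigate the subtree in a tree-like way.
    return [(page.findtext("title"), page.findtext("revision/text"))
            for page in iter_pages(stream)]

print(page_titles(io.BytesIO(SAMPLE)))
```

Only one record's worth of tree is populated at any time, so memory use depends on record size rather than file size, while the per-record code stays as simple as ordinary tree navigation.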

That might be what you are looking for, but since you worry about speed
rather than memory, it would help to have more details about what your
markup looks like and what you do with it. Creating a DOM from a SAX
stream does not normally come at a great cost; it is mainly memory
allocation and data copying in proportion to the input size.
 

Joe Kesselman

> My initial impression was that SAX is good for huge files (i.e., those
> that do not fit in RAM).

That's one of the things SAX is good for. It's also good for situations
where you want to load the data into custom datastructures tuned for the
needs of your application, when you wouldn't want to build a DOM first
just to recopy its contents into another representation.

> The reason I ask is because I would like to speed up my application
> and was wondering whether what it needs is some SAX appeal. :)

In other words, your question isn't "why" but "whether and when". Which
makes more sense.

*IF* you don't need random access to the data (can process it as
it comes in), are able to easily filter out unneeded data (discarding it
immediately rather than processing it), and/or can and will create data
structures tuned for your specific application's needs -- and if you're
careful about your coding -- moving to SAX ***MAY*** help you.
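
To make those conditions concrete, here is a minimal Python sketch using the standard xml.sax module. It processes the data as it arrives, discards unneeded elements immediately, and loads the rest straight into an application-specific structure (a plain dict) without building a DOM first. The order data, attribute names and the class and function names are hypothetical, invented only for this example.

```python
import xml.sax

# Hypothetical input: an order list where only "open" orders matter.
SAMPLE = b"""<orders>
  <order id="1" status="open" total="10.50"/>
  <order id="2" status="closed" total="99.00"/>
  <order id="3" status="open" total="4.25"/>
</orders>"""

class OpenOrderTotals(xml.sax.ContentHandler):
    """Keep only what the application needs: id -> total for open orders."""

    def __init__(self):
        super().__init__()
        self.totals = {}

    def startElement(self, name, attrs):
        # Anything that is not an open order is discarded immediately;
        # no tree node is ever created for it.
        if name == "order" and attrs.get("status") == "open":
            self.totals[attrs["id"]] = float(attrs["total"])

def load_open_totals(data):
    handler = OpenOrderTotals()
    xml.sax.parseString(data, handler)
    return handler.totals

print(load_open_totals(SAMPLE))
```

Whether this beats a DOM-based version for a given application is exactly the kind of thing that has to be measured rather than assumed.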

Or it may be completely irrelevant, if that isn't where your application
is spending most of its time. Remember that infinite speedup of
something that accounts for 1% of your total runtime is only a 1%
speedup of the application. I ***STRONGLY*** recommend you get your
hands on some performance profiling tools, establish how much of your
application's time is actually being spent in parsing and DOM
construction and DOM navigation vs. other tasks, and only then decide
whether this is where you want to invest your effort.
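
A minimal sketch of that measurement in Python, using the standard cProfile and pstats modules. The functions build_document, parse_step and other_work are hypothetical stand-ins for the real application; overall_speedup is the usual Amdahl's law arithmetic behind the 1% remark above.

```python
import cProfile
import io
import pstats
import xml.dom.minidom

def build_document(n):
    # Hypothetical stand-in for the application's real input.
    items = "".join(f"<item id='{i}'>{i}</item>" for i in range(n))
    return f"<items>{items}</items>"

def parse_step(text):
    # The part under suspicion: parsing plus DOM construction.
    return xml.dom.minidom.parseString(text)

def other_work(dom):
    # Hypothetical stand-in for everything else the application does.
    return sum(int(node.firstChild.data)
               for node in dom.getElementsByTagName("item"))

def run(n=2000):
    return other_work(parse_step(build_document(n)))

def profile_report(n=2000):
    """Run the workload under the profiler; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(run, n)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()

def overall_speedup(fraction, local_speedup):
    """Amdahl's law: whole-program speedup when only `fraction` of the
    runtime is made `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

if __name__ == "__main__":
    result, report = profile_report()
    print(report)
    # An infinite speedup of a 1% slice is still only about 1.01x overall.
    print(overall_speedup(0.01, float("inf")))
```

If parse_step and the functions it calls account for only a small share of the cumulative time in the report, switching parsers is unlikely to be where the gains are.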

--
Joe Kesselman,
http://www.love-song-productions.com/people/keshlam/index.html

{} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
/\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
 

japisoft

You can't really compare SAX and DOM. SAX works at the parsing level,
whereas DOM is for manipulating an XML document. A DOM is usually built
on top of a SAX parser. You can use the DOM, or bypass it and write your
own SAX code. However, writing your own SAX handler is much more complex,
and the final result could be much slower than plain DOM usage.

Best regards,

A.Brillant
EditiX XML Editor - http://www.editix.com

 

Joe Kesselman

> You can't really compare SAX and DOM. SAX works at the parsing level,
> whereas DOM is for manipulating an XML document. A DOM is usually built
> on top of a SAX parser. You can use the DOM, or bypass it and write
> your own SAX code. However, writing your own SAX handler is much more
> complex, and the final result could be much slower than plain DOM usage.

Very true. (Though some DOM parsers/loaders bypass SAX for greater
efficiency; I believe Xerces actually uses lower-level events to drive
its DOM construction.)

SAX does require that you manage all the state information, which may or
may not include building something like the DOM for part or all of the
document. How fast or slow that will be depends entirely on the problem
at hand and how good your code is.
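
To show what managing the state yourself looks like in practice, here is a minimal Python sketch with xml.sax. Even a task as small as "collect the text of every title that sits directly inside a book" needs an explicit stack of open elements and a text buffer, things a DOM would track for you. The sample document and the class and function names are hypothetical.

```python
import xml.sax

# Hypothetical input; the last <title> is not inside a <book>.
SAMPLE = (b"<lib><book><title>XML</title><author>Ann</author></book>"
          b"<title>stray</title></lib>")

class BookTitles(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.path = []    # stack of currently open element names
        self.buffer = []  # text chunks of the title being read
        self.titles = []

    def _in_book_title(self):
        return self.path[-2:] == ["book", "title"]

    def startElement(self, name, attrs):
        self.path.append(name)
        if self._in_book_title():
            self.buffer = []

    def characters(self, content):
        # The parser may deliver one text node in several chunks.
        if self._in_book_title():
            self.buffer.append(content)

    def endElement(self, name):
        if self._in_book_title():
            self.titles.append("".join(self.buffer))
        self.path.pop()

def book_titles(data):
    handler = BookTitles()
    xml.sax.parseString(data, handler)
    return handler.titles

print(book_titles(SAMPLE))
```

Every extra question asked of the document adds more of this bookkeeping, which is where hand-written SAX code tends to become either complex or slow.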

If you've got time, doing it all via SAX may be worth trying. But it
isn't always going to be a magic bullet.

As I said in my other post, the first thing to do is to find out whether
this is even a significant part of your application's processing time.

 
