OutOfMemoryException With XML DOM

Jason Cavett

My project uses XML for its data files and I am using a DOM parser
(the one native to the JDK) to parse out the files. DOM is especially
useful because the project lends itself to the use of trees.

Unfortunately, there tends to be a limit as to how big the XML files
can be before the DOM parser starts chewing up memory and, if the file
is big enough, I get an OutOfMemoryError. It's not from the
project specifically - it's instead a result of the enormous amount of
space DOM takes up.

I was wondering if there's a solution to this? I have read about SAX
a bit, and although it would fix the OutOfMemoryError, it would make
it more difficult to manage the tree structure. I could also increase
the amount of RAM available to the JRE, but I'd rather do that as a
last resort.

Does anybody have any other suggestions? Thanks.
 
Arne Vajhøj

Jason said:
My project uses XML for its data files and I am using a DOM parser
(the one native to the JDK) to parse out the files. DOM is especially
useful because the project lends itself to the use of trees.

Unfortunately, there tends to be a limit as to how big the XML files
can be before the DOM parser starts chewing up memory and, if the file
is big enough, I get an OutOfMemoryError. It's not from the
project specifically - it's instead a result of the enormous amount of
space DOM takes up.

I was wondering if there's a solution to this? I have read about SAX
a bit, and although it would fix the OutOfMemoryError, it would make
it more difficult to manage the tree structure. I could also increase
the amount of RAM available to the JRE, but I'd rather do that as a
last resort.

Does anybody have any other suggestions?

No.

-Xmx seems like the best way to go.

Arne
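
For anyone taking the -Xmx route, here is a minimal sketch for checking
what heap limit the JVM actually ended up with. The class name HeapCheck
and the 512m value in the usage line are just assumptions for
illustration, not anything from the original project.

```java
// Minimal sketch: report the maximum heap the running JVM will try to use.
// HeapCheck is a hypothetical name chosen for this example.
class HeapCheck {

    // Runtime.maxMemory() returns the limit in bytes; convert to MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Run it once with no options and once as "java -Xmx512m HeapCheck" to see
the difference. The same -Xmx option would then go on the command line
that starts the real application.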
 
Boris Stumm

Jason said:
I was wondering if there's a solution to this? I have read about SAX
a bit, and although it would fix the OutOfMemoryError, it would make
it more difficult to manage the tree structure. I could also increase
the amount of RAM available to the JRE, but I'd rather do that as a
last resort.

Maybe have a look at XML databases. I am not really familiar with this
area, but I know some people in my working group have one that is
accessible through DOM. There should be others, too. The problem will
be finding one that is stable enough for production use.
 
Jason Cavett

Arne said:
No.

-Xmx seems like the best way to go.

Arne

Haha. Alright. I was sort of hoping that wasn't the solution, but if
that's what has to be done, that's what I'll do.

Thanks.
 
Stanimir Stamenkov

Wed, 20 Feb 2008 17:19:32 -0800 (PST), /Jason Cavett/:
My project uses XML for its data files and I am using a DOM parser
(the one native to the JDK) to parse out the files. DOM is especially
useful because the project lends itself to the use of trees.

Unfortunately, there tends to be a limit as to how big the XML files
can be before the DOM parser starts chewing up memory and, if the file
is big enough, I get an OutOfMemoryError. It's not from the
project specifically - it's instead a result of the enormous amount of
space DOM takes up.

I was wondering if there's a solution to this? I have read about SAX
a bit, and although it would fix the OutOfMemoryError, it would make
it more difficult to manage the tree structure. I could also increase
the amount of RAM available to the JRE, but I'd rather do that as a
last resort.

Does anybody have any other suggestions? Thanks.

A great deal of the DOM is usually taken up by whitespace in element
content (used only to format the source XML text). Depending on the
parser implementation, you could supply a DTD to make the parser
ignore [1] the whitespace in element content, or use custom
filtering [2] as provided by the DOM Level 3 Load and Save APIs, an
implementation of which is part of the standard Java 1.5 framework.
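
To illustrate the custom filtering option, here is a rough sketch of an
LSParserFilter that rejects whitespace-only text nodes while the tree is
being built, so no DTD is needed. The class names WhitespaceFilterDemo
and DropBlankText are made up for this example. Note that, unlike the
DTD approach, it drops every blank text node, including ones in
mixed content, so it only fits documents where such whitespace is never
significant.

```java
import java.io.StringReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

// Hypothetical demo class: parse with DOM Level 3 Load and Save and
// filter out text nodes that contain only whitespace.
class WhitespaceFilterDemo {

    static class DropBlankText implements LSParserFilter {
        // Let every element through; only text nodes are inspected.
        public short startElement(Element element) {
            return FILTER_ACCEPT;
        }

        // Reject text nodes whose content is empty after trimming.
        public short acceptNode(Node node) {
            if (node.getNodeType() == Node.TEXT_NODE
                    && node.getNodeValue().trim().length() == 0) {
                return FILTER_REJECT;
            }
            return FILTER_ACCEPT;
        }

        // Ask the parser to pass only text nodes to acceptNode().
        public int getWhatToShow() {
            return NodeFilter.SHOW_TEXT;
        }
    }

    static Document parse(String xml) {
        try {
            DOMImplementationRegistry registry =
                    DOMImplementationRegistry.newInstance();
            DOMImplementationLS ls =
                    (DOMImplementationLS) registry.getDOMImplementation("LS");
            LSParser parser =
                    ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
            parser.setFilter(new DropBlankText());
            LSInput input = ls.createLSInput();
            input.setCharacterStream(new StringReader(xml));
            return parser.parse(input);
        } catch (Exception e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a>\n  <b>x</b>\n  <c/>\n</a>");
        System.out.println("children of <a>: "
                + doc.getDocumentElement().getChildNodes().getLength());
    }
}
```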

The Xerces2 implementation (a modified version of which is part of
Sun's Java 1.5 distribution) is capable of ignoring whitespace in
element content when a suitable DTD is provided, even in
non-validating mode. One could supply a DTD for documents which
don't have a DOCTYPE declaration by setting an EntityResolver2 [3]
instance (see the getExternalSubset() method) on the
DocumentBuilder [4].
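
As a rough sketch of the DTD approach, the example below parses a small
document twice with a non-validating DocumentBuilder, once keeping and
once ignoring element content whitespace. To keep it short, the DTD is
an internal subset inside the sample document rather than one supplied
through EntityResolver2. The class name DtdWhitespaceDemo, the sample
XML and its DTD are all invented for illustration, and the behaviour
shown relies on the Xerces-derived parser bundled with the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class comparing child counts with and without
// setIgnoringElementContentWhitespace().
class DtdWhitespaceDemo {

    // Sample document (an assumption): the DTD declares <items> as having
    // element-only content, so whitespace between <item>s is ignorable.
    static final String XML =
          "<!DOCTYPE items [\n"
        + "  <!ELEMENT items (item*)>\n"
        + "  <!ELEMENT item (#PCDATA)>\n"
        + "]>\n"
        + "<items>\n"
        + "  <item>one</item>\n"
        + "  <item>two</item>\n"
        + "</items>";

    // Parse XML and return the number of child nodes of the root element.
    static int countRootChildren(boolean ignoreWhitespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // non-validating, as described above
            factory.setIgnoringElementContentWhitespace(ignoreWhitespace);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("whitespace kept:    " + countRootChildren(false));
        System.out.println("whitespace ignored: " + countRootChildren(true));
    }
}
```

With whitespace kept, the root has text nodes between and around the two
item elements. With the flag set, only the item elements should remain,
which is where the memory saving on large, pretty-printed files comes
from.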

All of the above is also available to Java 1.4 users simply by
plugging the latest Xerces2 jars into the classpath.

[1] <http://java.sun.com/j2se/1.5.0/docs...#setIgnoringElementContentWhitespace(boolean)>
[2] <http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/ls/LSParserFilter.html>
[3] <http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/ext/EntityResolver2.html>
[4] <http://java.sun.com/j2se/1.5.0/docs...setEntityResolver(org.xml.sax.EntityResolver)>
 
