Jakub Moskal
Hi,
I need to remove certain elements from the XML document tree based on
given parameters, e.g. I have a document with a structure as follows:
<country>
<city>
<street name="streetName" />
</city>
</country>
and I want to remove all <country> nodes for which the street name is
"someName" (I know the example is lame, but it illustrates my problem).
Initially I used DOM: whenever I found a <street> element with a name
attribute I didn't want, I removed the enclosing country with:
root.removeChild(node.getParentNode().getParentNode());
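For context, here is a minimal sketch of my current DOM approach
(assuming a hypothetical <countries> root element wrapping the
<country> nodes; the file names are made up):

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;

public class DomFilter {
    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new File("countries.xml"));
        Element root = doc.getDocumentElement();

        // Collect the <country> ancestors first; removing nodes while
        // walking a live NodeList would skip elements.
        NodeList streets = doc.getElementsByTagName("street");
        List<Node> doomed = new ArrayList<Node>();
        for (int i = 0; i < streets.getLength(); i++) {
            Element street = (Element) streets.item(i);
            if ("someName".equals(street.getAttribute("name"))) {
                // street -> city -> country
                doomed.add(street.getParentNode().getParentNode());
            }
        }
        for (Node country : doomed) {
            if (country.getParentNode() == root) { // not already removed
                root.removeChild(country);
            }
        }
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(new File("filtered.xml")));
    }
}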
It worked just fine with small files, but problems occurred when I
started dealing with documents 10-60 MB in size. DOM loads the entire
document tree into memory, so this solution doesn't scale at all - on
most computers I run into memory issues. I don't want to just give the
JVM more memory; that doesn't feel like the right direction, because
it's not a universal solution.
SAX parses the document serially, and I can't find a way to remove an
ancestor (the enclosing <country>) of the current element with it.
XSLT is no help either: the processor typically builds the whole
source tree in memory, just like DOM, so the same memory issues occur.
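The closest I've come to a streaming approach is to buffer one
<country> subtree at a time with StAX and only write it out if no
street matches - something like the rough sketch below (assuming
<country> elements are never nested; in.xml and out.xml are made up) -
but I'm not sure this is the right direction either:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.XMLEvent;

public class StreamingFilter {
    public static void main(String[] args) throws Exception {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new FileInputStream("in.xml"));
        XMLEventWriter writer = XMLOutputFactory.newInstance()
                .createXMLEventWriter(new FileOutputStream("out.xml"));

        List<XMLEvent> buffer = new ArrayList<XMLEvent>(); // one <country> at a time
        boolean inCountry = false;
        boolean matches = false;

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()
                    && "country".equals(event.asStartElement().getName().getLocalPart())) {
                inCountry = true;
                matches = false;
            }
            if (!inCountry) {
                writer.add(event); // outside any <country>: pass through untouched
                continue;
            }
            buffer.add(event);
            if (event.isStartElement()
                    && "street".equals(event.asStartElement().getName().getLocalPart())) {
                Attribute name = event.asStartElement().getAttributeByName(new QName("name"));
                if (name != null && "someName".equals(name.getValue())) {
                    matches = true; // this country has to go
                }
            }
            if (event.isEndElement()
                    && "country".equals(event.asEndElement().getName().getLocalPart())) {
                if (!matches) {
                    for (XMLEvent e : buffer) {
                        writer.add(e); // keep this country
                    }
                }
                buffer.clear();
                inCountry = false;
            }
        }
        writer.close();
        reader.close();
    }
}

Memory-wise this only ever holds a single <country> subtree in the
buffer, so it seems like it should stay flat regardless of file size -
but maybe there is something better?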
Is there anything else out there that would help me solve this issue?
Would chopping the file into smaller pieces be a good solution?
Any help greatly appreciated,
Jakub.