[XSLT] Performance issue

Guest

Hi,

I hope my question is not too trivial, but I have an annoying performance
issue with XSLT. I must create an XML file from multiple source files. I
made a catalog of these files:

catalog.xml:
<files>
  <file>file1.xml</file>
  <file>file2.xml</file>
</files>

Each file has the same structure (about 2 KB):
<a>
  <test>abcd</test>
</a>
(The tree is more complicated than that).

I then process this catalog file with this XSLT:

test.xsl:
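<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="files">
    <results>
      <!-- document(file) loads every listed file in one call -->
      <xsl:apply-templates select="document(file)"/>
    </results>
  </xsl:template>

  <xsl:template match="a">
    <result a="{test}"/>
  </xsl:template>
</xsl:stylesheet>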
(These are the exact templates; I don't do anything fancier within them,
just select direct children of the root element of each file.)

My problem is that my catalog file has more than 100,000 <file> elements.
xsltproc takes up to 700 MB to deal with the file and Saxon gives up after
the first hundred elements with a java.lang.OutOfMemoryError.

Have I done something wrong?

Thanks for your help

J.
 
Joe Kesselman

Jérôme Mainka said:
My problem is that my catalog file has more than 100,000 <file> elements.
xsltproc takes up to 700 MB to deal with the file and Saxon gives up after
the first hundred elements with a java.lang.OutOfMemoryError.

For some moderately subtle reasons, XSLT may have to keep documents in
memory once loaded -- so you may be trying to load all 100,000 files
into memory at once.

Xalan added a (nonstandard) mechanism to explicitly release documents
after you're done with them, specifically to address this use case. I
don't know whether the others have something similar. It's hard to
recognize and handle this automatically in the processor.

If you can't use Xalan, you may need to process each of those files in a
separate stylesheet invocation, then merge the results.
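
A minimal sketch of the per-file approach, assuming each source file is transformed in its own processor run (for example one xsltproc call per file, driven by whatever script you prefer) and the fragments are concatenated afterwards. The file name per-file.xsl is hypothetical; the element names come from the original post.

```xml
<?xml version="1.0"?>
<!-- per-file.xsl (hypothetical name): run once against each source file -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Omit the XML declaration so the per-file outputs can be concatenated. -->
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Each source file has <a> as its root element. -->
  <xsl:template match="/a">
    <result a="{test}"/>
  </xsl:template>

</xsl:stylesheet>
```

Each run only ever holds one small document in memory. The merge step then has to wrap the concatenated <result> elements in a single <results> element to get a well-formed output file. The price is one processor start-up per source file, which may be slow for 100,000 files.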
 
David Carlisle

Jérôme Mainka said:
My problem is that my catalog file has more than 100,000 <file> elements.
xsltproc takes up to 700 MB to deal with the file and Saxon gives up after
the first hundred elements with a java.lang.OutOfMemoryError.

Saxon 8 has an extension function saxon:discard-document() to address
this issue. Other processors may have similar functionality.
Without such an extension, the XSLT spec (both 1.0 and 2.0) more or less
forces processors to hold your documents in memory (or copies on
disk), as they need to guarantee that you get the same document back if
you call document() twice on the same URI, even if it has changed on the
server during the process. If you know that your documents will not
change (or you don't care if they do), or you know you only call
document() once for each file, allowing the processor to free up the
memory once the document node goes out of scope is a big win...

David
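
A sketch of how the original stylesheet might be adapted, assuming Saxon 8.x with its extension namespace http://saxon.sf.net/ bound to the saxon prefix. The per-file xsl:for-each is an illustrative choice, not something the posts above prescribe, and availability of the extension may differ between Saxon versions and editions.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <xsl:template match="files">
    <results>
      <!-- Load one catalog entry at a time instead of document(file) for all. -->
      <xsl:for-each select="file">
        <!-- discard-document() returns the document it is given and tells
             Saxon it need not keep that document cached afterwards. -->
        <xsl:apply-templates select="saxon:discard-document(document(.))"/>
      </xsl:for-each>
    </results>
  </xsl:template>

  <xsl:template match="a">
    <result a="{test}"/>
  </xsl:template>

</xsl:stylesheet>
```

This only makes sense when each file is read once: after a document has been discarded, a later document() call on the same URI may re-read the file and is no longer guaranteed to return the identical node.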
 
Guest

David said:
Saxon 8 has an extension function saxon:discard-document() to address
this issue. Other processors may have similar functionality.

Thanks David for this advice. The function did the trick.

Cheers

J.
 
Wolfgang Krpelan

Hi,
XSLT generally works with DOM trees in memory.
Of course, this limits document sizes.
Cheers, Wolfgang
 
Joseph Kesselman

Wolfgang said:
XSLT generally works with DOM trees in memory.
Of course, this limits document sizes.

Slight correction: XSLT generally works with an in-memory document
model, since stylesheets have full random access to the source document.
That model does NOT have to be an implementation of the W3C DOM APIs,
and in fact there are distinct advantages to using a model more
specifically tuned for the requirements of XPath/XSLT. For an example of
this, take a look at the DTM model used in Apache Xalan.

(Claimer: I'm the original author of DTM, along with some of the other
code in Xalan, so I'm biased. Then again, I'm also one of the DOM spec's
authors, and the original author of the DOM implementation used in
Xerces, though that has evolved significantly since I worked on it. Make
up your own mind which direction I'm biased and by how much... <smile/>)
 
