Slow performance with Specific XSLT

Darren · Jul 15, 2005

Hi all,

I have an issue with Xalan-C performance that I need some help
with. I have a large document on which I need to perform two very
simple transformations:
1) Sort
2) Remove the 1st level of the document hierarchy.

The Document is structured like the following:

<Batch>
<Batch>
<ProductTypeA>
<ProductID>009466</ProductID>

</ProductTypeA>
</Batch>
<Batch>
<ProductTypeB>
<ProductID>002700</ProductID>

</ProductTypeB>
<ProductTypeA>
<ProductID>002600</ProductID>

</ProductTypeA>
</Batch>
</Batch>

Within the real document I have over 500,000 ProductTypeX records, and
I want the document to come out like the following:

<Batch>
<ProductTypeA>
<ProductID>002600</ProductID>

</ProductTypeA>
<ProductTypeB>
<ProductID>002700</ProductID>

</ProductTypeB>
<ProductTypeA>
<ProductID>009466</ProductID>

</ProductTypeA>
</Batch>

The problem is that running the following XSLT with XalanTransform
takes nearly 2 hours (not including the sort)! However, if I
manually remove the intermediate <Batch> tags, the process only takes 5
minutes (including the sort).

<?xml version='1.0' ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output omit-xml-declaration="yes" />
<xsl:template match="Batch">
<Batch>
<xsl:apply-templates select="Batch/*">

</xsl:apply-templates>
</Batch>
</xsl:template>
<xsl:template match="/ | @* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>

So the version that takes 5 minutes differs from the above by changing
the line:
<xsl:apply-templates select="Batch/*">
To:
<xsl:apply-templates select="*">

And un-commenting the sort.

Can anyone help?

P.S.
1) The file I get is generated by a 3rd-party system and its
format is not in my control.
2) The File is originally all in 1 physical line, and far too big for
something like SED to process initially.

Darren

Jürgen Kahrs · Jul 15, 2005

Darren said:
2) The File is originally all in 1 physical line, and far too big for
something like SED to process initially.

There are other tools aside from sed which can do this.
A standard POSIX command for doing this is tr.
But AWK can also do this (you may re-define AWK's line-separator RS).
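
As a rough sketch of the RS idea (not something posted in the thread): setting RS to ">" lets awk read the file one tag-sized record at a time instead of as one enormous line. The function name split_tags is made up for illustration, and the sketch assumes that no literal ">" occurs inside text content or attribute values.

```shell
# Put a newline between adjacent tags of a single-line XML file.
# Hypothetical helper; assumes ">" only ever appears as the end of a tag.
split_tags() {
    awk 'BEGIN { RS = ">"; ORS = "" }
         # Skip whitespace-only records, e.g. the newline after the last tag.
         /^[ \t\r\n]*$/ { next }
         {
             # Start a new line whenever a record begins with a tag.
             if (started && substr($0, 1, 1) == "<") print "\n"
             print $0 ">"
             started = 1
         }
         END { if (started) print "\n" }'
}
```

It would be used as a filter, for example `split_tags < Products.xml > new.xml`. Text that directly follows a tag, such as a ProductID value, stays on the same line as its opening tag, so element content is not altered.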

Darren · Jul 15, 2005

Thanks for the Reply!

I did initially use the command " awk '{gsub(/
/,"\n");print}'
" to break the file into lines; however, the file was left slightly
corrupted by this (I think the AIX version of awk used couldn't cope
with the size of the file).

And tr's performance was non-existent when trying to do an equivalent.

Jürgen Kahrs · Jul 15, 2005

Darren said:
I did initially use the command " awk '{gsub(/
/,"\n");print}'
" to break the file into lines; however, the file was left slightly
corrupted by this (I think the AIX version of awk used couldn't cope
with the size of the file).

I forgot to say about AWK: Whenever you have insane lengths for
lines or fields or anything else, you should at least try
GNU Awk. GNU Awk is well-known for not having limitations
on line length.

And tr's performance was non-existent when trying to do an equivalent.

Interesting.

David Carlisle · Jul 15, 2005

Don't know why it should be so slow (what does Saxon do, for example?).
You could probably speed it up a bit by using copy-of rather than a
recursive copying template, since below a certain level you just want to
copy whole branches:
<xsl:template match="/Batch">
<Batch>
<xsl:for-each select="Batch/*">
<xsl:copy-of select="."/>

</xsl:for-each>
</Batch>
</xsl:template>

David

Darren · Jul 15, 2005

Thanks for this - I'm trying it now ... however, it's already been
running over an hour -- so it doesn't look good!

I will then retry, replacing the <xsl:for-each select="Batch/*"> with
a <xsl:for-each select="*">, just to see the time difference.

Dimitre Novatchev · Jul 16, 2005

Darren said:
Thanks for this - I'm trying it now ... however, it's already been
running over an hour -- so it doesn't look good!

I will then retry, replacing the <xsl:for-each select="Batch/*"> with
a <xsl:for-each select="*">, just to see the time difference.

Probably you do not have sufficient memory.

I produced 500000 records of the type you describe and both transformations
provided by you (slightly corrected) take about a minute with MSXML4.

I have a 3GHz Pentium 4 with 2GB of RAM.

The correction to your first transformation is the following:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output omit-xml-declaration="yes" />

<xsl:template match="/*">
<Batch>
<xsl:apply-templates select="Batch/*">

</xsl:apply-templates>
</Batch>
</xsl:template>

<xsl:template match="/ | @* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>

It avoids trying:

<xsl:apply-templates select="Batch/*">

when the current node is a "Batch"

and this probably saves some time.

The second transformation is:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output omit-xml-declaration="yes" />

<xsl:template match="/*">
<Batch>
<xsl:for-each select="Batch/*">
<xsl:sort select="ProductID"/>

<xsl:copy-of select="."/>
</xsl:for-each>
</Batch>
</xsl:template>
</xsl:stylesheet>

Here I use xsl:copy-of instead of the potentially deep-recursive identity
rule.
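
For anyone wanting to reproduce these timings, a single-line test file of the shape described in the thread could be generated along the following lines. This is only a sketch and not the method actually used above: the function name make_test_data, the batch size, the alternating A/B types and the ID formula are all assumptions chosen for illustration.

```shell
# Usage: make_test_data RECORDS PER_BATCH > Products.xml
# Hypothetical generator; writes everything on one physical line.
make_test_data() {
    awk -v n="$1" -v per="$2" 'BEGIN {
        printf "<Batch>"
        for (i = 0; i < n; i++) {
            # Open a new inner Batch every "per" records.
            if (i % per == 0) printf "<Batch>"
            kind = (i % 2 == 0) ? "A" : "B"
            # Deterministic, unsorted 6-digit IDs (distinct for n <= 1000000).
            id = (i * 7919) % 1000000
            printf "<ProductType%s><ProductID>%06d</ProductID></ProductType%s>", kind, id, kind
            # Close the inner Batch when it is full or the data runs out.
            if (i % per == per - 1 || i == n - 1) printf "</Batch>"
        }
        printf "</Batch>\n"
    }'
}
```

For example, `make_test_data 500000 100 > Products.xml` would produce 500,000 records in inner batches of 100.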

Cheers,
Dimitre Novatchev.

Darren · Jul 18, 2005

Thanks for your reply - I have tested with your suggested changes and
it is still in the nearly-2-hours bracket. I don't think the issue is
related to memory, as the host in question has 24GB of RAM, although it
has an old (700MHz RISC-based) processor.

Reworking the process into the following steps

1. Use a simple <xsl:copy-of select="."/> to format document into
readable XML (break into multi-lines)
XalanTransform Products.xml copy.xslt new.xml

2. Use AWK to remove Batch Tags
awk '{gsub("</*Batch>","");print}' <new.xml >out.xml

3. Re-insert outer Batch tags (real 0m22.03s)
print "print \<Batch\>\ncat out.xml\nprint \</Batch\>\n" | sh >new.xml

4. Sort the file!
XalanTransform new.xml sort.xslt out.xml

This reduces the processing time to 8.5 minutes; however, it is far less elegant!
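
Steps 2 and 3 above could probably be folded into a single pass, roughly as sketched here. The function name flatten_batches is made up for illustration, and the sketch assumes, as step 2 already does, that the only tags to strip are exactly <Batch> and </Batch>.

```shell
# Strip every <Batch> / </Batch> tag from stdin, then wrap the result
# in a single outer <Batch> element. Hypothetical helper.
flatten_batches() {
    printf '%s\n' '<Batch>'
    # Same pattern as step 2; lines left empty by the removal are dropped.
    awk '{ gsub(/<\/*Batch>/, ""); if (length($0) > 0) print }'
    printf '%s\n' '</Batch>'
}
```

With the file names from the steps above, this would be `flatten_batches < new.xml > out.xml`, with step 4 then reading out.xml instead.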
