Darren
Hi all,
I have a Xalan-C performance problem that I need some help with. I have
a large document on which I need to perform two very simple
transformations:
1) Sort the records
2) Remove the first level of the document hierarchy
The document is structured as follows:
<Batch>
  <Batch>
    <ProductTypeA>
      <ProductID>009466</ProductID>
      <!-- ... other elements -->
    </ProductTypeA>
  </Batch>
  <Batch>
    <ProductTypeB>
      <ProductID>002700</ProductID>
      <!-- ... other elements -->
    </ProductTypeB>
    <ProductTypeA>
      <ProductID>002600</ProductID>
      <!-- ... other elements -->
    </ProductTypeA>
  </Batch>
</Batch>
The real document contains over 500,000 ProductTypeX records, and I
want the output to look like the following:
<Batch>
  <ProductTypeA>
    <ProductID>002600</ProductID>
    <!-- ... other elements -->
  </ProductTypeA>
  <ProductTypeB>
    <ProductID>002700</ProductID>
    <!-- ... other elements -->
  </ProductTypeB>
  <ProductTypeA>
    <ProductID>009466</ProductID>
    <!-- ... other elements -->
  </ProductTypeA>
</Batch>
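To make the intended result concrete, here is a minimal Python sketch of the same flatten-and-sort applied to the small sample above. It is only an illustration of the output I am after, not what I run in production: it uses the standard library's xml.etree.ElementTree, assumes the whole document fits in memory, and sorts ProductID by plain string comparison. The names SAMPLE and flatten_and_sort are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the structure shown above
# (the "other elements" inside each record are left out).
SAMPLE = """<Batch>
  <Batch>
    <ProductTypeA><ProductID>009466</ProductID></ProductTypeA>
  </Batch>
  <Batch>
    <ProductTypeB><ProductID>002700</ProductID></ProductTypeB>
    <ProductTypeA><ProductID>002600</ProductID></ProductTypeA>
  </Batch>
</Batch>"""


def flatten_and_sort(xml_text):
    """Drop the inner <Batch> level and sort the records by ProductID."""
    root = ET.fromstring(xml_text)
    # Collect every child of every inner <Batch>, i.e. the ProductTypeX records.
    records = [rec for inner in root.findall("Batch") for rec in inner]
    # Plain string comparison; adequate here because the IDs are
    # zero-padded to the same width.
    records.sort(key=lambda rec: rec.findtext("ProductID", default=""))
    out = ET.Element("Batch")
    out.extend(records)
    return out


result = flatten_and_sort(SAMPLE)
ids = [rec.findtext("ProductID") for rec in result]
print(ids)  # ['002600', '002700', '009466']
```

The record order printed at the end matches the desired output shown above.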
The problem is that running the following XSLT with XalanTransform
takes nearly 2 hours (not including the sort)! However, if I manually
remove the intermediate <Batch> tags, the process takes only 5 minutes
(including the sort).
<?xml version='1.0' ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output omit-xml-declaration="yes" />
  <xsl:template match="Batch">
    <Batch>
      <xsl:apply-templates select="Batch/*">
        <!--<xsl:sort select="ProductID"/>-->
      </xsl:apply-templates>
    </Batch>
  </xsl:template>
  <xsl:template match="/ | @* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
The version that takes 5 minutes differs from the above only in
changing the line:
<xsl:apply-templates select="Batch/*">
to:
<xsl:apply-templates select="*">
and un-commenting the sort.
Can anyone help?
P.S.
1) The file I get is generated by a third-party system, and its format
is not in my control.
2) The file is originally all on one physical line, and is far too big
for something like sed to process initially.
Darren