killy971
I have been testing different libraries to process XSL transformations
on large XML files.
I read a document from Intel stating that their library (XSLT
accelerator) is more than twice as fast as Apache Xalan, and is
designed to perform well on large XML files.
- http://isdlibrary.intel-dispatch.com/isd/10/wp_XSLT.PDF
- http://www.intel.com/cd/software/products/asmo-na/eng/366637.htm
Please note that their benchmark was performed only with small XML
files, under 200 KB.
I downloaded their evaluation version and configured it to work with 2
threads.
My computer has a dual-core Intel processor (2.40 GHz) and 2 GB of
RAM.
I ran tests on XML files that consist of a simple header and footer,
with the content being a sequence of the following piece of XML:
<log>
  <timestamp>2008-01-02 00:00:08</timestamp>
  <host name="mail-server">122.122.122.122</host>
  <pid/>
  <facility>mail</facility>
  <priority>notice</priority>
  <message> The key values: Key1=A, Key2=C, Key3=E and Key4=G. 000008351</message>
  <application name="Benchmark">
    <action name="misc" color="FFFFFF">
      <param tag="key1" name="Key1">A</param>
      <param tag="key2" name="Key2">C</param>
      <param tag="key3" name="Key3">E</param>
      <param tag="key4" name="Key4">G</param>
    </action>
  </application>
</log>
By repeating this pattern, I made test files of different sizes, from
1 MB to 200 MB, and tested the XSL transformation (with an XSL file
transforming the XML file into an HTML file) with the two libraries.
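For anyone who wants to reproduce this, here is a minimal sketch of how
such test files could be generated. The <logs> root element and the file
name are assumptions standing in for the "simple header and footer",
which are not shown in this post; the class and method names are only
illustrative.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Builds a test file by repeating one <log> record until a target size is reached. */
class LogFileGenerator {

    // The record quoted in the post, repeated verbatim.
    static final String RECORD =
        "<log>\n"
        + "<timestamp>2008-01-02 00:00:08</timestamp>\n"
        + "<host name=\"mail-server\">122.122.122.122</host>\n"
        + "<pid/>\n"
        + "<facility>mail</facility>\n"
        + "<priority>notice</priority>\n"
        + "<message> The key values: Key1=A, Key2=C, Key3=E and Key4=G. 000008351</message>\n"
        + "<application name=\"Benchmark\">\n"
        + "<action name=\"misc\" color=\"FFFFFF\">\n"
        + "<param tag=\"key1\" name=\"Key1\">A</param>\n"
        + "<param tag=\"key2\" name=\"Key2\">C</param>\n"
        + "<param tag=\"key3\" name=\"Key3\">E</param>\n"
        + "<param tag=\"key4\" name=\"Key4\">G</param>\n"
        + "</action>\n"
        + "</application>\n"
        + "</log>\n";

    /** Writes records until at least minBytes of record data exist; returns the record count. */
    static long generate(Path target, long minBytes) throws IOException {
        int recordBytes = RECORD.getBytes(StandardCharsets.UTF_8).length;
        long written = 0;
        long records = 0;
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            // Assumed header: an XML declaration and a hypothetical <logs> root element.
            out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<logs>\n");
            while (written < minBytes) {
                out.write(RECORD);
                written += recordBytes;
                records++;
            }
            // Assumed footer: closes the hypothetical root element.
            out.write("</logs>\n");
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        long sizeMb = args.length > 0 ? Long.parseLong(args[0]) : 10;
        Path target = Paths.get("test-" + sizeMb + "MB.xml");
        long n = generate(target, sizeMb * 1024 * 1024);
        System.out.println(n + " records written to " + target);
    }
}
```

Running it with an argument of 50 would produce a file of roughly 50 MB
in the current directory.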
For small XML files (under 4 MB), Intel XSLT accelerator performed
better than Xalan, but for bigger files XSLT accelerator becomes
_very_ slow (its processing time grows far faster than linearly with
the file size).
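For reference, a timing measurement of this kind can be sketched with
the standard JAXP API. This is only an illustration, not the exact
harness behind the numbers in this post: TransformerFactory.newInstance()
returns whichever JAXP processor is configured on the classpath (Xalan
if its jar is installed, otherwise the JDK's built-in processor), the
file names are hypothetical, and the Intel library's own API is not
covered here.

```java
import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Times a single XSL transformation run through JAXP. */
class TransformTimer {

    /** Runs one transformation and returns the elapsed wall-clock time in seconds. */
    static double timeTransform(File xsl, File xml, File html) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Stylesheet compilation is kept outside the timed region.
        Transformer transformer = factory.newTransformer(new StreamSource(xsl));
        long start = System.nanoTime();
        transformer.transform(new StreamSource(xml), new StreamResult(html));
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) throws TransformerException {
        // Hypothetical file names; replace with real paths.
        File xsl = new File(args.length > 0 ? args[0] : "logs-to-html.xsl");
        File xml = new File(args.length > 1 ? args[1] : "test-10MB.xml");
        File html = new File(args.length > 2 ? args[2] : "out.html");
        System.out.printf("%.1f seconds%n", timeTransform(xsl, xml, html));
    }
}
```

A single run like this includes JIT warm-up, so repeating the
measurement a few times and discarding the first run gives steadier
figures.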
An extract of my benchmark results (processing time):
10 MB XML file
- Xalan: 4.5 seconds
- XSLT accelerator: 19.4 seconds
30 MB XML file
- Xalan: 11.4 seconds
- XSLT accelerator: 191.2 seconds
50 MB XML file
- Xalan: 18.4 seconds
- XSLT accelerator: 548.4 seconds (~30 times slower!)
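To put numbers on the trend, the small sketch below computes the
slowdown ratio at each size and a rough scaling exponent k, assuming the
time follows size^k between the 10 MB and 50 MB measurements. The power
law is only an assumption fitted to three data points; the class and
method names are illustrative.

```java
/** Derives slowdown ratios and a rough scaling exponent from the timings listed above. */
class SlowdownReport {

    // Measurements copied from the benchmark extract.
    static final double[] SIZE_MB = {10, 30, 50};
    static final double[] XALAN_S = {4.5, 11.4, 18.4};
    static final double[] INTEL_S = {19.4, 191.2, 548.4};

    /** How many times slower XSLT accelerator is than Xalan at measurement i. */
    static double slowdown(int i) {
        return INTEL_S[i] / XALAN_S[i];
    }

    /** Exponent k in time ~ size^k, estimated from the first and last measurements. */
    static double scalingExponent(double[] sizes, double[] times) {
        int last = sizes.length - 1;
        return Math.log(times[last] / times[0]) / Math.log(sizes[last] / sizes[0]);
    }

    public static void main(String[] args) {
        for (int i = 0; i < SIZE_MB.length; i++) {
            System.out.printf("%.0f MB: %.1fx slower%n", SIZE_MB[i], slowdown(i));
        }
        System.out.printf("Xalan exponent: %.2f%n", scalingExponent(SIZE_MB, XALAN_S));
        System.out.printf("XSLT accelerator exponent: %.2f%n", scalingExponent(SIZE_MB, INTEL_S));
    }
}
```

With these figures the slowdown goes from about 4x to about 17x to
about 30x, and the exponents come out near 0.9 for Xalan and near 2.1
for XSLT accelerator, which would mean roughly linear versus roughly
quadratic growth over this range.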
For the Intel library, the only configuration options were the number
of worker threads and the memory allocated to the process. I tried
different settings but always saw the same behaviour (the best results
were with 2 worker threads).
I contacted Intel support, but didn't get any explanation for this
problem.
I would like to know whether anyone has experienced the same problem,
and whether there is a way to get Intel XSLT accelerator to outperform
Xalan on large XML files (or at least to match it).
Has anyone ever used Intel XSLT accelerator for large XML file
processing?
Finally, does anyone know a library that really outperforms Xalan on
XSL transformations, especially with regard to the OutOfMemoryError?
(With my XML format and XSL stylesheet, this error occurs when the
memory allocated to the JVM is less than about 3.4 times the size of
the XML file I am trying to process.)
(The reason I am interested in the Intel library is that it doesn't
seem to crash even when processing large XML files. Xalan does, when
the memory allocated to the JVM is not enough.)
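As a worked example of the 3.4x rule of thumb mentioned above, the
sketch below estimates the heap Xalan would need for a given file size
and checks it against the running JVM. The factor is only what I
observed with this particular XML format and stylesheet, and the class
and method names are illustrative.

```java
/** Estimates the heap needed by Xalan using the observed ~3.4x factor. */
class HeapEstimate {

    // Observed factor of 3.4, stored in tenths to keep the arithmetic exact.
    static final long HEAP_FACTOR_TENTHS = 34;

    /** Estimated minimum heap in MB for an XML file of the given size, rounded up. */
    static long requiredHeapMb(long xmlSizeMb) {
        return (xmlSizeMb * HEAP_FACTOR_TENTHS + 9) / 10;
    }

    /** True if the current JVM's maximum heap reaches the estimate. */
    static boolean fitsInCurrentJvm(long xmlSizeMb) {
        long neededBytes = requiredHeapMb(xmlSizeMb) * 1024L * 1024L;
        return Runtime.getRuntime().maxMemory() >= neededBytes;
    }

    public static void main(String[] args) {
        long sizeMb = args.length > 0 ? Long.parseLong(args[0]) : 50;
        System.out.println("Estimated heap: " + requiredHeapMb(sizeMb) + " MB");
        System.out.println("Fits in this JVM: " + fitsInCurrentJvm(sizeMb));
    }
}
```

By this estimate a 50 MB file needs about 170 MB of heap and a 200 MB
file about 680 MB, so the JVM would have to be started with at least
-Xmx680m for the largest test file.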