Lenny Wintfeld
Hi
I'm attempting additions/changes to a Java program that (among other
things) uses XSLT to transform a large (96 MB) XML file. It runs fine on
small XML files but generates OutOfMemory exceptions with large XML
files. I tried a simple punt of -Xmx512MB but that didn't work. In the
future, the input XML file may become considerably bigger than 96 MB, so
even if it did work, it probably would be putting off the inevitable to
some later date.
I'm using JavaSE 1.4.2_11 and the XSL/XML libraries that come with it.
The translation is from and to an xml file. The code I inherited looks a
lot like most of the example code you can find on the net for doing an
XSLT transformation. The relevant part is:
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer transformer = tf.newTransformer(xsltSource);
    transformer.transform(new StreamSource(new StringReader(x)), xsltDest);
where xsltSource is XSLT in the form of a string, generated by code
immediately above the snip shown, and the "x" is the input xml to be
transformed.
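For reference, here is a self-contained sketch of that same pattern. Since newTransformer() takes a Source rather than a String, the stylesheet text is wrapped in a StreamSource here, which is presumably what the inherited code does too. The class name StringTransform, the helper transform(), and the tiny stylesheet and input in main() are made up for illustration and are not the real program's data.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical stand-in for the inherited code; not the real program.
class StringTransform {

    // Applies the stylesheet text to the XML text and returns the output as a String.
    static String transform(String xsltText, String xmlText) {
        try {
            TransformerFactory tf = TransformerFactory.newInstance();
            // newTransformer() needs a Source, so the stylesheet String is wrapped.
            Transformer transformer =
                tf.newTransformer(new StreamSource(new StringReader(xsltText)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xmlText)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Illustrative stylesheet: counts the <entry> children of <log>.
        String xslt =
            "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='/log'>"
            + "<count><xsl:value-of select='count(entry)'/></count>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        System.out.println(transform(xslt, "<log><entry/><entry/></log>"));
    }
}
```

Note that in this form the whole input is already held in memory as a String before the transformer even starts, on top of whatever the XSLT processor builds internally.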
Things I tried:
1. I modified the above code to use a file instead of a String as the
XML to be transformed and a file for the XSLT that specifies the
transformation. It works fine with small XML input files but not with
large ones. I assume this code is using the DOM parser, and there is
simply not enough room in memory to house the input XML file.
2. Based on some old (years old) newsgroup posts I found, I tried using
a SAX equivalent of the above code, assuming that SAX takes in, parses
and transforms the input XML file either piecemeal (maybe element by
element?) or that SAX uses the complete virtual memory of the computer.
But this code also results in successful runs on small input XML files and
OutOfMemory errors on large ones. Here is a snip of the SAX code
(adapted from a chapter of Burke's "XSLT and Java" at the O'Reilly
website):
    FileInputStream brXSLT = new FileInputStream(
        "C:/Documents and Settings/Lenny/Desktop/OCCxsl.xsl");
    // Set up the transformer
    TransformerFactory transFact = TransformerFactory.newInstance();
    SAXTransformerFactory saxTransFact = (SAXTransformerFactory) transFact;
    Source xsltSource = new StreamSource(brXSLT);
    TransformerHandler transHand =
        saxTransFact.newTransformerHandler(xsltSource);
    // Set up input source
    InputSource inxml = new InputSource(inXML);
    SAXSource saxSource = new SAXSource(inxml);
    // Set the destination for the XSLT transformation
    transHand.setResult(new StreamResult(outXML));
    // Attach the XSLT processor to the XMLReader
    String parserClass = "org.apache.crimson.parser.XMLReaderImpl";
    XMLReader reader = XMLReaderFactory.createXMLReader(parserClass);
    // Parse the input file to an output file
    reader.setContentHandler(transHand);
    reader.parse(inxml);
I'm considering making a custom parser of the input XML file which
basically identifies elements of the input XML file and treats each
element as if it were a complete document. e.g. send the content handler
    ch.startDocument()
    ch.startElement(..)   // pass through the original element
    ch.characters(..)     // "
    ch.endElement(..)     // "
    ch.endDocument()
for each element in the input XML file.
But being a newbie to XSLT, I don't know if this is worth pursuing, or
even if it would work; I'm hoping there are simpler, more straightforward
ways of accomplishing the same thing and at a higher level. It does seem
pretty clumsy, even if it would work.
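To make the per-element idea concrete, here is a rough sketch rather than a tested solution for the real data. It assumes the input is a flat list of repeated record elements under one root, that the stylesheet can process each record on its own (no templates that look across records or depend on the root wrapper), that no namespaces are involved, and that the stylesheet sets omit-xml-declaration so the pieces can be concatenated. The names RecordSplitter and transformPerRecord and the sample element names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: feeds each record element to its own TransformerHandler
// as if it were a complete document, so only one record is held at a time.
class RecordSplitter extends DefaultHandler {
    private final SAXTransformerFactory factory;
    private final Templates templates;  // compiled stylesheet, reused per record
    private final String recordName;    // local name of the record element
    private final Writer out;
    private TransformerHandler current; // non-null only while inside a record
    private int depth;                  // element nesting inside the current record

    RecordSplitter(SAXTransformerFactory factory, Templates templates,
                   String recordName, Writer out) {
        this.factory = factory;
        this.templates = templates;
        this.recordName = recordName;
        this.out = out;
    }

    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (current == null) {
            if (!local.equals(recordName)) return; // skip wrapper elements
            try {
                current = factory.newTransformerHandler(templates);
            } catch (TransformerConfigurationException e) {
                throw new SAXException(e);
            }
            current.setResult(new StreamResult(out)); // must precede startDocument()
            current.startDocument();
            depth = 0;
        }
        depth++;
        current.startElement(uri, local, qName, atts);
    }

    public void characters(char[] ch, int start, int length) throws SAXException {
        if (current != null) current.characters(ch, start, length);
    }

    public void endElement(String uri, String local, String qName) throws SAXException {
        if (current == null) return;
        current.endElement(uri, local, qName);
        if (--depth == 0) {
            current.endDocument(); // ends this record's mini-document
            current = null;
        }
    }

    // Parses xmlText and returns the concatenated per-record results.
    static String transformPerRecord(String xsltText, String xmlText, String recordName) {
        try {
            SAXTransformerFactory stf =
                (SAXTransformerFactory) TransformerFactory.newInstance();
            Templates templates =
                stf.newTemplates(new StreamSource(new StringReader(xsltText)));
            StringWriter out = new StringWriter();
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true); // so local names are reported
            XMLReader reader = spf.newSAXParser().getXMLReader();
            reader.setContentHandler(new RecordSplitter(stf, templates, recordName, out));
            reader.parse(new InputSource(new StringReader(xmlText)));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

For a real 96 MB input, the InputSource would wrap a file stream and the Writer would be a file writer, with the caller writing the enclosing root element of the output before and after the parse. Whether this is usable at all depends on the stylesheet never needing to see more than one record at once.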
I found a reply on the web to someone who had a similar problem, to the
effect that a "SAX pipeline" should be used. There was no further
elaboration, though, and so far I haven't figured out what a SAX pipeline
is or how it would help.
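As far as the term is generally used, a SAX pipeline means chaining SAX stages so that each stage's events feed the next: a parser, then zero or more XMLFilter stages, then a TransformerHandler that writes to the result. The sketch below shows one such chain under stated assumptions. The class SaxPipeline, the DropFilter stage, and the element names are hypothetical, and the filter simply discards elements the stylesheet does not need. An XSLT stage will typically still build an in-memory tree of whatever reaches it, so a pipeline should only be expected to help with memory if an earlier stage trims or splits the input.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical pipeline: parser -> DropFilter -> XSLT stage -> output.
class SaxPipeline {

    // Filter stage: drops every element with the given local name, plus its content.
    static class DropFilter extends XMLFilterImpl {
        private final String dropName;
        private int skipDepth; // > 0 while inside a dropped element

        DropFilter(XMLReader parent, String dropName) {
            super(parent);
            this.dropName = dropName;
        }

        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            if (skipDepth > 0 || local.equals(dropName)) {
                skipDepth++;
                return;
            }
            super.startElement(uri, local, qName, atts);
        }

        public void endElement(String uri, String local, String qName)
                throws SAXException {
            if (skipDepth > 0) {
                skipDepth--;
                return;
            }
            super.endElement(uri, local, qName);
        }

        public void characters(char[] ch, int start, int length) throws SAXException {
            if (skipDepth == 0) super.characters(ch, start, length);
        }
    }

    // Runs xmlText through the filter and then the stylesheet; returns the output.
    static String run(String xsltText, String xmlText, String dropName) {
        try {
            SAXTransformerFactory stf =
                (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler xsltStage =
                stf.newTransformerHandler(new StreamSource(new StringReader(xsltText)));
            StringWriter out = new StringWriter();
            xsltStage.setResult(new StreamResult(out));

            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true); // so local names are reported
            XMLReader parser = spf.newSAXParser().getXMLReader();

            // Wire the stages together: the filter pulls from the parser
            // and pushes the surviving events into the XSLT stage.
            XMLFilterImpl filter = new DropFilter(parser, dropName);
            filter.setContentHandler(xsltStage);
            filter.parse(new InputSource(new StringReader(xmlText)));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

More stages can be added by wrapping one filter in another, or by pointing one TransformerHandler's result at a SAXResult that wraps the next handler.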
Any advice, references to examples, or actual examples would be
greatly appreciated.
Non-procedural programming is taking quite a bit of effort to
understand!
Thanks in advance for your help.
Lenny Wintfeld
ps - I've had this up on comp.lang.java.programmer for most of the day
with no replies. It bridges both specialties, which is why I'm trying
here.