Juergen said:
The usual answer is that XSLT needs a DOM.
Quibble: XSLT, in general, needs an in-memory model of the source
document. ("DOM" stands for Document Object Model, though it usually
refers to the W3C DOM which is in fact an object-based API for documents
and doesn't actually say anything about what the model behind that API
might be.)
My understanding is that files larger than 500 MB are impractical to
process with XSLT.
Depends on how much memory you have in your machine and how fast your
memory swap system is, as well as how much locality of reference there
is in the stylesheet's execution.
XSLT processors which can automatically recognize opportunities to keep
less of the source document in memory are something of a "holy grail"
project -- we all know it's possible, but as far as I know nobody has
yet made that optimization work sufficiently generally or reliably.
Search the Apache Xalan mailing list's archives for the keywords
"streaming", "pruning" and "filtering" to see some past discussion of
that. (In fact, when Xalan is processing from its database adapter it
often does operate in a streaming mode, counting on the user not to
write stylesheets which require wide random access to the source.)
I know work is continuing on this in several research groups. Meanwhile,
depending on what you're doing, you may find that a hand-coded solution
can be made more efficient. XSLT is a good "high-level language" for XML
manipulation, but sometimes ya just gotta break down and write something
closer to the machine... at least, until the optimizers get smarter.
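
To give a rough idea of what "closer to the machine" can look like, here is a minimal sketch of a hand-coded streaming pass in Python, using the standard library's xml.etree.ElementTree.iterparse. It is only an illustration, not something from the original question: the element names "record" and "amount", the function name sum_amounts, and the assumption that each record is a direct child of the root element are all made up for the example. The point is that each record is discarded once it has been handled, so memory use depends on the size of one record rather than the size of the whole file.

```python
import io
import xml.etree.ElementTree as ET


def sum_amounts(source, record_tag="record", field_tag="amount"):
    """Stream over `source`, totalling <amount> in each <record>.

    Hypothetical layout: <record> elements are direct children of the
    root element, each with at most one <amount> child holding a number.
    """
    total = 0.0
    count = 0
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            value = elem.findtext(field_tag)
            if value is not None:
                total += float(value)
            count += 1
            # Detach the records seen so far from the root, so the
            # in-memory tree never grows beyond the current record.
            root.clear()
    return count, total


if __name__ == "__main__":
    sample = (
        b"<data>"
        b"<record><amount>1.5</amount></record>"
        b"<record><amount>2.5</amount></record>"
        b"</data>"
    )
    print(sum_amounts(io.BytesIO(sample)))
```

The same function accepts a file name or an open file in place of the BytesIO object. Note that this only works because the task needs no random access to the source; anything that has to look back at earlier records would need its own bookkeeping, which is exactly the part an XSLT processor cannot yet figure out for you automatically.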