XSLT is a complex, slow solution. I'd be surprised if it's the
easiest way. I'm not up to date on Java APIs, but with C or C++
it's much easier than that, and I'd expect NekoHTML to offer a
good solution in Java.
I would rather say it is impossible with XSLT.
HTML is, by definition, not an XML language, and thus cannot possibly be
transformed with XSLT, which requires well-formed XML input. If the HTML would
Nonsense. XSLT starts with a DOM tree, which can equally well represent
an HTML or an XHTML document. As soon as you've parsed the HTML to DOM,
your XSLT processor can pretend it was XML all along (and if you apply the
identity transform, you've just converted to XHTML).
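To make the identity-transform point concrete, here is a rough Java sketch using only the JDK's built-in JAXP classes. The class and method names (DomIdentityDemo, sampleHtmlDom, serialize) are made up for illustration, and the DOM is built by hand as a stand-in for whatever an HTML parser such as NekoHTML would hand you; the parsing step itself is assumed, not shown.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomIdentityDemo {

    // Hypothetical stand-in for an HTML parser's output: the DOM for
    // <html><body><p>Hello<br>world</p></body></html>, built by hand.
    static Document sampleHtmlDom() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element p = doc.createElement("p");
        p.appendChild(doc.createTextNode("Hello"));
        p.appendChild(doc.createElement("br")); // empty element, no end tag in HTML
        p.appendChild(doc.createTextNode("world"));
        body.appendChild(p);
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }

    // newTransformer() with no stylesheet gives the identity transform;
    // with the "xml" output method the tree is written as well-formed XML.
    static String serialize(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.METHOD, "xml");
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(sampleHtmlDom()));
    }
}
```

The transformer never sees where the tree came from, which is the whole point. Strictly speaking, real XHTML would also want the elements in the XHTML namespace, which this sketch does not bother with.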
But by the time you've generated the DOM, you've already done more
work than you needed to. SAX is just as easy.
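For comparison, a minimal SAX sketch in Java, again with JDK classes only. The names SaxTextDemo and TextCollector are invented for the example, and pulling out the text is just a placeholder task. The stock JDK parser only accepts well-formed input, so for real-world HTML the assumption is that you would swap in an HTML-aware parser that exposes a SAX interface and keep the handler unchanged.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxTextDemo {

    // Receives parse events as they stream past; no tree is ever built.
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int elements = 0;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++; // count every start tag seen
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called in several chunks
        }
    }

    // Parses well-formed markup with the JDK's default SAX parser.
    static TextCollector parse(String markup) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(markup)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        TextCollector c = parse("<html><body><p>Hello <b>world</b></p></body></html>");
        System.out.println(c.elements + " elements, text: " + c.text);
    }
}
```

The handler only keeps what it cares about, so memory use stays flat no matter how large the document is, whereas the DOM route has to hold the whole tree first.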