HTML to XML Conversion - Difficulty with Tidy and TagSoup

Eric

I'm trying to convert HTML pages to XML and I'm having some difficulty
with the following:

1. I tried Tidy, but the HTML I'm converting to XHTML has so many
errors that I spend a lot of time "fixing" it by hand before running it
through Tidy. I'm using Tidy with -asxml.

2. I've tried using TagSoup with JDOM, but the SAXBuilder internally
tries to set the namespace-prefixes feature on the parser, and TagSoup
does not support that feature.
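
To make point 2 concrete, here is a minimal sketch of one possible
workaround. It assumes the failure is a SAXNotSupportedException or
SAXNotRecognizedException thrown when the builder calls setFeature on
the parser. LenientFeatureFilter is a hypothetical name, not part of
TagSoup or JDOM. It only uses the SAX classes that ship with the JDK.

```java
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical helper: wraps another XMLReader and silently ignores
 * feature settings that the wrapped reader rejects. Everything else
 * (parsing, content events) is passed through by XMLFilterImpl.
 */
class LenientFeatureFilter extends XMLFilterImpl {

    LenientFeatureFilter(XMLReader parent) {
        // Assumption: in the scenario above the parent would be a
        // TagSoup parser (org.ccil.cowan.tagsoup.Parser).
        super(parent);
    }

    @Override
    public void setFeature(String name, boolean value) {
        try {
            super.setFeature(name, value);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // The wrapped parser refused this feature (for example
            // http://xml.org/sax/features/namespace-prefixes).
            // Carry on without it instead of aborting the build.
        }
    }
}
```

How the wrapped reader gets handed to SAXBuilder depends on the JDOM
version, so that wiring is left out here. Swallowing the exception also
means the feature really is not set, which may or may not matter for the
documents being converted.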

I would really appreciate help from someone who has dealt with having
to crank out lots of html from poorly formatted html. I appreciate
any help! ;)

-Eric
 
