Best tool to convert HTML into XHTML for XML parsing?

Sebastien B.

I'm looking for the best tool to convert everyday HTML into proper XHTML
so that I can parse it as an XML document.

So far I've been using TidyLib to do this, but it doesn't handle real-world
pages as gracefully as browsers do. For example, take the page at
http://mail.yahoo.com: all browsers display it properly, but tidying it up
with Tidy (using the tool at http://cgi.w3.org/cgi-bin/tidy) gives a
result that renders quite differently from the original.

So are there any tools that can convert HTML into proper XHTML without
producing output that renders differently when viewed in a browser
(i.e. parse it as a browser would, and create proper XHTML from that)?

I'm programming in C, if you need to know.

Thx,
Seb
 
