Sebastien B.
I'm looking for the best tool to convert everyday HTML into proper XHTML
so that I can parse it as an XML document.
So far I've been using Tidylib to do this, but it doesn't handle real-world
markup as gracefully as browsers do. For example, take the page at
http://mail.yahoo.com: all browsers display it properly, but tidying it up
with Tidy (using the tool at http://cgi.w3.org/cgi-bin/tidy) gives a
result that renders quite differently from the original.
So are there any tools that would let me convert HTML into proper XHTML
without producing output that renders differently when viewed in a browser
(i.e. parse it as a browser would, and generate proper XHTML from that)?
I'm programming in C, if you need to know.
Thx,
Seb