HTML to XML

2peachy · Jan 14, 2004

hello... I am brand new to this...
I did a search with no results...

how do you convert an html page into an xml page?

2peach

Johannes Koch · Jan 14, 2004

2peachy said:
hello... I am brand new to this...
I did a search with no results...

how do you convert an html page into an xml page ?

For valid HTML documents you can use sx from OpenSP. Or use tidy to
output XHTML.

Andy Dingley · Jan 14, 2004

how do you convert an html page into an xml page ?

How long is a piece of string ?

How many pages are you dealing with ? Is this a one-off "I want to
convert my site" or a regular "I want to scrape stock prices from
another site and make them into an XML feed" ?

What's "HTML" ? Is this well-coded valid HTML 3.2 / 4.0, XHTML or
some tag-soup written by a M$oft tool ? What happens if it's not
valid ? Can your code crash, abandon the page, scream for human help,
or must it make a best-attempt ?
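The "what happens if it's not valid" question can be sketched as a quick triage step. This is only an illustration in Python: the function name `triage` is made up, and it checks XML well-formedness, not validity against a DTD. Note that real XHTML using named entities such as `&nbsp;` will also land in the "soup" bucket here, because the standard library parser does not load the external DTD.

```python
import xml.etree.ElementTree as ET

def triage(markup):
    """Return ("well-formed", "") or ("soup", reason) for a markup string.

    Hypothetical helper: it only tests whether a strict XML parser
    accepts the input, which is the first thing to find out before
    choosing between crashing, abandoning the page, or best-attempt repair.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        # A strict parser refuses the page; the caller decides the policy.
        return ("soup", str(err))
    return ("well-formed", "")
```

A caller could then pass "soup" pages to a repair tool or a lenient HTML parser, and log or skip the ones that still fail.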

Can you avoid this altogether ? Can you obtain the content by some
friendlier means, such as RSS, direct access to the database, or some
other source ?

Why do you want to do it ? There are no "XML pages", there are only
XML documents. If you want to end up with "a web page" at the end of
it, then raw XML isn't enough of a finishing point, you need to take
it further.

What is "XML" ? What DTD or Schema are you aiming at ?

For one-offs, use Dave Raggett's Tidy (easily obtained via HTMLKit).
Even if you're not looking for an XHTML output, Tidy can be an
excellent pre-processor for sorting out ugly Tag Soup.
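For more than a handful of pages, Tidy can be driven from a script. A rough sketch in Python, assuming the `tidy` executable is installed and on the PATH; the helper names `tidy_command` and `html_file_to_xhtml` are made up for illustration. The options shown (`-q`, `-asxhtml`, `-utf8`, `--numeric-entities`) are ordinary Tidy options, and Tidy's exit status is 0 for clean input, 1 for warnings and 2 for errors.

```python
import shutil
import subprocess

def tidy_command(path):
    """Build a Tidy command line that writes XHTML to stdout (hypothetical helper)."""
    return ["tidy", "-q", "-asxhtml", "-utf8", "--numeric-entities", "yes", path]

def html_file_to_xhtml(path):
    """Run Tidy on one file and return the XHTML text it produces."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy executable not found on PATH")
    result = subprocess.run(tidy_command(path), capture_output=True, encoding="utf-8")
    # Exit status 1 means warnings only; 2 means Tidy gave up on the page.
    if result.returncode >= 2:
        raise ValueError("Tidy could not repair %s:\n%s" % (path, result.stderr))
    return result.stdout
```

Numeric entities are requested so that the output does not depend on the XHTML DTD's named entities when a plain XML parser reads it later.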

For screen-scrapes, use your favourite scripting language (Perl is
always a good start, but you could use Python or even JavaScript) and
use someone else's HTML parser.
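As a sketch of the "someone else's HTML parser" approach in Python, the standard library's `html.parser` can feed an `xml.etree.ElementTree` tree, which then serialises as well-formed XML. The class name `SoupToTree`, the function `html_to_xml` and the wrapper element `document` are made up for this example. It makes a best attempt: unclosed tags are closed implicitly and stray end tags are ignored. It does not check that attribute names are legal XML names, so treat it as a starting point only.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never have content or an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}

class SoupToTree(HTMLParser):
    """Build an ElementTree from possibly sloppy HTML (illustrative only)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("document")  # hypothetical wrapper element
        self.stack = [self.root]            # currently open elements

    def handle_starttag(self, tag, attrs):
        # Bare attributes such as <input checked> arrive with value None.
        attrib = {name: (value if value is not None else name)
                  for name, value in attrs}
        element = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element; ignore strays.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            # Text after a child element belongs in that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data

def html_to_xml(html):
    """Return a well-formed XML string for the given HTML text."""
    parser = SoupToTree()
    parser.feed(html)
    parser.close()
    return ET.tostring(parser.root, encoding="unicode")
```

Calling `html_to_xml("<p>Hello<br>world<b>bold")` gives a `document` element containing a `p` with a `br` and a properly closed `b`, even though the input closed nothing.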

RSS 1.0 is a good XML schema to target for generic screen scraping,
even if you don't think your content is "relevant" to a newsfeed (but
RSS 0.92 isn't)
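To give an idea of what targeting RSS 1.0 looks like, here is a sketch in Python that wraps scraped (title, link) pairs in an RSS 1.0 document. The function name `build_rss10` and its parameters are made up for illustration; the two namespace URIs are the standard RSS 1.0 and RDF ones. It writes `rdf:resource` on the `rdf:li` entries, which is an assumption about what your consumers expect, and it leaves out optional modules such as Dublin Core.

```python
import xml.etree.ElementTree as ET

RSS = "http://purl.org/rss/1.0/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def build_rss10(channel_url, title, description, items):
    """Return an RSS 1.0 (RDF) document as a string.

    items is a list of (item_title, item_url) pairs, e.g. from a scrape.
    """
    # Serialise RSS elements in the default namespace, RDF ones as rdf:...
    ET.register_namespace("", RSS)
    ET.register_namespace("rdf", RDF)

    root = ET.Element("{%s}RDF" % RDF)
    channel = ET.SubElement(root, "{%s}channel" % RSS,
                            {"{%s}about" % RDF: channel_url})
    ET.SubElement(channel, "{%s}title" % RSS).text = title
    ET.SubElement(channel, "{%s}link" % RSS).text = channel_url
    ET.SubElement(channel, "{%s}description" % RSS).text = description

    # The channel lists its items in an rdf:Seq table of contents.
    toc = ET.SubElement(channel, "{%s}items" % RSS)
    seq = ET.SubElement(toc, "{%s}Seq" % RDF)

    for item_title, item_url in items:
        ET.SubElement(seq, "{%s}li" % RDF, {"{%s}resource" % RDF: item_url})
        item = ET.SubElement(root, "{%s}item" % RSS,
                             {"{%s}about" % RDF: item_url})
        ET.SubElement(item, "{%s}title" % RSS).text = item_title
        ET.SubElement(item, "{%s}link" % RSS).text = item_url

    return ET.tostring(root, encoding="unicode")
```

Each item appears twice on purpose: once in the channel's table of contents and once as a top-level `item` element, which is how RSS 1.0 lays a feed out.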
