Create a DOM from an HTML document

Mark Harrison

I thought I saw a package that would create a DOM from html, with
allowances that it would do a "best effort" job to parse
non-perfectly formed html.

Now I can't seem to find this... does anybody have a recommendation
as to a good package to look at?

Many TIA!
Mark
 
Mark Harrison

Mark Harrison said:
Now I can't seem to find this... does anybody have a recommendation
as to a good package to look at?

Ahh, it's BeautifulSoup...

Thanks All!!
 
John J. Lee

Mark Harrison said:
Ahh, it's BeautifulSoup...

Strictly that's not THE DOM, just A document object model. The DOM
proper is a standardised interface, which BeautifulSoup does not
implement. You could build a DOM using BeautifulSoup, though.
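To make the distinction concrete, here is a minimal Python 3 sketch of what "the DOM proper" looks like in practice, using the standard library's xml.dom.minidom as one implementation of the W3C interface. The SAMPLE string is invented for illustration, and minidom only accepts well-formed input, so this shows the interface rather than any lenient parsing.

```python
from xml.dom import minidom

# A small, well-formed document (hypothetical sample data).
SAMPLE = (
    "<html><body>"
    "<table><tr><td>a</td></tr></table>"
    "<p>text</p>"
    "<table/>"
    "</body></html>"
)

doc = minidom.parseString(SAMPLE)

# These are standardised DOM names: getElementsByTagName, firstChild,
# documentElement, tagName. Other DOM implementations expose the same ones.
tables = doc.getElementsByTagName("table")
first_cell = doc.getElementsByTagName("td")[0]

table_count = tables.length
cell_text = first_cell.firstChild.data
root_name = doc.documentElement.tagName

print("tables:", table_count)
print("first cell text:", cell_text)
print("root element:", root_name)
```

BeautifulSoup offers its own navigation methods instead of these names, which is why its tree is a document object model without being the W3C DOM.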


John
 
Paul Boddie

John said:
Strictly that's not THE DOM, just A document object model. The DOM
proper is a standardised interface, which BeautifulSoup does not
implement. You could build a DOM using BeautifulSoup, though.

For a certain value of standardised, libxml2dom provides "the DOM" for
HTML:

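# Python 2: fetch a page and parse it with libxml2's HTML parser.
import urllib
import libxml2dom

f = urllib.urlopen("http://www.python.org")
s = f.read()
f.close()
# html=1 asks for HTML parsing rather than strict XML parsing.
d = libxml2dom.parseString(s, html=1)
print "There are", len(d.xpath("//table")), "tables in the document."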

See http://www.python.org/pypi/libxml2dom for more information.

Paul
 
Xavier Morel

Mark said:
I thought I saw a package that would create a DOM from html, with
allowances that it would do a "best effort" job to parse
non-perfectly formed html.

Now I can't seem to find this... does anybody have a recommendation
as to a good package to look at?

Many TIA!
Mark

While it doesn't generate a W3C DOM, BeautifulSoup is probably your best
bet for parsing less-than-perfect HTML and getting something usable out of it.

Once you have your parsed document, you can either use it as is or try
to convert it to a valid W3C DOM.
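As a rough illustration of the "convert it to a W3C DOM" idea, the Python 3 sketch below builds an xml.dom.minidom tree from sloppy HTML. It deliberately swaps in the standard library's html.parser in place of BeautifulSoup so that it runs without extra packages. The names DomBuilder, html_to_dom and VOID_TAGS, and the synthetic "root" wrapper element, are all assumptions made for this example. Its recovery rules are much cruder than BeautifulSoup's: for instance, an unclosed <p> is simply nested rather than implicitly closed.

```python
from html.parser import HTMLParser
from xml.dom import minidom

# Elements that never take a closing tag in HTML (partial list, for the sketch).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class DomBuilder(HTMLParser):
    """Hypothetical helper: feeds lenient HTML events into a minidom tree."""

    def __init__(self):
        super().__init__()
        self.doc = minidom.Document()
        # Synthetic wrapper so that several top-level elements are allowed.
        root = self.doc.createElement("root")
        self.doc.appendChild(root)
        self.stack = [root]

    def handle_starttag(self, tag, attrs):
        node = self.doc.createElement(tag)
        for name, value in attrs:
            # Valueless attributes such as "checked" arrive as None.
            node.setAttribute(name, value or "")
        self.stack[-1].appendChild(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element.
        # Stray end tags with no matching open element are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tagName == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Skip whitespace-only text to keep the tree small.
        if data.strip():
            self.stack[-1].appendChild(self.doc.createTextNode(data))


def html_to_dom(html):
    """Return a minidom Document built from possibly malformed HTML."""
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.doc


if __name__ == "__main__":
    messy = "<p>one<p>two <b>bold<br>text</p></div><table><tr><td>x</table>"
    dom = html_to_dom(messy)
    print(dom.documentElement.toxml())
```

The result answers to the usual DOM calls such as getElementsByTagName, which is the point of the conversion. For real pages a dedicated lenient parser will make better repair decisions than this stack-based approach.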
 
