DOM and HTML

  • Thread starter robert.differentone

robert.differentone

Hi All,

I am looking for a Python library that can build a DOM
tree from HTML. Is there any way to access the HTML DOM, just like
accessing it from JavaScript?

Any kind of help is appreciated.

Thanks.
R
 

Sullivan WxPyQtKinter

I do not know much about the HTML DOM, but if you just mean treating
HTML like XML and building it into a DOM tree, and (very important) the
HTML file is not 10,000 lines or longer, then go ahead and use the
xml.dom.minidom module. It has a basic (and great) light-weight DOM
implementation.
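
For illustration, here is a minimal sketch of the minidom approach suggested above. The sample markup and variable names are made up for this example. It assumes the input is well-formed XHTML, since minidom is an XML parser and rejects typical real-world HTML such as unclosed tags.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample page; it is well-formed XHTML, which minidom requires.
XHTML = """<html>
  <head><title>Sample page</title></head>
  <body>
    <h1 id="top">Welcome</h1>
    <p>See <a href="http://www.python.org/">Python</a> for details.</p>
  </body>
</html>"""

doc = minidom.parseString(XHTML)

# W3C-style calls, much like document.getElementsByTagName() in JavaScript.
h1 = doc.getElementsByTagName("h1")[0]
heading_text = h1.firstChild.data      # text node inside the <h1>
heading_id = h1.getAttribute("id")
links = [a.getAttribute("href") for a in doc.getElementsByTagName("a")]

print(heading_text, heading_id, links)

# Markup that is not well-formed XML (here an unclosed <br>) is rejected.
try:
    minidom.parseString("<p>unclosed<br></p>")
    accepts_sloppy_html = True
except ExpatError:
    accepts_sloppy_html = False
```

The last few lines show the main limitation: anything that is not well-formed XML raises an error instead of being repaired, so this route only fits pages you know to be clean XHTML.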
 

Fredrik Lundh

Sullivan WxPyQtKinter said:
go ahead and use the xml.dom.minidom module. It has a basic (and great)
light-weight DOM implementation.

that's a rather unusual way to use words like "great" and "light-weight"...

</F>
 

Ant

I've used Beautiful Soup, and it is a very Pythonic way of accessing
the data in the HTML. It is actually very similar to the way you access
the DOM with JS: for example, soup.html.body.h1 will give you the first
h1 tag.

There are also various other ways of searching the HTML in XPathish
ways (if XPath used dictionaries and lists...).

http://www.crummy.com/software/BeautifulSoup/
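
As a rough sketch of the attribute-style access described above: this assumes the current `beautifulsoup4` package (imported as `bs4`) is installed, which is newer than the release available when this thread was written. The sample markup and variable names are invented for the example.

```python
from bs4 import BeautifulSoup  # assumption: the modern beautifulsoup4 package

# Hypothetical sample page used only for this example.
HTML = """<html><head><title>Sample page</title></head>
<body>
<h1>Welcome</h1>
<p class="intro">First paragraph with a <a href="/docs">link</a>.</p>
<h1>Second heading</h1>
</body></html>"""

# "html.parser" selects Python's built-in parser, so no extra parser is needed.
soup = BeautifulSoup(HTML, "html.parser")

# Dotted access walks down to the first matching tag at each step.
first_h1 = soup.html.body.h1

# Searching with plain Python arguments instead of XPath expressions.
all_h1 = soup.find_all("h1")
intro = soup.find("p", attrs={"class": "intro"})
link_target = intro.a["href"]  # attributes are read like dictionary keys

print(first_h1.string, len(all_h1), link_target)
```

Dotted access only ever returns the first match, so `find_all` is the call to reach for when a page has several tags of the same name.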
 

Larry Bates

robert.differentone said:
Hi All,

I am looking for a Python library that can build a DOM
tree from HTML. Is there any way to access the HTML DOM, just like
accessing it from JavaScript?

Any kind of help is appreciated.

Thanks.
R

Since the browser can't execute anything except JavaScript, you
can't get to or manipulate the DOM with anything but JavaScript code.
There have been attempts at getting a browser that can execute
Python code, but I don't think they ever really got anywhere.

-Larry
 

Paul Boddie

Larry said (quoting robert.differentone):
I am looking for a Python library that can build a DOM
tree from HTML. Is there any way to access the HTML DOM, just like
accessing it from JavaScript?
[...]

Larry said:
Since the browser can't execute anything except JavaScript, you

Who said anything about the browser? Accessing a DOM "just like [...]
JavaScript" can mean a number of things: using an API like the one
JavaScript uses, for example, as well as actually accessing a DOM
associated with a page in a browser.

Larry said:
can't get to/manipulate the DOM with anything but JavaScript code.
There have been attempts at getting a browser that can execute
Python code, but I don't think they ever really got anywhere.

Actually, this isn't strictly true either. Disregarding, perhaps
unfairly, recent work on PyXPCOM to integrate Python more tightly with
Mozilla, there are various packages which do access browser DOMs: if
the questioner uses a KDE desktop and isn't averse to installing some
packages, there's qtxmldom [1] which can access the DOM in Konqueror in
association with the kpartplugins distribution [2]; otherwise, I
believe there's a Python package for accessing Internet Explorer's DOM.

And outside browsers, one can still use various packages already
mentioned, in addition to libxml2dom [3] which provides support via
libxml2 for reading HTML and XML, producing a DOM which resembles the
standardised DOM typically available to JavaScript. It shouldn't be
forgotten that PyXML also supports HTML parsing [4], either.
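
To make the idea of a standards-like DOM outside the browser concrete, here is a minimal sketch that uses only the standard library. It is not libxml2dom or PyXML: it swaps in `html.parser` to read forgiving HTML and builds an `xml.dom.minidom` document from it. The names `DomBuilder`, `parse_html`, `VOID_TAGS` and `SAMPLE` are invented for this example, and the error recovery is far simpler than what a browser or libxml2 does.

```python
from html.parser import HTMLParser
from xml.dom import minidom

# A few elements that never have a closing tag (a partial list, for illustration).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class DomBuilder(HTMLParser):
    """Builds a minidom Document from lenient HTML.

    Limitation: assumes a single top-level element such as <html>.
    """

    def __init__(self):
        super().__init__()
        self.document = minidom.Document()
        self.current = self.document  # node that receives new children

    def handle_starttag(self, tag, attrs):
        element = self.document.createElement(tag)
        for name, value in attrs:
            element.setAttribute(name, value or "")
        self.current.appendChild(element)
        if tag not in VOID_TAGS:
            self.current = element  # descend into the new element

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this name, implicitly
        # closing anything left open inside it; ignore stray end tags.
        node = self.current
        while node is not self.document and node.tagName != tag:
            node = node.parentNode
        if node is not self.document:
            self.current = node.parentNode

    def handle_data(self, data):
        # Skip whitespace-only text and any text outside the root element.
        if self.current is not self.document and data.strip():
            self.current.appendChild(self.document.createTextNode(data))


def parse_html(text):
    builder = DomBuilder()
    builder.feed(text)
    builder.close()
    return builder.document


# Hypothetical sample with an unquoted attribute, a <br> and an unclosed <p>.
SAMPLE = ("<html><body><h1 id=top>Title<br></h1>"
          "<p>Unclosed paragraph<a href='/x'>link</a></body></html>")

dom = parse_html(SAMPLE)
h1 = dom.getElementsByTagName("h1")[0]
print(h1.getAttribute("id"), h1.firstChild.data)
```

The resulting object answers the same calls as any minidom document (`getElementsByTagName`, `getAttribute`, `parentNode`), which is roughly the API shape that JavaScript code sees in a browser.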

Paul

[1] http://www.boddie.org.uk/python/qtxmldom.html
[2] http://www.boddie.org.uk/python/kpartplugins.html
[3] http://www.boddie.org.uk/python/libxml2dom.html
[4] http://www.boddie.org.uk/python/HTML.html
 
