Dave Kuhlman
I recently read an article by Jon Udell about extracting data from
Web pages as a poor person's Web services. So, I have a question:
Is there any Python support for finding and extracting information
from HTML documents?
I'd like something that would do things like the following:
- return the data which is inside a <b> tag which is inside a
<li> tag.
- return the data which is inside an <a> tag that has the attribute
href="http://www.python.org".
- Etc.
It would be a sort of structured grep for HTML.
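For pages that happen to be well-formed XHTML, the limited XPath subset in xml.etree.ElementTree (in the standard library since Python 2.5) already behaves like this kind of structured grep. This is a minimal sketch, not a general answer: the sample page and the `texts` helper are made up for illustration, and it assumes the markup parses as XML and carries no XHTML namespace declaration.

```python
import xml.etree.ElementTree as ET

# Made-up sample page. Assumption: it is well-formed and has no XHTML
# namespace declaration, otherwise tag names would need a "{namespace}" prefix.
page = """<html><body>
<ul>
  <li>first <b>bold in list</b></li>
  <li><b>another</b> item</li>
</ul>
<p><b>bold outside list</b></p>
<p><a href="http://www.python.org">Python</a>
   <a href="http://example.com">Example</a></p>
</body></html>"""

root = ET.fromstring(page)


def texts(elements):
    """Join all text inside each element, including text of nested tags."""
    return ["".join(e.itertext()) for e in elements]


# ".//li/b": every <b> that is a direct child of an <li>, at any depth.
bold_in_li = texts(root.findall(".//li/b"))

# ".//a[@href='...']": every <a> whose href attribute equals the given value.
python_links = texts(root.findall(".//a[@href='http://www.python.org']"))

print(bold_in_li)    # ['bold in list', 'another']
print(python_links)  # ['Python']
```

Using ".//li//b" instead would also catch a <b> nested more deeply inside an <li>. Much real-world HTML is not well-formed, though, and ET.fromstring raises ET.ParseError on it, so this only helps for clean pages.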
I've found the HTMLParser and htmllib modules in the Python
standard library, but I'm wondering if there is anything at a
higher level.
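For tag soup, a small layer on top of HTMLParser can give the same kind of query. This is a minimal sketch assuming Python 3, where the module is named html.parser; `TagPathGrep`, `grep_html` and `VOID_TAGS` are hypothetical names, not standard-library APIs. It keeps a stack of open tags, matches a tag path against the innermost open tags (each a direct child of the previous one), and optionally requires attribute values on the innermost tag.

```python
from html.parser import HTMLParser

# Elements that never have an end tag in HTML, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class TagPathGrep(HTMLParser):
    """Hypothetical helper: collect the text inside a nested tag path.

    `path` such as ["li", "b"] must match the innermost open tags, each one a
    direct child of the previous; every key/value in `attrs` must appear on
    the innermost tag.
    """

    def __init__(self, path, attrs=None):
        super().__init__()
        self.path = list(path)
        self.attrs = dict(attrs or {})
        self.stack = []      # open tags as (name, attribute dict)
        self.matches = []    # one text string per matching element
        self._depth = None   # stack depth of the element currently being captured

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append((tag, dict(attrs)))
        if self._depth is None and self._matches_path():
            self._depth = len(self.stack)
            self.matches.append("")

    def handle_endtag(self, tag):
        # Pop back to the nearest open tag with this name; this tolerates
        # inner tags that were never closed, which is common in real HTML.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break
        if self._depth is not None and len(self.stack) < self._depth:
            self._depth = None

    def handle_data(self, data):
        if self._depth is not None:
            self.matches[-1] += data

    def _matches_path(self):
        n = len(self.path)
        if len(self.stack) < n:
            return False
        if [name for name, _ in self.stack[-n:]] != self.path:
            return False
        innermost = self.stack[-1][1]
        return all(innermost.get(k) == v for k, v in self.attrs.items())


def grep_html(html, path, attrs=None):
    """Feed a whole document and return the captured text for each match."""
    parser = TagPathGrep(path, attrs)
    parser.feed(html)
    parser.close()
    return parser.matches


page = """<ul><li>one <b>bold one</b></li><li><b>bold two</b></li></ul>
<p><b>not in a list</b></p>
<a href="http://www.python.org">Python</a> <a href="http://example.com">Other</a>"""

print(grep_html(page, ["li", "b"]))                               # ['bold one', 'bold two']
print(grep_html(page, ["a"], {"href": "http://www.python.org"}))  # ['Python']
```

The parser lowercases tag and attribute names before calling the handlers, so the path should be given in lowercase. Matching only direct children keeps the sketch short; a descendant match would need a looser comparison in `_matches_path`.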
Web searches did not turn up anything interesting.
Thanks for any help.
Dave