Fuzzyman said:
Very easily. Have a look at my article on the ``urllib2`` module.
http://www.voidspace.org.uk/python/articles.shtml#http
You may need to use ClientCookie/cookielib to handle cookies and may
have to cope with BASIC authentication. There are articles about
both of these as well.
If you want to handle filling in forms programmatically then the module
ClientForm is useful (allegedly).
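As a rough illustration of the urllib2 pieces mentioned above, here is a minimal sketch of an opener that keeps cookies and answers BASIC authentication. The function name make_opener, the example.com URL and the credentials are placeholders of mine, not something from the articles. The imports try the Python 3 names first (urllib.request, http.cookiejar) and fall back to the urllib2/cookielib names used in this thread.

```python
try:  # Python 3 module names
    from urllib.request import (build_opener, HTTPBasicAuthHandler,
                                HTTPCookieProcessor,
                                HTTPPasswordMgrWithDefaultRealm)
    from http.cookiejar import CookieJar
except ImportError:  # Python 2 names, as used in this thread
    from urllib2 import (build_opener, HTTPBasicAuthHandler,
                         HTTPCookieProcessor,
                         HTTPPasswordMgrWithDefaultRealm)
    from cookielib import CookieJar


def make_opener(base_url, user, password):
    """Return (opener, jar): an opener that stores cookies in jar and
    sends BASIC credentials for URLs under base_url.

    make_opener is a hypothetical helper name used for illustration.
    """
    jar = CookieJar()
    passwords = HTTPPasswordMgrWithDefaultRealm()
    # Realm None means "use these credentials for any realm at base_url".
    passwords.add_password(None, base_url, user, password)
    opener = build_opener(HTTPCookieProcessor(jar),
                          HTTPBasicAuthHandler(passwords))
    return opener, jar


# Typical use (not executed here because it needs network access):
#   opener, jar = make_opener('http://www.example.com/', 'user', 'secret')
#   html = opener.open('http://www.example.com/data.html').read()
```

Cookies set by the server end up in jar and are sent back on later requests made through the same opener.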
The last piece of the puzzle is BeautifulSoup. That's what you use to
extract data from the web page.
For instance a lot of web pages listing data have something like this
on it:
<table>
....
<tr><th>Item:</th><td>Value</td></tr>
....
</table>
You can extract the value from such a table with BeautifulSoup by doing
something like:
    soup.fetchText('Item:')[0].findParent(['td', 'th']).nextSibling.string
This works whether the item is in a td or a th tag.
Of course, I recommend doing things a little bit more verbosely. In my
case, I'm writing code that's expected to work on a large number of
web pages with different formats, so I put in a lot of error checking,
along with informative errors.
    # Body of the lookup function; BadTableMatch is my own exception class.
    links = table.fetchText(name)
    if not links:
        raise BadTableMatch("%s not found in table" % name)
    td = links[0].findParent(['td', 'th'])
    if not td:
        raise BadTableMatch("td/th not a parent of %s" % name)
    sibling = td.nextSibling
    if not sibling:
        raise BadTableMatch("td for %s has no sibling" % name)
    out = get_contents(sibling)
    if not out:
        raise BadTableMatch("no value string found for %s" % name)
    return out
BeautifulSoup would raise exceptions if the conditions I check for are
true and I didn't check them - but the error messages wouldn't be as
informative.
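To show why the informative messages pay off across many pages, here is a minimal sketch of a driver loop. Only the name BadTableMatch comes from the code above; its definition here, the collect function and the lookup callable are assumptions made for illustration.

```python
class BadTableMatch(Exception):
    """Raised when a table lacks the expected label/value layout.

    The real definition is not shown in this post; this is a stand-in.
    """


def collect(tables, name, lookup):
    """Run lookup(table, name) over many pages.

    tables maps a page URL to its parsed table; lookup is assumed to
    return the value or raise BadTableMatch. Returns (values, problems),
    where problems maps each failing URL to its error message.
    """
    values, problems = {}, {}
    for url, table in tables.items():
        try:
            values[url] = lookup(table, name)
        except BadTableMatch as exc:
            # Keep going, but remember exactly what went wrong and where.
            problems[url] = str(exc)
    return values, problems
```

One bad page then shows up as a readable entry in problems instead of stopping the whole run with a bare AttributeError.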
Oh yeah - get_contents isn't from BeautifulSoup. I ran into cases
where the <td> tag held other tags, and wanted the flat text
extracted. Couldn't find a BeautifulSoup method to do that, so I wrote:
    def get_contents(ele):
        """Utility function to return all the text in a tag."""
        if ele.string:
            return ele.string  # We only have one string. Done
        return ''.join(get_contents(x) for x in ele)
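For anyone who wants to try the idea without BeautifulSoup installed, the same label-then-next-cell lookup, including flattening nested tags into plain text, can be sketched with the standard library's HTMLParser. This is a different technique from the BeautifulSoup calls above, not a drop-in replacement, and the names LabelValueParser and table_values are made up for the example. It assumes each row holds the label in its first cell and the value in its second.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2


class LabelValueParser(HTMLParser):
    """Collect {label: value} from rows like <th>Label</th><td>Value</td>."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.pairs = {}
        self._cells = []   # flat text of each cell in the current row
        self._buf = None   # text pieces of the open cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._cells = []
        elif tag in ('td', 'th'):
            self._buf = []

    def handle_data(self, data):
        # Text inside nested tags (<b>, <a>, ...) lands here too,
        # which gives the flat text of the cell.
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._buf is not None:
            self._cells.append(''.join(self._buf).strip())
            self._buf = None
        elif tag == 'tr' and len(self._cells) >= 2:
            self.pairs[self._cells[0]] = self._cells[1]


def table_values(html):
    """Return a dict mapping first-cell text to second-cell text per row."""
    parser = LabelValueParser()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

A lookup is then table_values(page).get('Item:'), with None signalling that the label was not found.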
<mike