Matt
I'm trying to get the HTML from a webpage. Let's say for the
sake of argument it's the Python homepage. I've googled around and
found some examples that people said worked. Here's what I've cobbled
together:
#getHTML.py
############################################
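import os
import urllib
import urllib2

# Placeholder proxy credentials and address
proxy_info = {'user': 'us3r',
              'password': 'p@ssword',
              'host': 'MY_PROXY',
              'port': '80'}
# Build the proxy URL and expose it through the environment
os.environ['HTTP_PROXY'] = (
    'http://%(user)s:%(password)s@%(host)s:%(port)s' % proxy_info)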
test_url = "http://www.python.org/index.html"
handle = urllib2.urlopen(test_url)
#handle = urllib.urlopen(test_url)
txt = handle.read().lower()
handle.close()
print "Text: "
print txt
#################################
#end getHTML.py
When I run this with urllib2 I get (with or without a dummy password):
Traceback (most recent call last):
  File "P:\My Documents\Projects\Python\validate_zipcodes.py", line 103, in ?
    handle = urllib2.urlopen(test_url)
  File "C:\Python23\lib\urllib2.py", line 129, in urlopen
    return _opener.open(url, data)
  File "C:\Python23\lib\urllib2.py", line 326, in open
    '_open', req)
  File "C:\Python23\lib\urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "C:\Python23\lib\urllib2.py", line 901, in http_open
    return self.do_open(httplib.HTTP, req)
  File "C:\Python23\lib\urllib2.py", line 886, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error (7, 'getaddrinfo failed')>
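A "getaddrinfo failed" error means a host name lookup did not succeed, though the traceback does not say which name was being looked up. One way to narrow it down is to try the lookup directly, outside urllib2. The following is only a minimal sketch: `can_resolve` is a hypothetical helper name, not something from the script above, and port 80 is assumed because that is the port in `proxy_info`.

```python
import socket

def can_resolve(host, port=80):
    """Return True if a name lookup for host succeeds (hypothetical helper)."""
    try:
        # Same lookup call that appears in the 'getaddrinfo failed' message.
        socket.getaddrinfo(host, port)
        return True
    except socket.gaierror:
        # Raised when the name cannot be resolved.
        return False
```

Calling `can_resolve('MY_PROXY')` and `can_resolve('www.python.org')` from the same machine would show whether the proxy name, the target name, or both fail to resolve.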
When I run it with urllib.urlopen, I get:
Traceback (most recent call last):
  File "P:\My Documents\Projects\Python\validate_zipcodes.py", line 104, in ?
    handle = urllib.urlopen(test_url)
  File "C:\Python23\lib\urllib.py", line 76, in urlopen
    return opener.open(url)
  File "C:\Python23\lib\urllib.py", line 181, in open
    return getattr(self, name)(url)
  File "C:\Python23\lib\urllib.py", line 287, in open_http
    h = httplib.HTTP(host)
  File "C:\Python23\lib\httplib.py", line 1009, in __init__
    self._setup(self._connection_class(host, port, strict))
  File "C:\Python23\lib\httplib.py", line 507, in __init__
    self._set_hostport(host, port)
  File "C:\Python23\lib\httplib.py", line 518, in _set_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
httplib.InvalidURL: nonnumeric port: 'p@ssword@MY_PROXY:80'
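The "nonnumeric port" message shows the password and its '@' ending up in the part of the string that was parsed as the port, so the reserved characters in the credentials are at least part of the trouble. A common way around that is to percent-encode the user name and password before building the proxy URL. The sketch below assumes Python 3, where `urllib2` became `urllib.request` and `quote` lives in `urllib.parse`; `build_proxy_url` is a hypothetical helper, and the credentials are the placeholders from the script above. Whether the proxy then accepts the credentials is a separate matter that this does not settle.

```python
from urllib.parse import quote
import urllib.request

def build_proxy_url(user, password, host, port):
    """Build an http:// proxy URL with percent-encoded credentials (hypothetical helper)."""
    # safe="" makes quote() escape every reserved character, so '@' becomes '%40'.
    return "http://%s:%s@%s:%s" % (
        quote(user, safe=""), quote(password, safe=""), host, port)

proxy_url = build_proxy_url("us3r", "p@ssword", "MY_PROXY", "80")

# Hand the proxy to an opener explicitly instead of relying on HTTP_PROXY.
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({"http": proxy_url}))
# opener.open("http://www.python.org/index.html") would then be sent via the proxy.
```

Passing the proxy to `ProxyHandler` directly also removes any doubt about whether the environment variable was picked up.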
Obviously, going through Internet Explorer works.
Has anyone else had a similar issue? I don't know the proxy situation
we have here, so is it possible that the proxy is causing this?
Any help is much appreciated.
Thanks for at least reading this far!
M@