jacob c.
When I request a URL using urllib2, it appears that urllib2 always
makes the request using HTTP 1.0, and not HTTP 1.1. I'm trying to use
the "If-None-Match"/"ETag" HTTP headers to conserve bandwidth, but if
I'm not mistaken, these are HTTP 1.1 headers, so I can't reasonably
expect a web server to respond correctly to my requests. (In my
limited testing, it looks like some servers respond correctly with an
HTTP 304 status, while some respond with an HTTP 200 and a full
response body.)
My specific issue notwithstanding, is there any way to force HTTP 1.1
to be used? Or am I doing something wrong?
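For what it's worth, one way to check which protocol version actually goes out on the wire is to point the client at a throwaway local server and record what it sees. The sketch below is only an illustration and makes several assumptions that are not part of my setup: it is written for Python 3, where httplib was renamed http.client and whose HTTPConnection sends HTTP/1.1 request lines, and RecordingHandler, seen_versions and fetch_with_http_client are made-up names.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

seen_versions = []  # HTTP versions observed by the throwaway server


class RecordingHandler(BaseHTTPRequestHandler):
    """Hypothetical local handler that records the client's HTTP version."""

    def do_GET(self):
        # request_version is the version token from the request line.
        seen_versions.append(self.request_version)
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def fetch_with_http_client(host, port, path):
    """Issue a GET through http.client and return (status, body)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), RecordingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    status, body = fetch_with_http_client(
        "127.0.0.1", server.server_address[1], "/")
finally:
    server.shutdown()
    server.server_close()

print(status, seen_versions)
```

Talking to the connection class directly gives up the handler chain that urllib2 provides (redirects, proxies, and so on), so this is more of a diagnostic than a replacement.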
I've condensed my code into the following example, which produces
similar results on two different setups, Python 2.3.2 on Windows and
Python 2.2.1 on Debian Linux. With this particular URL, the server
responds to my HTTP 1.0 request with an HTTP 1.1 response and an HTTP
304 status code, which suits my purposes, but I'd feel more comfortable
if my outgoing request also declared itself to be an HTTP 1.1 request.
Example code:
import httplib
httplib.HTTPConnection.debuglevel = 1
import urllib2
url = 'http://www.mozilla.org/images/firefox-oneoh-top.png'
etag = '"788054-2d18-3b21a80"'
request = urllib2.Request(url)
request.add_header('If-None-Match', etag)
opener = urllib2.build_opener()
response = opener.open(request)
Example output:
connect: (www.mozilla.org, 80)
send: 'GET /images/firefox-oneoh-top.png HTTP/1.0\r\nHost: www.mozilla.org\r\nUser-agent: Python-urllib/2.0a1\r\nIf-none-match: "788054-2d18-3b21a80"\r\n\r\n'
reply: 'HTTP/1.1 304 Not Modified\r\n'
header: Date: Tue, 21 Dec 2004 21:56:27 GMT
header: Server: Apache/2.0.46 (Red Hat)
header: Connection: close
header: ETag: "788054-2d18-3b21a80"
Traceback (most recent call last):
  File "urllib2-test.py", line 11, in ?
    response = opener.open(request)
  File "D:\Python\lib\urllib2.py", line 333, in open
    '_open', req)
  File "D:\Python\lib\urllib2.py", line 313, in _call_chain
    result = func(*args)
  File "D:\Python\lib\urllib2.py", line 849, in http_open
    return self.do_open(httplib.HTTP, req)
  File "D:\Python\lib\urllib2.py", line 843, in do_open
    return self.parent.error('http', req, fp, code, msg, hdrs)
  File "D:\Python\lib\urllib2.py", line 359, in error
    return self._call_chain(*args)
  File "D:\Python\lib\urllib2.py", line 313, in _call_chain
    result = func(*args)
  File "D:\Python\lib\urllib2.py", line 419, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 304: Not Modified
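The traceback shows the 304 arriving as an HTTPError rather than as a normal response, so the caller has to catch it to learn that the cached copy is still good. A minimal sketch of that pattern follows, with the same caveats as any illustration: it assumes Python 3 (urllib2 became urllib.request and urllib.error there), it runs against a hypothetical local ETagHandler instead of www.mozilla.org, and conditional_get is a made-up helper name. Only the ETag value comes from the example above.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

ETAG = '"788054-2d18-3b21a80"'  # ETag value from the example above


class ETagHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: 304 if the ETag matches, else 200."""

    def do_GET(self):
        if self.headers.get("If-None-Match") == ETAG:
            self.send_response(304)  # no body on a 304
            self.send_header("ETag", ETAG)
            self.end_headers()
            return
        body = b"image bytes"
        self.send_response(200)
        self.send_header("ETag", ETAG)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def conditional_get(url, etag=None):
    """Return (status, body); body is None when the server answers 304."""
    request = urllib.request.Request(url)
    if etag is not None:
        request.add_header("If-None-Match", etag)
    # An empty ProxyHandler bypasses any proxy settings for this local demo.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=5) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:
            err.close()
            return 304, None  # cached copy is still valid
        raise  # any other HTTP error is still an error


server = HTTPServer(("127.0.0.1", 0), ETagHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/firefox-oneoh-top.png" % server.server_address[1]
try:
    cached = conditional_get(url, ETAG)  # matching ETag
    fresh = conditional_get(url)         # no ETag sent
finally:
    server.shutdown()
    server.server_close()

print(cached, fresh)
```

Returning the status alongside the body keeps the "not modified" case explicit for the caller, which seemed clearer than letting the exception escape.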