Python3: Is this a bug in urllib?

Johannes Bauer

Hi,

I've run into the following behavior with Python 3 and do not know
whether it is a bug. On two Python 3.1 installations, urllib hangs
when it encounters an HTTP 301 (redirect).

The code to reproduce is a one-liner (actually, two-liner), Python from
Ubuntu tree:

Python 3.1.2 (r312:79147, Apr 15 2010, 15:35:48)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

Also occurs on another Python version (Gentoo):

Python 3.1.2 (release31-maint, Jun 9 2010, 23:58:21)
[GCC 4.3.4] on linux2

The exchanged HTTP is:

GET http://google.de HTTP/1.1
Accept-Encoding: identity
Host: google.de
User-Agent: Python-urllib/3.1

HTTP/1.1 301 Moved Permanently
Via: 1.1 IMMPWISA01
Connection: Keep-Alive
Proxy-Connection: Keep-Alive
Content-Length: 218
Expires: Thu, 18 Nov 2010 15:18:40 GMT
Date: Tue, 19 Oct 2010 15:18:40 GMT
Location: http://www.google.de/
Content-Type: text/html; charset=UTF-8
Server: gws
Cache-Control: public, max-age=2592000
X-XSS-Protection: 1; mode=block

<HTML><HEAD><meta http-equiv="content-type"
content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.de/">here</A>.
</BODY></HTML>

Although the response might suggest a redirect loop, it simply hangs with
no network traffic at all (the TCP connection stays open, however).

When I interrupt with Ctrl-C, this is the traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.1/urllib/request.py", line 1454, in open
    return getattr(self, name)(url)
  File "/usr/lib/python3.1/urllib/request.py", line 1628, in open_http
    return self._open_generic_http(http.client.HTTPConnection, url, data)
  File "/usr/lib/python3.1/urllib/request.py", line 1624, in _open_generic_http
    response.status, response.reason, response.msg, data)
  File "/usr/lib/python3.1/urllib/request.py", line 1644, in http_error
    return self.http_error_default(url, fp, errcode, errmsg, headers)
  File "/usr/lib/python3.1/urllib/request.py", line 1648, in http_error_default
    void = fp.read()
  File "/usr/lib/python3.1/socket.py", line 214, in readinto
    return self._sock.recv_into(b)
KeyboardInterrupt

Can anyone tell me if this is a bug or expected behavior?

Regards,
Johannes

--
At least not publicly!
Ah, the latest and, to this day, most brilliant stunt by our great
cosmologists: the secret prediction.
- Karl Kaos on Rüdiger Thomas in dsa <[email protected]>
 
Peter Otten

Johannes said:
Can anyone tell me if this is a bug or expected behavior?

While I'm not 100 percent sure it looks like a bug to me and I think you
should report it at http://bugs.python.org
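
In the meantime, a possible workaround: urllib.request.urlopen() goes through the handler-based opener, which follows a 301 via HTTPRedirectHandler, instead of the legacy URLopener path shown in the traceback. Here is a rough sketch, assuming a throwaway server on 127.0.0.1 in place of google.de and a current Python 3; RedirectHandler, fetch_following_redirects and the /moved path are names made up for the illustration, and I have not tried this against the proxy setup from the original post.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: "/" answers 301, anything else 200."""

    def do_GET(self):
        if self.path == "/":
            body = b"301 Moved\r\n"
            self.send_response(301)
            self.send_header("Location", "/moved")
        else:
            body = b"final page\n"
            self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet


def fetch_following_redirects(timeout=5.0):
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_port
        # An empty ProxyHandler keeps this local test away from any
        # proxy configured in the environment.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        # The timeout turns a stuck read into an exception instead of a hang.
        with opener.open(url, timeout=timeout) as response:
            return response.status, response.geturl(), response.read()
    finally:
        server.shutdown()
        server.server_close()
```

Calling fetch_following_redirects() should hand back the status, final URL and body of the page behind the redirect. Passing a timeout is worth doing either way, so that a read that never finishes raises instead of blocking.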

Peter
 
Johannes Bauer

On 20.10.2010 14:32, Justin Ezequiel wrote:
aren't you supposed to call read on the return value of open?
i.e.,
request.URLopener().open("http://google.de").read()

If open() blocks, an appended read() will never be executed. To
demonstrate the problem, I reduced the call to the least amount of code
needed to trigger it. Had I appended read(), it would not have been
clear whether read() or open() is the call that hangs. The way I posted
it, it is clear.

Regards,
Johannes

 
Justin Ezequiel

'''
C:\Documents and Settings\Administrator\Desktop>python wtf.py
301 Moved Permanently
b'<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">\n<TITLE>301 Moved</TITLE></HEAD><BODY>\n<H1>301 Moved</H1>\nThe document has moved\n<A HREF="http://www.google.de/">here</A>.\r\n</BODY></HTML>\r\n'
foo 5.328 secs
301 Moved Permanently
bar 241.016 secs

C:\Documents and Settings\Administrator\Desktop>
'''
import http.client
import time

def foo():
    # Read the body through HTTPResponse.read(), which stops after
    # Content-Length bytes.
    c = http.client.HTTPConnection('google.de')
    try:
        c.request('GET', '/')
        r = c.getresponse()
        try:
            print(r.status, r.reason)
            x = r.read()
        finally:
            r.close()
    finally:
        c.close()
    print(x)

def bar():
    # Read from the underlying socket file instead. This reads until EOF,
    # i.e. until the server closes the keep-alive connection.
    c = http.client.HTTPConnection('google.de')
    try:
        c.request('GET', '/')
        r = c.getresponse()
        try:
            print(r.status, r.reason)
            x = r.fp.read()
        finally:
            r.fp.close()
    finally:
        c.close()

s = time.time()
foo()
e = time.time()
print('foo %.3f secs' % (e-s,))

s = time.time()
bar()
e = time.time()
print('bar %.3f secs' % (e-s,))
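
For what it's worth, the gap between foo() and bar() can probably be reproduced without touching the network. Below is a rough sketch, assuming a throwaway server on 127.0.0.1 that answers with a keep-alive 301 like the one in the original post and then simply holds the connection open. serve_once, fetch, BODY and the timeout values are made up for the illustration, and socket.create_server needs Python 3.8 or later.

```python
import http.client
import socket
import threading

BODY = b"<HTML>301 Moved</HTML>\r\n"


def serve_once(listener, release):
    """Answer one request with a keep-alive 301, then hold the socket open."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(65536)  # read and ignore the request
        head = (
            "HTTP/1.1 301 Moved Permanently\r\n"
            "Connection: Keep-Alive\r\n"
            "Location: http://www.example.invalid/\r\n"
            f"Content-Length: {len(BODY)}\r\n"
            "\r\n"
        ).encode("ascii")
        conn.sendall(head + BODY)
        # Do not close yet: a keep-alive peer leaves the connection open.
        release.wait(timeout=10)


def fetch(read_raw, timeout=1.0):
    """Return the body, or None if the read timed out waiting for EOF."""
    listener = socket.create_server(("127.0.0.1", 0))
    release = threading.Event()
    t = threading.Thread(target=serve_once, args=(listener, release), daemon=True)
    t.start()
    port = listener.getsockname()[1]
    c = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout)
    try:
        c.request("GET", "/")
        r = c.getresponse()
        try:
            if read_raw:
                return r.fp.read()  # like bar(): reads until EOF
            return r.read()         # like foo(): stops at Content-Length
        except socket.timeout:
            return None
        finally:
            r.close()
    finally:
        c.close()
        release.set()
        t.join()
        listener.close()
```

With this setup, fetch(read_raw=False) should come back with the body right away, while fetch(read_raw=True) should give up and return None once the socket timeout expires, because the raw read keeps waiting for an EOF that a keep-alive server does not send. That matches the `void = fp.read()` frame in the traceback, assuming fp there is the raw socket file rather than the HTTPResponse.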
 
