Bugs: Content-Length not updated by reused urllib.request.Request / has_header() case-sensitive

Johannes Kleese · Nov 12, 2012

Hi!

(Yes, I did take a look at the issue tracker but couldn't find any
corresponding bug, and no, I don't want to open a new account just for
this one.)

--------------------------------------------------------------------

I'm reusing a single urllib.request.Request object to HTTP-POST data to
the same URL a number of times. While the data itself is sent as
expected every time, the Content-Length header is not updated after the
first request. Tested with Python 3.1.3 and Python 3.1.4.
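
import urllib.request
opener = urllib.request.build_opener()
request = urllib.request.Request("http://example.com/", headers =
{"Content-Type": "application/x-www-form-urlencoded"})

opener.open(request, "1".encode("us-ascii"))
print(request.header_items())

[('Content-length', '1'), ('Content-type',
'application/x-www-form-urlencoded'), ('Host', 'example.com'),
('User-agent', 'Python-urllib/3.1')]

opener.open(request, "123456789".encode("us-ascii"))
print(request.header_items())

[('Content-length', '1'), ('Content-type',
'application/x-www-form-urlencoded'), ('Host', 'example.com'),
('User-agent', 'Python-urllib/3.1')]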

Note that after the second run, Content-Length stays "1", but should be
"9", corresponding to the data b'123456789'. (Request data is not
x-www-form-urlencoded to shorten the test case. Doesn't affect the bug,
though.)
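
A possible workaround on affected versions, sketched under the assumption that building a new object per POST is acceptable: create a fresh Request for each payload, so no Content-length from an earlier send can linger. The helper name fresh_post_request and the FORM_HEADERS constant are made up for this illustration; only urllib.request.Request is real API. Nothing is sent here; pass the result to opener.open() to actually POST.

```python
import urllib.request

# Assumed headers, copied from the test case above.
FORM_HEADERS = {"Content-Type": "application/x-www-form-urlencoded"}


def fresh_post_request(url, data):
    """Return a brand-new Request carrying `data` (hypothetical helper)."""
    # A new object has no Content-length stored yet, so the HTTP handler
    # has to compute it from this payload when the request is opened.
    return urllib.request.Request(url, data=data, headers=dict(FORM_HEADERS))


for payload in (b"1", b"123456789"):
    request = fresh_post_request("http://example.com/", payload)
    # Show the method, the payload, and that no length header is set yet.
    print(request.get_method(), request.data,
          request.has_header("Content-length"))
```

The trade-off is one extra object per request, which is usually negligible next to the network round trip.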

--------------------------------------------------------------------

While at it, I noticed that urllib.request.Request.has_header() and
.get_header() are case-sensitive, while HTTP headers are not (RFC 2616,
4.2). Thus the following, slightly unfortunate behaviour:
[('Content-length', '1'), ('Content-type',
'application/x-www-form-urlencoded'), ('Host', 'example.com'),
('User-agent', 'Python-urllib/3.1')]
'application/x-www-form-urlencoded'

Terry Reedy · Nov 12, 2012

> Hi!
>
> (Yes, I did take a look at the issue tracker but couldn't find any
> corresponding bug, and no, I don't want to open a new account just for
> this one.)

You only have to open a tracker account just once. I am reluctant to
report this myself as I do not use the module and cannot answer questions.

> I'm reusing a single urllib.request.Request object to HTTP-POST data to
> the same URL a number of times. While the data itself is sent as
> expected every time, the Content-Length header is not updated after the
> first request. Tested with Python 3.1.3 and Python 3.1.4.

3.1 only gets security fixes. Consider upgrading. In any case, suspected
bugs need to be tested with the latest release, as patches get applied
daily. As it happens,

import urllib.request
opener = urllib.request.build_opener()
request = urllib.request.Request("http://example.com/", headers =
{"Content-Type": "application/x-www-form-urlencoded"})

opener.open(request, "1".encode("us-ascii"))
print(request.data, '\n', request.header_items())

opener.open(request, "123456789".encode("us-ascii"))
print(request.data, '\n', request.header_items())

exhibits the same behavior in 3.3.0 of printing ('Content-length', '1')
in the last output. I agree that that looks wrong, but I do not know if
such re-use is supposed to be supported.

> While at it, I noticed that urllib.request.Request.has_header() and
> .get_header() are case-sensitive,

Python is case sensitive.

> while HTTP headers are not (RFC 2616, 4.2).
> Thus the following, slightly unfortunate behaviour:
> [('Content-length', '1'), ('Content-type',
> 'application/x-www-form-urlencoded'), ('Host', 'example.com'),
> ('User-agent', 'Python-urllib/3.1')]
> 'application/x-www-form-urlencoded'

Judging from 'Content-type', 'User-agent', 'Content-length', 'Host',
urllib.request consistently capitalizes the first word of all header
tags and expects them in that form. If that is not standard, it should
be documented.
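
To illustrate the capitalization described above: in CPython, Request.add_header() stores each name via str.capitalize(), so has_header() and get_header() only match that exact spelling. The sketch below shows the behaviour and a small case-insensitive wrapper. The names has_header_ci and get_header_ci are hypothetical and not part of urllib, and the wrapper assumes every header was added through the Request API.

```python
import urllib.request


def has_header_ci(request, name):
    """Case-insensitive has_header() (hypothetical helper)."""
    # Normalize the lookup key the same way the Request stores it.
    return request.has_header(name.capitalize())


def get_header_ci(request, name, default=None):
    """Case-insensitive get_header() (hypothetical helper)."""
    return request.get_header(name.capitalize(), default)


request = urllib.request.Request(
    "http://example.com/",
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(request.has_header("Content-Type"))  # False: stored as 'Content-type'
print(request.has_header("Content-type"))  # True
print(get_header_ci(request, "CONTENT-TYPE"))
```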

Terry Reedy · Nov 13, 2012

> import urllib.request
> opener = urllib.request.build_opener()
> request = urllib.request.Request("http://example.com/", headers =
> {"Content-Type": "application/x-www-form-urlencoded"})
>
> opener.open(request, "1".encode("us-ascii"))
> print(request.data, '\n', request.header_items())
>
> opener.open(request, "123456789".encode("us-ascii"))
> print(request.data, '\n', request.header_items())
>
> exhibits the same behavior in 3.3.0 of printing ('Content-length', '1')
> in the last output. I agree that that looks wrong, but I do not know if
> such re-use is supposed to be supported.

I opened http://bugs.python.org/issue16464

Johannes Kleese · Nov 13, 2012

Terry said:
On 11/12/2012 10:52 AM, Johannes Kleese wrote:

> 3.1 only gets security fixes. Consider upgrading.

Stuck with Debian on a server, thus stuck with 3.1 on development machine.

> exhibits the same behavior in 3.3.0 of printing ('Content-length', '1')
> in the last output. I agree that that looks wrong, but I do not know if
> such re-use is supposed to be supported.

The Request object should then either get it right on re-use (which I'd
prefer), or block re-use.

> Python is case sensitive.

True, of course, but HTTP headers are not (RFC 2616, 4.2), and the
functions work on HTTP data, not Python data. After all, we are
lucky to have functions here and not just a dictionary.

Anyway, thanks for reporting!

Terry Reedy · Nov 27, 2012

> I opened http://bugs.python.org/issue16464

A patch has been written by Alexey Kachayev and pushed by Andrew Svetlov
and the behavior will change in 3.4.0 to allow reuse.
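
As a rough check of the changed behaviour, and assuming Python 3.4 or later, assigning new data to a Request appears to drop a previously stored Content-length so that the handler can recompute it on the next open. The sketch stores the header by hand instead of opening a connection, so it only imitates what a first send would have left behind.

```python
import urllib.request

request = urllib.request.Request(
    "http://example.com/",
    data=b"1",
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

# Imitate the state after a first POST, where the handler stored the length.
request.add_unredirected_header("Content-length", "1")
print(request.has_header("Content-length"))

# Reuse the object with a longer payload, as opener.open(request, data) does
# when it assigns the new data to the request.
request.data = b"123456789"
print(request.has_header("Content-length"))
```

On 3.3 and earlier the stale header would still be present after the second assignment, which matches the output reported in this thread.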
