How to read POSTed data

  • Thread starter Håkan Persson

Håkan Persson

Hi.

I am trying to set up a simple HTTP server, but I have problems reading
data that is being POSTed.

class httpServer(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_POST(self):
        input = self.rfile.read()

The self.rfile.read() call hangs on the line
    data = self._sock.recv(recv_size)
in the read() function in socket.py.

Is there another way to read the data?

Thanks,
Håkan Persson
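
For comparison, here is a minimal sketch in Python 3, where BaseHTTPServer became http.server. It assumes the usual cause of this hang: rfile.read() with no size waits for end-of-file, but the browser keeps the connection open while it waits for the response, so EOF never arrives. Reading exactly Content-Length bytes avoids that. EchoPostHandler and post_once are hypothetical names used only for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoPostHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: echoes the POST body back to the client."""

    def do_POST(self):
        # read() with no size would wait for EOF, which a client that is
        # still waiting for our response never sends; so read exactly
        # Content-Length bytes instead.
        length = int(self.headers.get('Content-Length', 0))
        body = self.rfile.read(length)
        self.send_response(200)
        self.send_header('Content-Type', 'application/octet-stream')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the per-request log lines for this demo


def post_once(payload):
    """Start a throwaway server on a free port, POST payload, return the reply."""
    server = HTTPServer(('127.0.0.1', 0), EchoPostHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection('127.0.0.1', server.server_address[1])
    try:
        conn.request('POST', '/', body=payload, headers={
            'Content-Type': 'application/x-www-form-urlencoded'})
        return conn.getresponse().read()
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
```

For example, post_once(b'name=value') returns b'name=value' instead of blocking.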
 

Dan Perl

I'm not an expert on this subject, but I am in the process of looking at the
code in BaseHTTPServer and other related modules. What I see used is
normally self.rfile.readline() and that makes more sense to me (intuitively)
than read(). Have you tried that?
 

Dan Perl

I am piggybacking on Hakan's original posting because I am addressing the
same group of people (those with good knowledge in the standard web
programming modules), on a related topic. However, my question is
independent of Hakan's.

I have trouble getting a simple CGI script to work because it gets an empty
cgi.FieldStorage form. I have looked into and tried to debug the internals
of the standard modules that are used, but I cannot figure it out: how is a
multipart POST request parsed by CGIHTTPServer? BTW, I am using Python 2.4.

I found the parsing of the header in the rfc822 module used by
BaseHTTPServer.BaseHTTPRequestHandler (through the mimetools module), but
that stops after the header (at the first empty line). Where is the parsing
done for the POST data following the header? I will appreciate any pointers
that will help me debug this problem.

As a side note, I found other old reports of problems with cgi handling POST
requests, reports that don't seem to have had a resolution. There is even a
bug reported just a few days ago (1112856) that is exactly about multipart
post requests. If I understand the bug report correctly though, it is only
on the latest version in CVS and it states that what is in the 2.4 release
works. All this tells me that it could be a "fragile" part of the standard
library. It could even be a bug in the standard library, but for now I
am assuming that I'm doing something wrong.

Thanks,

Dan
 

and-google

Dan said:
> how is a multipart POST request parsed by CGIHTTPServer?

It isn't; the input stream containing the multipart/form-data content
is passed to the CGI script, which can choose to parse it or not using
any code it has to hand - which could be the 'cgi' module, but not
necessarily.

> Where is the parsing done for the POST data following the header?

If you are using the 'cgi' module, then cgi.parse_multipart.
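
cgi.parse_multipart(fp, pdict) reads a multipart/form-data body from the file object fp, using the boundary found in pdict. The cgi module was deprecated in Python 3.11 and removed in 3.13, so the sketch below swaps in a different technique: the standard email package, which already understands MIME multipart messages. parse_multipart_form is a hypothetical helper, and it assumes the whole request body is already in memory.

```python
from email.parser import BytesParser
from email.policy import default


def parse_multipart_form(content_type, body):
    """Return {field_name: value} for a multipart/form-data request body.

    content_type is the full Content-Type header value, including the
    boundary parameter; body is the raw request body as bytes.
    """
    # Prepend a header block so the email parser sees a complete MIME message.
    raw = b'Content-Type: ' + content_type.encode('ascii') + b'\r\n\r\n' + body
    message = BytesParser(policy=default).parsebytes(raw)
    fields = {}
    for part in message.iter_parts():
        name = part.get_param('name', header='content-disposition')
        if part.get_filename() is None:
            fields[name] = part.get_content()              # text field -> str
        else:
            fields[name] = part.get_payload(decode=True)   # file upload -> bytes
    return fields
```

Text fields come back as str and uploaded files as bytes; a real server would also want to cap the body size before parsing.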

> As a side note, I found other old reports of problems with cgi
> handling POST requests, reports that don't seem to have had a
> resolution.

(in particular?)

FWIW, for interface-style and multipart-POST-file-upload-speed reasons
I wrote an alternative to cgi, form.py
(http://www.doxdesk.com/software/py/form.html). But I haven't found
cgi's POST reading to be buggy in general.

> There is even a bug reported just a few days ago (1112856) that is
> exactly about multipart post requests. If I understand the bug
> report correctly though, it is only on the latest version in CVS
> and it states that what is in the 2.4 release works.

That's correct.

> All this tells me that it could be a "fragile" part in the standard
> library.

I don't really think so; it's an old, stable part of the library
that is a bit crufty in places due to age. The patch that caused
1112856 was an attempt to rip out and replace the parser stuff, which,
as a big change to old code, was bound to cause trouble. But that's what
the dev cycle is for.

CGIHTTPServer, on the other hand, I have never really trusted. I would
suspect that fella.
 

M.E.Farmer

Dan,
I was wondering how you were coming along with your project.
I had wondered if I had missed something by going the CherryPy route
instead of CGI. Now I see that you have hit a bit of a snag; sorry to
hear that.
I am glad you are at least learning new things, 'cause if you had used
CherryPy2 you would have been done by now :p
M.E.Farmer
 

Dan Perl

I chose to go first through the cgi module and only later to CherryPy
precisely because I thought I would learn more this way. And I guess I am
learning more.
 

Dan Perl

and-google said:
> It isn't; the input stream containing the multipart/form-data content
> is passed to the CGI script, which can choose to parse it or not using
> any code it has to hand - which could be the 'cgi' module, but not
> necessarily.

> If you are using the 'cgi' module, then cgi.parse_multipart.

Thanks, at least I was right not to find that in the CGIHTTPServer and
BaseHTTPServer code. So I have to use that instead of FieldStorage? I was
expecting FieldStorage to encapsulate all that for all the cases, POST with
multipart/form-data being just a special case.

I took a brief look at cgi.parse_multipart and I still have to figure out
how to provide the fp argument. Any code examples anywhere? All the
examples I have found were using only FieldStorage:
http://www.cs.virginia.edu/~lab2q/lesson_7/
http://www.devshed.com/index2.php?option=content&task=view&id=198&pop=1&page=0&hide_js=1
http://gnosis.cx/publish/programming/feature_5min_python.html
http://mail.python.org/pipermail/edu-sig/2001-June/001368.html

BTW, I also tried the example in the last link and that doesn't work either;
I get results/problems similar to those of my script. I think all those
examples are a few years old; has something changed since then?

> (in particular?)

I made the comment and now I have to back that up:
http://mail.python.org/pipermail/python-list/2002-February/thread.html#88686
http://mail.python.org/pipermail/python-list/2002-September/thread.html#124109

> FWIW, for interface-style and multipart-POST-file-upload-speed reasons
> I wrote an alternative to cgi, form.py
> (http://www.doxdesk.com/software/py/form.html). But I haven't found
> cgi's POST reading to be buggy in general.

I was quite careful in calling the code "fragile", and I am not normally the
kind of person to mince words. I would have said buggy if I had meant that, or I
could have used even worse words. But both the rationale behind the patch
that caused the bug I mentioned ("Remove dependencies on (deprecated) rfc822
and mimetools modules, replacing with email.") and your own description of this
part of the standard library as "a bit crufty in places due to age" support my
view that this code needs work.

Besides, I am not happy at all with the idea of having to use
cgi.parse_multipart instead of FieldStorage. It seems a lot more low-level
than I would like, even if it offers more control. I looked at your
form.py and you seem to address that. Sorry, though, I will probably not use
it. Once I learn enough from the cgi module I will just move on to using a
framework like CherryPy.

Dan
 

Pierre Quentel

Here is an example of how to get the POST data :

import BaseHTTPServer
import cgi
import select

class RequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_POST(self):
        ctype, pdict = cgi.parse_header(self.headers.getheader('content-type'))
        length = int(self.headers.getheader('content-length'))
        if ctype == 'multipart/form-data':
            self.body = cgi.parse_multipart(self.rfile, pdict)
        elif ctype == 'application/x-www-form-urlencoded':
            # read exactly Content-Length bytes; read() with no size would block
            qs = self.rfile.read(length)
            self.body = cgi.parse_qs(qs, keep_blank_values=1)
        else:
            self.body = {}  # Unknown content-type
        # throw away additional data [see bug #427345]
        while select.select([self.rfile._sock], [], [], 0)[0]:
            if not self.rfile._sock.recv(1):
                break
        self.handle_data()

where handle_data() is the method where you will process the data received
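
cgi.parse_header, used on the first line of Pierre's do_POST, is no longer available in current Python 3 (the cgi module was removed in 3.13). A sketch of one replacement, assuming Python 3: email.message.Message already knows how to split a Content-Type value into the media type and its parameters. parse_content_type is a hypothetical helper that returns the same (ctype, pdict) pair the code above unpacks.

```python
from email.message import Message


def parse_content_type(value):
    """Split a Content-Type header value into (media_type, params_dict)."""
    msg = Message()
    msg['Content-Type'] = value
    media_type = msg.get_content_type()  # lower-cased, e.g. 'text/html'
    # get_params() lists the media type first, then (name, value) pairs
    # with parameter names lower-cased and surrounding quotes removed.
    params = dict(msg.get_params()[1:])
    return media_type, params
```

One difference worth knowing: if the value does not contain exactly one "/", get_content_type() falls back to 'text/plain'.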

The part related to bug #427345 is copied from CGIHTTPServer

For an example of use you can take a look at the CustomHTTPServer in
Karrigell (http://karrigell.sourceforge.net)

See you,
Pierre
 

Dan Perl

and-google said:
> Dan said:
>> how is a multipart POST request parsed by CGIHTTPServer?
> [...]
> CGIHTTPServer, on the other hand, I have never really trusted. I would
> suspect that fella.

It turns out I was wrong in thinking that the POST requests I was handling were
multipart. Blame that on my limited knowledge of HTTP. At least the
content-type header in the request doesn't say that they are multipart. I
checked with an Ethereal sniffer: my Firefox browser on Windows sends
a request with 'content-type: application/x-www-form-urlencoded' and the
form data in the body of the request, after the headers. I'll assume for
now that this is a valid HTTP POST request, as I believe that Firefox alone
is responsible for it and that it is not determined by my HTML form.

Pierre Quentel's suggestion for an implementation of do_POST in the "How to
read POSTed data" thread seems to handle requests of this kind, although I
haven't tried it myself. But the run_cgi method in CGIHTTPServer expects the
form data to be only in the POST header line, in the path of the URL, like in
a GET request. I cannot find form data being parsed any other way if the
content-type is x-www-form-urlencoded. Am I missing something? Also, I
don't know if this means anything, but shouldn't CGIHTTPServer use the cgi
module if cgi has all those useful parsing utilities?

Dan
 

Dan Perl

Pierre, I am repeating some questions I already stated in another thread,
'CGI POST problem', but do you have any opinions on how CGIHTTPServer's
do_POST handles requests? It looks to me like it always expects form data
to be part of the POST command header, in the path of the URL, just like a
GET request. Am I understanding the code incorrectly? It would also make
sense to me that CGIHTTPServer should use the cgi module like you do in your
example, but it doesn't use cgi. Any opinions on that? Is there a history
there? I don't know enough about HTTP, especially about its history, but
was this a change in the HTTP specification at some point?

Thanks,

Dan
 

Pierre Quentel

Dan Perl said:
> Pierre, I am repeating some questions I already stated in another thread,
> 'CGI POST problem', but do you have any opinions on how CGIHTTPServer's
> do_POST handles requests? It looks to me like it always expects form data
> to be part of the POST command header, in the path of the URL, just like a
> GET request. Am I understanding the code incorrectly?

The code in CGIHTTPServer is not very easy to understand, but it does
read the request body, as many bytes as indicated in the Content-Length
header. See line 262 (in the Python 2.4 distribution) or line 250 in Python
2.3 (this is the Windows version):

data = self.rfile.read(nbytes)

Then this data is sent to the standard input of the CGI script. If this
script is a Python program using the cgi module, it usually creates a
cgi.FieldStorage() instance: upon creation, the standard input is read
(in self.read_urlencoded(), for instance) and the collected string is
processed to produce a dictionary-like object, with keys matching the
form field names.

This is compliant with the CGI specification (HTTP doesn't say anything
about how the data sent by POST requests should be handled). The code I
sent is an alternative to CGI, leaving the handling of this data (available
in self.body) to a method of the RequestHandler instance.
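
The same division of labour can be sketched from the script's side, assuming Python 3: the server puts the request metadata into environment variables and the body on standard input, and the script reads exactly CONTENT_LENGTH bytes. read_cgi_form is hypothetical; it only handles urlencoded bodies, and merging the QUERY_STRING fields with the POSTed fields is a choice made for this sketch, not something taken from the cgi module.

```python
import io
from urllib.parse import parse_qs


def read_cgi_form(environ, stdin):
    """Collect form fields as a CGI script receives them (urlencoded only).

    environ: mapping with the CGI variables REQUEST_METHOD, QUERY_STRING,
             CONTENT_TYPE and CONTENT_LENGTH (normally os.environ).
    stdin:   binary stream carrying the request body (normally
             sys.stdin.buffer).
    """
    # Fields from the URL, as in a GET request.
    fields = parse_qs(environ.get('QUERY_STRING', ''), keep_blank_values=True)
    if environ.get('REQUEST_METHOD', 'GET').upper() == 'POST':
        ctype = environ.get('CONTENT_TYPE', '').split(';')[0].strip().lower()
        # Read exactly CONTENT_LENGTH bytes; reading to EOF could block.
        length = int(environ.get('CONTENT_LENGTH') or 0)
        body = stdin.read(length)
        if ctype == 'application/x-www-form-urlencoded':
            posted = parse_qs(body.decode('ascii'), keep_blank_values=True)
            for name, values in posted.items():
                fields.setdefault(name, []).extend(values)
    return fields


# Demo with a fake environment; the trailing bytes are never consumed.
demo_env = {
    'REQUEST_METHOD': 'POST',
    'QUERY_STRING': 'page=2',
    'CONTENT_TYPE': 'application/x-www-form-urlencoded',
    'CONTENT_LENGTH': '13',
}
demo_stdin = io.BytesIO(b'name=Dan+PerlEXTRA')
demo_fields = read_cgi_form(demo_env, demo_stdin)
```

Without CONTENT_LENGTH in the environment, this sketch reads nothing at all, which matches the empty-form symptom described in this thread.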

Regards,
Pierre
 

Dan Perl

Thanks, Pierre, this got me much further but I hit another stumbling block.
I can see now that CGIHTTPServer writes all the header lines into os.environ
and creates a subprocess for the script with os.popen2 or os.popen3 (it's
Windows), passing the form data to the new process through sys.stdin (I
believe it is sys.stdin although the Library Reference descriptions of
popen2 and popen3 say that would be sys.stdout). But I don't see any of
those headers updated in the os.environ of the cgi script's process. Is the
parent's os.environ passed to the subprocesses created with popen2/popen3 on
Windows?

cgi.FieldStorage.read_urlencoded needs the content-length that should be
passed through os.environ.

Dan
 

Dan Perl

I was about to report a bug but then I found bug report "[ 1100235 ] Scripts
started with CGIHTTPServer: missing cgi environment" which is actually
caused by "[ 1110478 ] os.environ.update doesn't work". They are fixed with
a patch in 1110478. I still have to try it. I do like the alternate
solution proposed by the originator of 1100235 though. I also thought that
CGIHTTPServer should use the new subprocess.Popen class instead of popen2,
popen3 and popen4.
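
The subprocess.Popen idea can be sketched in Python 3: Popen accepts an explicit env mapping, so the CGI variables can go straight into the child's environment instead of being written into the parent's os.environ first, and communicate() delivers the POST body on the child's stdin. CHILD_SCRIPT and run_cgi_child are hypothetical stand-ins for a real CGI script and for the server code that would launch it.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for a CGI script: reads CONTENT_LENGTH bytes from
# stdin and reports the request method and the number of bytes it got.
CHILD_SCRIPT = (
    "import os, sys\n"
    "n = int(os.environ['CONTENT_LENGTH'])\n"
    "data = sys.stdin.buffer.read(n)\n"
    "sys.stdout.write('%s %d' % (os.environ['REQUEST_METHOD'], len(data)))\n"
)


def run_cgi_child(body, cgi_env):
    """Run CHILD_SCRIPT with cgi_env added to its environment and body on stdin."""
    env = dict(os.environ)  # a copy: the parent's os.environ is not modified
    env.update(cgi_env)     # CGI variables are visible only to the child
    proc = subprocess.Popen([sys.executable, '-c', CHILD_SCRIPT],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            env=env)
    out, _ = proc.communicate(body)
    return proc.returncode, out.decode('ascii')
```

For example, run_cgi_child(b'name=Dan', {'REQUEST_METHOD': 'POST', 'CONTENT_LENGTH': '8'}) returns (0, 'POST 8'). All values in cgi_env must be strings.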
 

Dan Perl

and-google said:
> CGIHTTPServer, on the other hand, I have never really trusted. I would
> suspect that fella.

CGIHTTPServer wasn't the culprit after all, it was os.py. See bug report
"[ 1100235 ] Scripts started with CGIHTTPServer: missing cgi environment"
which is actually caused by "[ 1110478 ] os.environ.update doesn't work".
They are fixed with a patch in 1110478. I applied the patch in os.py and my
web app is working now.
 

M.E.Farmer

Sweet!
Glad you fixed it, and documented it all!
Thanks for the followups.
Now the next poor soul to stumble in can get the right fix.
Never know when it could be me ;)
M.E.Farmer
 

Dan Perl

Thanks for the comments. I did indeed post the info on the bug to let other
people know about it.

Anyway, this was more of a learning experience than I intended it to be.
And now it's time for me to move on and to get into CherryPy.
 
