BaseHTTPServer weirdness

Ron Garret

I'm trying to figure out how to use BaseHTTPServer. Here's my little
test app:

=================================

#!/usr/bin/python

from BaseHTTPServer import *
import cgi

class myHandler(BaseHTTPRequestHandler):

    def do_GET(r):
        s = ''
        try:
            s = cgi.parse_qs(r.rfile.read(int(r.headers.get("Content-length"))), 1)
        except:
            pass

        r.send_response(200)
        r.send_header("Content-type", "text/html")
        r.end_headers()
        r.wfile.write("""
<form method=post action=foo>
<input type=text name=text1 value="">
<input type=text name=text2 value="">
<input type=submit>
</form> %s
""" % s)

    def do_POST(r):
        r.do_GET()


d = HTTPServer(('', 1024), myHandler)
d.serve_forever()

===================================

Two questions:

1. The line:

s = cgi.parse_qs(r.rfile.read(int(r.headers.get("Content-length"))), 1)

feels like a horrible hack. It seems like this would be a better
alternative:

s = cgi.parse(r.rfile)

but that doesn't actually work. Why? What is the Right Way to parse
form data in a BaseHTTPServer?

2. Despite the fact that I'm passing a 1 for the keep_blank_values
argument to cgi.parse_qs, it doesn't actually keep blank values. Is
this a bug, or am I doing something wrong?

Thanks,
rg
 
Steve Holden

Ron said:
1. The line:

s = cgi.parse_qs(r.rfile.read(int(r.headers.get("Content-length"))), 1)

feels like a horrible hack. It seems like this would be a better
alternative:

s = cgi.parse(r.rfile)

but that doesn't actually work. Why? What is the Right Way to parse
form data in a BaseHTTPServer?

The normal way is

s = cgi.parse()

since the CGI script sees the client network socket (after consumption
of HTTP headers) as its standard input. However, I'm not sure how much it
currently does in the way of handling strange inputs like gzip-compressed
data.

Ron said:
2. Despite the fact that I'm passing a 1 for the keep_blank_values
argument to cgi.parse_qs, it doesn't actually keep blank values. Is
this a bug, or am I doing something wrong?

Sounds like a bug, but then since your parsing looks buggy I'm surprised
you get anything at all. Try using a keyword argument
keep_blank_values=1 just in case the order has changed or something
daft. But fix your parsing first.
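
To illustrate the keyword-argument suggestion, here is a minimal sketch. It assumes Python 3, where cgi.parse_qs has moved to urllib.parse.parse_qs; the sample form body is made up for illustration.

```python
from urllib.parse import parse_qs

# Hypothetical form body: text1 left empty, text2 filled in.
body = "text1=&text2=hello"

# Default behaviour: fields with blank values are dropped.
without_blanks = parse_qs(body)

# Passing the flag by keyword keeps the blank field and cannot be
# confused with any other positional parameter.
with_blanks = parse_qs(body, keep_blank_values=True)

print(without_blanks)
print(with_blanks)
```

Naming the argument makes the call independent of the parameter order in whichever parsing function is used.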

The other thing to note is that since you are putting a dictionary's
string representation out straight into your HTML if there are odd
characters in it this may give you strange output in the browser, so you
should view the page source to ensure that's not the case. Which it
probably isn't ...
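
The escaping concern can be sketched as follows, assuming Python 3's html.escape; the sample dictionary is invented for illustration and is not output from the test app.

```python
import html

# Hypothetical parsed form data containing markup characters.
form = {"text1": ["<b>bold</b>"], "text2": ["a & b"]}

# Escape the dictionary's string representation before embedding it in
# HTML, so the browser displays the characters instead of treating
# them as tags or entities.
safe = html.escape(str(form))
page = "<form method=post action=foo></form> %s" % safe
print(page)
```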

regards
Steve
 
Ron Garret

Steve Holden said:
The normal way is

s = cgi.parse()

since the CGI script sees the client network socket (after consumption
of HTTP headers) as its standard input.

Doesn't work. (I even tried sys.stdin=r.rfile; s=cgi.parse()) Don't
forget, this is not a CGI script, it's a handler for a BaseHTTPServer.

Steve Holden said:
Sounds like a bug, but then since your parsing looks buggy I'm surprised
you get anything at all. Try using a keyword argument
keep_blank_values=1 just in case the order has changed or something
daft. But fix your parsing first.

The other thing to note is that since you are putting a dictionary's
string representation out straight into your HTML if there are odd
characters in it this may give you strange output in the browser, so you
should view the page source to ensure that's not the case. Which it
probably isn't ...

I know that's not a problem because it does work when I use parse_qs.
(I know about escaping HTML and all that, but this is just a little test
program.)

rg
 
Steve Holden

Ron said:
Doesn't work. (I even tried sys.stdin=r.rfile; s=cgi.parse()) Don't
forget, this is not a CGI script, it's a handler for a BaseHTTPServer.

Right. My bad. However there's clearly something screwy going on,
because otherwise you'd expect to see at least an empty dictionary in
the output.

Reading the source of the 2.4.3 library shows that someone added an
environ=os.environ argument, which will be the second argument on a
positional call, so that clears that mystery up. The documentation
should really show these as keyword arguments rather than implying they
are positionals. It'd be nice if you could report this as a
documentation bug - though I believe by now the 2.5rc2 release will be
frozen.

Ron said:
I know that's not a problem because it does work when I use parse_qs.
(I know about escaping HTML and all that, but this is just a little test
program.)

I suspect that the remainder of your problems (cgi.parse appears to be
returning a *string*, dammit) are due to the fact that the process you
are running the HTTP server in doesn't have the environment variables
set that a server would set if it really were being called in a CGI
context, and which the CGI library expects to be set. You could try
passing them as an explicit environ argument and see if that worked.

But basically, you aren't providing a CGI environment, and that's why
cgi.parse() isn't working.
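
As a rough sketch of what "providing a CGI environment" could mean: the cgi parsing functions look for CGI-style variables such as REQUEST_METHOD, CONTENT_TYPE and CONTENT_LENGTH in their environ argument. The helper name cgi_environ is hypothetical, the headers are passed as a plain dict for illustration, and whether such a dictionary is sufficient for a given version of the cgi module is an assumption that has not been checked here. The code is written for Python 3.

```python
def cgi_environ(method, headers):
    """Build a minimal CGI-style environ dict from request headers.

    `headers` is any mapping with a .get() method, for example the
    handler's headers object; a plain dict is used below.
    """
    environ = {"REQUEST_METHOD": method}
    ctype = headers.get("Content-Type")
    if ctype:
        environ["CONTENT_TYPE"] = ctype
    clength = headers.get("Content-Length")
    if clength:
        environ["CONTENT_LENGTH"] = clength
    return environ


# Illustrative header values only.
env = cgi_environ("POST", {
    "Content-Type": "application/x-www-form-urlencoded",
    "Content-Length": "18",
})
print(env)
```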

regards
Steve
 
Ron Garret

Steve Holden said:
But basically, you aren't providing a CGI environment, and that's why
cgi.parse() isn't working.

Clearly. So what should I be doing? Surely I'm not the first person to
have this problem?

I have managed to work around this for now by copying and modifying the
code in cgi.parse, but this still feels like a Horrible Hack to me.

rg
 
Damjan

Steve Holden said:
But basically, you aren't providing a CGI environment, and that's why
cgi.parse() isn't working.

Ron Garret said:
Clearly. So what should I be doing?

Probably you'll need to read the source of cgi.parse_qs (like Steve did) and
see what it needs from os.environ and then provide that (either in
os.environ or in a custom environ dictionary).

BUT why don't you use WSGI?
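
For comparison, here is a minimal sketch of the same form handled as a WSGI application, assuming Python 3. The application name form_app is hypothetical, and this is only one possible shape, not the way any particular framework does it.

```python
import html
from urllib.parse import parse_qs

FORM = b"""<form method=post action=foo>
<input type=text name=text1 value="">
<input type=text name=text2 value="">
<input type=submit>
</form>
"""


def form_app(environ, start_response):
    """Show the form and echo any submitted fields below it."""
    fields = {}
    if environ.get("REQUEST_METHOD") == "POST":
        # The server supplies the body length and the body stream.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length).decode("utf-8")
        fields = parse_qs(body, keep_blank_values=True)
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [FORM, html.escape(str(fields)).encode("utf-8")]
```

Such an application could be served from the standard library with wsgiref.simple_server.make_server('', 8000, form_app).serve_forever(); the port number is arbitrary.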
 
Steve Holden

Ron said:
Clearly. So what should I be doing? Surely I'm not the first person to
have this problem?

I have managed to work around this for now by copying and modifying the
code in cgi.parse, but this still feels like a Horrible Hack to me.

Let me get this right. You are aware that the CGIHTTPServer module exists.
But you don't want to use that. Instead you want to use your own code.
So you have ended up duplicating some of the functionality of the cgi
library. And it feels like a hack.

Have I missed anything? :)

regards
Steve
 
Kent Johnson

Steve said:
Let me get this right. You are aware that CGIHTTPServer module exists.
But you don't want to use that. Instead you want to use your own code.
So you have ended up duplicating some of the functionality of the cgi
library. And it feels like a hack.

Have I missed anything? :)

Hey, be nice. Wanting to write a request handler that actually handles a
POST request doesn't seem so unreasonable.

Except...when there are about a bazillion Python web frameworks to
choose from, why start from BaseHTTPServer? Why not use one of the
simpler frameworks like Karrigell or Snakelets or CherryPy?

Here is the query-handling code from Karrigell's CustomHTTPServer.py,
good at least for a second opinion:

def do_POST(self):
    """Begin serving a POST request. The request data must be readable
    on a file-like object called self.rfile"""
    ctype, pdict = cgi.parse_header(self.headers.getheader('content-type'))
    self.body = cgi.FieldStorage(fp=self.rfile,
        headers=self.headers, environ = {'REQUEST_METHOD':'POST'},
        keep_blank_values = 1, strict_parsing = 1)
    # throw away additional data [see bug #427345]
    while select.select([self.rfile._sock], [], [], 0)[0]:
        if not self.rfile._sock.recv(1):
            break
    self.handle_data()

Here is CherryPy's version from CP 2.1:

# Create a copy of headerMap with lowercase keys because
# FieldStorage doesn't work otherwise
lowerHeaderMap = {}
for key, value in request.headerMap.items():
    lowerHeaderMap[key.lower()] = value

# FieldStorage only recognizes POST, so fake it.
methenv = {'REQUEST_METHOD': "POST"}
try:
    forms = _cpcgifs.FieldStorage(fp=request.rfile,
                                  headers=lowerHeaderMap,
                                  environ=methenv,
                                  keep_blank_values=1)

where _cpcgifs.FieldStorage is cgi.FieldStorage with some extra accessors.

HTH,
Kent
 
Ron Garret

Steve Holden said:
Let me get this right. You are aware that CGIHTTPServer module exists.
But you don't want to use that.

That's right. I don't want to run CGI scripts. I don't want to launch
a new process for every request. I want all requests handled in the
server process.

Steve Holden said:
Instead you want to use your own code.

No, the whole reason I'm asking this question is because I *don't* want
to write my own code. It seems to me that the code to do what I want
ought to be out there (or in there) somewhere and I shouldn't have to
reinvent this wheel. But I can't find it.

Steve Holden said:
So you have ended up duplicating some of the functionality of the cgi
library. And it feels like a hack.

Yep.

rg
 
Ron Garret

Kent Johnson said:
Hey, be nice. Wanting to write a request handler that actually handles a
POST request doesn't seem so unreasonable.

Except...when there are about a bazillion Python web frameworks to
choose from, why start from BaseHTTPServer? Why not use one of the
simpler frameworks like Karrigell or Snakelets or CherryPy?

It may come to that. I just thought that what I'm trying to do is so
basic that it ought to be part of the standard library. I mean, what do
people use BaseHTTPServer for if you can't parse form input?

Here's what I actually ended up doing:

maxlen = 0  # maximum accepted Content-length; 0 means unlimited, as in cgi.py

def parse(r):
    ctype = r.headers.get('content-type')
    if not ctype: return None
    ctype, pdict = cgi.parse_header(ctype)
    if ctype == 'multipart/form-data':
        return cgi.parse_multipart(r.rfile, pdict)
    elif ctype == 'application/x-www-form-urlencoded':
        clength = int(r.headers.get('Content-length'))
        if maxlen and clength > maxlen:
            raise ValueError, 'Maximum content length exceeded'
        return cgi.parse_qs(r.rfile.read(clength), 1)
    else:
        return None

which is copied more or less directly from cgi.py. But it still seems
to me like this (or something like it) ought to be standardized in one
of the *HTTPServer.py modules.
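
For readers on Python 3, here is a sketch of how the urlencoded branch of such a helper might look there, where BaseHTTPServer became http.server and cgi.parse_qs became urllib.parse.parse_qs. The names parse_form, FormHandler and MAXLEN are hypothetical, and multipart bodies are deliberately not handled.

```python
import html
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs

MAXLEN = 0  # hypothetical limit on body size; 0 means no limit


def parse_form(headers, rfile, maxlen=MAXLEN):
    """Return urlencoded form fields as a dict, or None for other bodies."""
    ctype = headers.get("Content-Type", "")
    # Drop parameters such as "; charset=utf-8" before comparing.
    if ctype.split(";")[0].strip().lower() != "application/x-www-form-urlencoded":
        return None
    clength = int(headers.get("Content-Length") or 0)
    if maxlen and clength > maxlen:
        raise ValueError("Maximum content length exceeded")
    body = rfile.read(clength).decode("utf-8")
    return parse_qs(body, keep_blank_values=True)


class FormHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Parse the body in-process and echo the fields back, escaped.
        fields = parse_form(self.headers, self.rfile)
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.end_headers()
        self.wfile.write(html.escape(str(fields)).encode("utf-8"))
```

A handler like this could be served with http.server.HTTPServer(('', 8000), FormHandler).serve_forever(); the port number is arbitrary.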

But what do I know?

rg
 
Ron Garret

Damjan said:
Probably you'll need to read the source of cgi.parse_qs (like Steve did) and
see what it needs from os.environ and then provide that (either in
os.environ or in a custom environ dictionary).

I ended up just copying and hacking the code. It was only a dozen lines
or so. But it still feels wrong.

Damjan said:
BUT why don't you use WSGI?

Because BaseHTTPServer does everything I need except for this one thing.
Why use a sledgehammer to squish a gnat?

rg
 
Steve Holden

Ron said:
which is copied more or less directly from cgi.py. But it still seems
to me like this (or something like it) ought to be standardized in one
of the *HTTPServer.py modules.

But what do I know?

I wouldn't necessarily say you are wrong here. It's just that the cgi
module has sort of "just growed", so it isn't conveniently factored for
reusability in other contexts. Several people (including me) have taken
a look at it with a view to possible re-engineering and backed away
because of the difficulty of maintaining compatibility. Python 3K will
be an ideal opportunity to replace it, but until then it's probably
going to stay in the same rather messy but working state.

regards
Steve
 
Ron Garret

Steve Holden said:
I wouldn't necessarily say you are wrong here, It's just that the cgi
module has sort of "just growed", so it isn't conveniently factored for
reusability in other contexts. Several people (including me) have taken
a look at it with a view to possible re-engineering and backed away
because of the difficulty of maintaining compatibility. Python 3K will
be an ideal opportunity to replace it, but until then it's probably
going to stay in the same rather messy but working state.

It's not necessary to re-engineer cgi, just cutting and pasting and
editing the code as I've done would seem to suffice.

But all I'm really looking for here at this point is confirmation that
I'm not in fact doing something stupid. In the past I've found that
nine times out of ten if I find myself wanting to rewrite or add
something to a Python module it's an indication that I'm doing something
wrong.

rg
 
Eddie Corns

Ron Garret said:
It's not necessary to re-engineer cgi, just cutting and pasting and
editing the code as I've done would seem to suffice.
But all I'm really looking for here at this point is confirmation that
I'm not in fact doing something stupid. In the past I've found that
nine times out of ten if I find myself wanting to rewrite or add
something to a Python module it's an indication that I'm doing something
wrong.

Well, if it's any consolation, that's exactly what I did: cut about 7 lines
from CGIHTTPServer into my do_POST method. Maybe we're both stoopid. This
was at least 3 years ago before I moved on to Quixote and then
CherryPy/TurboGears but I recall thinking at the time that it was probably
just one of those little cracks that show up from time to time in the library
(there aren't so very many of them).

Eddie
 
