BaseHTTPServer weirdness

Ron Garret · Sep 11, 2006

I'm trying to figure out how to use BaseHTTPServer. Here's my little
test app:

=================================

#!/usr/bin/python

from BaseHTTPServer import *

import cgi

class myHandler(BaseHTTPRequestHandler):

    def do_GET(r):
        s = ''
        try:
            s = cgi.parse_qs(r.rfile.read(int(r.headers.get("Content-length"))), 1)
        except:
            pass

        r.send_response(200)
        r.send_header("Content-type", "text/html")
        r.end_headers()
        r.wfile.write("""
<form method=post action=foo>
<input type=text name=text1 value="">
<input type=text name=text2 value="">
<input type=submit>
</form> %s
""" % s)

    def do_POST(r):
        r.do_GET()

d = HTTPServer(('', 1024), myHandler)
d.serve_forever()

===================================

Two questions:

1. The line:

s = cgi.parse_qs(r.rfile.read(int(r.headers.get("Content-length"))), 1)

feels like a horrible hack. It seems like this would be a better
alternative:

s = cgi.parse(r.rfile)

but that doesn't actually work. Why? What is the Right Way to parse
form data in a BaseHTTPServer?

2. Despite the fact that I'm passing a 1 for the keep_blank_values
argument to cgi.parse_qs, it doesn't actually keep blank values. Is
this a bug, or am I doing something wrong?

Thanks,
rg

Steve Holden · Sep 11, 2006

Ron said:
1. The line:

s = cgi.parse_qs(r.rfile.read(int(r.headers.get("Content-length"))), 1)

feels like a horrible hack. It seems like this would be a better
alternative:

s = cgi.parse(r.rfile)

but that doesn't actually work. Why? What is the Right Way to parse
form data in a BaseHTTPServer?

The normal way is

s = cgi.parse()

since the CGI script sees the client network socket (after consumption
of HTTP headers) as its standard input. However, I'm not sure how much it
currently does in the way of handling strange inputs like gzip
compressed data.

2. Despite the fact that I'm passing a 1 for the keep_blank_values
argument to cgi.parse_qs, it doesn't actually keep blank values. Is
this a bug, or am I doing something wrong?

Sounds like a bug, but then since your parsing looks buggy I'm surprised
you get anything at all. Try using a keyword argument
keep_blank_values=1 just in case the order has changed or something
daft. But fix your parsing first.
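
For anyone trying the keyword form on Python 3, where parse_qs lives in urllib.parse rather than cgi (an assumption about your setup, not what the thread was running), a minimal sketch of the difference the flag makes:

```python
from urllib.parse import parse_qs

# The kind of body a browser sends when text1 is left empty.
query = "text1=&text2=hello"

# Default behaviour: fields with empty values are dropped.
without_blanks = parse_qs(query)

# Passing the flag by keyword keeps them as empty strings.
with_blanks = parse_qs(query, keep_blank_values=True)

print(without_blanks)  # {'text2': ['hello']}
print(with_blanks)     # {'text1': [''], 'text2': ['hello']}
```

Naming the argument removes any dependence on parameter order.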

The other thing to note is that since you are putting a dictionary's
string representation out straight into your HTML if there are odd
characters in it this may give you strange output in the browser, so you
should view the page source to ensure that's not the case. Which it
probably isn't ...

regards
Steve

Ron Garret · Sep 11, 2006

Steve Holden said:
The normal way is

s = cgi.parse()

since the CGI script sees the client network socket (after consumption
of HTTP headers) as its standard input.

Doesn't work. (I even tried sys.stdin=r.rfile; s=cgi.parse()) Don't
forget, this is not a CGI script, it's a handler for a BaseHTTPServer.

Sounds like a bug, but then since your parsing looks buggy I'm surprised
you get anything at all. Try using a keyword argument
keep_blank_values=1 just in case the order has changed or something
daft. But fix your parsing first.

The other thing to note is that since you are putting a dictionary's
string representation out straight into your HTML if there are odd
characters in it this may give you strange output in the browser, so you
should view the page source to ensure that's not the case. Which it
probably isn't ...

I know that's not a problem because it does work when I use parse_qs.
(I know about escaping HTML and all that, but this is just a little test
program.)

rg

Steve Holden · Sep 11, 2006

Ron said:
Doesn't work. (I even tried sys.stdin=r.rfile; s=cgi.parse()) Don't
forget, this is not a CGI script, it's a handler for a BaseHTTPServer.

Right. My bad. However there's clearly something screwy going on,
because otherwise you'd expect to see at least an empty dictionary in
the output.

Reading the source of the 2.4.3 library shows that someone added an
environ=os.environ argument, which will be the second argument on a
positional call, so that clears that mystery up. The documentation
should really show these as keyword arguments rather than implying they
are positionals. It'd be nice if you could report this as a
documentation bug - though I believe by now the 2.5rc2 release will be
frozen.
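
To make the positional pitfall concrete, here is a small sketch. parse_like is a hypothetical stand-in, not the real library function; it only mirrors the parameter order described above and reports where each argument landed.

```python
import os


def parse_like(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
    """Hypothetical stand-in: report which parameter received which value."""
    return {
        "environ_is_os_environ": environ is os.environ,
        "keep_blank_values": keep_blank_values,
    }


# Positional call: the 1 is bound to environ, not keep_blank_values.
positional = parse_like(None, 1)

# Keyword call: the 1 reaches the parameter it was meant for.
keyword = parse_like(None, keep_blank_values=1)

print(positional)  # {'environ_is_os_environ': False, 'keep_blank_values': 0}
print(keyword)     # {'environ_is_os_environ': True, 'keep_blank_values': 1}
```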

I know that's not a problem because it does work when I use parse_qs.
(I know about escaping HTML and all that, but this is just a little test
program.)

I suspect that the remainder of your problems (cgi.parse appears to be
returning a *string*, dammit) are due to the fact that the process you
are running the HTTP server in doesn't have the environment variables
set that a server would set if it really were being called in a CGI
context, and which the CGI library expects to be set. You could try
passing them as an explicit environ argument and see if that worked.

But basically, you aren't providing a CGI environment, and that's why
cgi.parse() isn't working.
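
As a rough sketch of what "passing them as an explicit environ argument" could look like, the helper below builds a CGI-style dictionary from a request's method and headers. The function name cgi_environ is made up for illustration, and the code is written for Python 3; the variable names are the usual CGI ones (REQUEST_METHOD, CONTENT_TYPE, CONTENT_LENGTH).

```python
from email.message import Message


def cgi_environ(method, headers):
    """Build a minimal CGI-style environ dict (hypothetical helper)."""
    environ = {"REQUEST_METHOD": method}
    ctype = headers.get("Content-Type")
    if ctype is not None:
        environ["CONTENT_TYPE"] = ctype
    clength = headers.get("Content-Length")
    if clength is not None:
        environ["CONTENT_LENGTH"] = clength
    return environ


# Stand-in for the handler's headers object; header lookup is
# case-insensitive, as it is on a request handler's self.headers.
headers = Message()
headers["Content-type"] = "application/x-www-form-urlencoded"
headers["Content-length"] = "18"

environ = cgi_environ("POST", headers)
print(environ)
```

Inside a handler this would be called as cgi_environ(self.command, self.headers), and the result passed wherever an environ dictionary is expected.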

regards
Steve

Ron Garret · Sep 12, 2006

Steve Holden said:
But basically, you aren't providing a CGI environment, and that's why
cgi.parse() isn't working.

Clearly. So what should I be doing? Surely I'm not the first person to
have this problem?

I have managed to work around this for now by copying and modifying the
code in cgi.parse, but this still feels like a Horrible Hack to me.

rg

Damjan · Sep 12, 2006

But basically, you aren't providing a CGI environment, and that's why

Clearly. So what should I be doing?

Probably you'll need to read the source of cgi.parse_qs (like Steve did) and
see what it needs from os.environ and then provide that (either in
os.environ or in a custom environ dictionary).

BUT why don't you use WSGI?

Steve Holden · Sep 12, 2006

Ron said:
Clearly. So what should I be doing? Surely I'm not the first person to
have this problem?

I have managed to work around this for now by copying and modifying the
code in cgi.parse, but this still feels like a Horrible Hack to me.

Let me get this right. You are aware that CGIHTTPServer module exists.
But you don't want to use that. Instead you want to use your own code.
So you have ended up duplicating some of the functionality of the cgi
library. And it feels like a hack.

Have I missed anything?

regards
Steve

Kent Johnson · Sep 12, 2006

Steve said:
Let me get this right. You are aware that CGIHTTPServer module exists.
But you don't want to use that. Instead you want to use your own code.
So you have ended up duplicating some of the functionality of the cgi
library. And it feels like a hack.

Have I missed anything?

Hey, be nice. Wanting to write a request handler that actually handles a
POST request doesn't seem so unreasonable.

Except...when there are about a bazillion Python web frameworks to
choose from, why start from BaseHTTPServer? Why not use one of the
simpler frameworks like Karrigell or Snakelets or CherryPy?

Here is the query-handling code from Karrigell's CustomHTTPServer.py,
good at least for a second opinion:

def do_POST(self):
    """Begin serving a POST request. The request data must be readable
    on a file-like object called self.rfile"""
    ctype, pdict = cgi.parse_header(self.headers.getheader('content-type'))
    self.body = cgi.FieldStorage(fp=self.rfile,
        headers=self.headers, environ = {'REQUEST_METHOD':'POST'},
        keep_blank_values = 1, strict_parsing = 1)
    # throw away additional data [see bug #427345]
    while select.select([self.rfile._sock], [], [], 0)[0]:
        if not self.rfile._sock.recv(1):
            break
    self.handle_data()

Here is CherryPy's version from CP 2.1:

# Create a copy of headerMap with lowercase keys because
# FieldStorage doesn't work otherwise
lowerHeaderMap = {}
for key, value in request.headerMap.items():
    lowerHeaderMap[key.lower()] = value

# FieldStorage only recognizes POST, so fake it.
methenv = {'REQUEST_METHOD': "POST"}
try:
    forms = _cpcgifs.FieldStorage(fp=request.rfile,
                                  headers=lowerHeaderMap,
                                  environ=methenv,
                                  keep_blank_values=1)

where _cpcgifs.FieldStorage is cgi.FieldStorage with some extra accessors.
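
Both excerpts lean on cgi.FieldStorage, which is gone from recent Pythons (the cgi module was removed in 3.13). For comparison, here is a hedged Python 3 sketch of the same do_POST idea using only http.server and urllib.parse. It handles urlencoded forms only and assumes UTF-8 form data; FormHandler, send_page and post_once are names invented for this example.

```python
import html
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

FORM = """<form method="post" action="foo">
<input type="text" name="text1" value="">
<input type="text" name="text2" value="">
<input type="submit">
</form>
"""


class FormHandler(BaseHTTPRequestHandler):
    def send_page(self, fields):
        # Escape the dict's repr so odd characters cannot break the HTML.
        body = (FORM + html.escape(repr(fields))).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        self.send_page({})

    def do_POST(self):
        # Read exactly Content-Length bytes; an unbounded read() would
        # block until the client closes the connection.
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length).decode("utf-8")  # assumes UTF-8
        self.send_page(parse_qs(raw, keep_blank_values=True))

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def post_once(data):
    """Serve a single POST on an ephemeral local port; return the page."""
    server = HTTPServer(("127.0.0.1", 0), FormHandler)
    worker = threading.Thread(target=server.handle_request)
    worker.start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    conn.request("POST", "/foo", body=data,
                 headers={"Content-Type": "application/x-www-form-urlencoded"})
    page = conn.getresponse().read().decode("utf-8")
    conn.close()
    worker.join()
    server.server_close()
    return page
```

Calling post_once(b"text1=&text2=hello") returns the form followed by the escaped field dictionary, blank value included. For a long-running server, replace the post_once helper with HTTPServer(('', 8000), FormHandler).serve_forever(), where the port number is arbitrary.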

HTH,
Kent

Ron Garret · Sep 12, 2006

Steve Holden said:
Let me get this right. You are aware that CGIHTTPServer module exists.
But you don't want to use that.

That's right. I don't want to run CGI scripts. I don't want to launch
a new process for every request. I want all requests handled in the
server process.

Instead you want to use your own code.

No, the whole reason I'm asking this question is because I *don't* want
to write my own code. It seems to me that the code to do what I want
ought to be out there (or in there) somewhere and I shouldn't have to
reinvent this wheel. But I can't find it.

So you have ended up duplicating some of the functionality of the cgi
library. And it feels like a hack.

Yep.

rg

Ron Garret · Sep 12, 2006

Kent Johnson said:
Hey, be nice. Wanting to write a request handler that actually handles a
POST request doesn't seem so unreasonable.

Except...when there are about a bazillion Python web frameworks to
choose from, why start from BaseHTTPServer? Why not use one of the
simpler frameworks like Karrigell or Snakelets or CherryPy?

It may come to that. I just thought that what I'm trying to do is so
basic that it ought to be part of the standard library. I mean, what do
people use BaseHTTPServer for if you can't parse form input?

Here's what I actually ended up doing:

def parse(r):
    ctype = r.headers.get('content-type')
    if not ctype: return None
    ctype, pdict = cgi.parse_header(ctype)
    if ctype == 'multipart/form-data':
        return cgi.parse_multipart(r.rfile, pdict)
    elif ctype == 'application/x-www-form-urlencoded':
        clength = int(r.headers.get('Content-length'))
        if maxlen and clength > maxlen:
            raise ValueError, 'Maximum content length exceeded'
        return cgi.parse_qs(r.rfile.read(clength), 1)
    else:
        return None

which is copied more or less directly from cgi.py. But it still seems
to me like this (or something like it) ought to be standardized in one
of the *HTTPServer.py modules.
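
For readers on Python 3, a rough port of the same idea might look like the sketch below. It is an adaptation under assumptions, not the code from the thread: it covers only the urlencoded branch (multipart is left out), assumes UTF-8 form data, takes maxlen as a parameter instead of a module global, and the name parse_form is invented here.

```python
import io
from email.message import Message
from urllib.parse import parse_qs


def parse_form(headers, rfile, maxlen=0):
    """Parse an urlencoded request body; return None for other types."""
    ctype = headers.get("Content-Type")
    if not ctype:
        return None
    # Drop any parameters such as "; charset=utf-8".
    ctype = ctype.split(";")[0].strip().lower()
    if ctype != "application/x-www-form-urlencoded":
        return None  # multipart/form-data is not handled in this sketch
    clength = int(headers.get("Content-Length", 0))
    if maxlen and clength > maxlen:
        raise ValueError("Maximum content length exceeded")
    body = rfile.read(clength).decode("utf-8")  # assumes UTF-8 form data
    return parse_qs(body, keep_blank_values=True)


# Stand-ins for a handler's self.headers and self.rfile.
body = b"text1=&text2=hello"
headers = Message()
headers["Content-Type"] = "application/x-www-form-urlencoded"
headers["Content-Length"] = str(len(body))

fields = parse_form(headers, io.BytesIO(body))
print(fields)  # {'text1': [''], 'text2': ['hello']}
```

Inside a request handler the call would be parse_form(self.headers, self.rfile).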

But what do I know?

rg

Ron Garret · Sep 12, 2006

Clearly. So what should I be doing?

Probably you'll need to read the source of cgi.parse_qs (like Steve did) and
see what it needs from os.environ and then provide that (either in
os.environ or in a custom environ dictionary).

I ended up just copying and hacking the code. It was only a dozen lines
or so. But it still feels wrong.

BUT why don't you use WSGI?

Because BaseHTTPServer does everything I need except for this one thing.
Why use a sledge hammer to squish a gnat?

rg

Steve Holden · Sep 12, 2006

Ron said:
which is copied more or less directly from cgi.py. But it still seems
to me like this (or something like it) ought to be standardized in one
of the *HTTPServer.py modules.

But what do I know?

I wouldn't necessarily say you are wrong here. It's just that the cgi
module has sort of "just growed", so it isn't conveniently factored for
reusability in other contexts. Several people (including me) have taken
a look at it with a view to possible re-engineering and backed away
because of the difficulty of maintaining compatibility. Python 3K will
be an ideal opportunity to replace it, but until then it's probably
going to stay in the same rather messy but working state.

regards
Steve

Ron Garret · Sep 12, 2006

Steve Holden said:
I wouldn't necessarily say you are wrong here. It's just that the cgi
module has sort of "just growed", so it isn't conveniently factored for
reusability in other contexts. Several people (including me) have taken
a look at it with a view to possible re-engineering and backed away
because of the difficulty of maintaining compatibility. Python 3K will
be an ideal opportunity to replace it, but until then it's probably
going to stay in the same rather messy but working state.

It's not necessary to re-engineer cgi, just cutting and pasting and
editing the code as I've done would seem to suffice.

But all I'm really looking for here at this point is confirmation that
I'm not in fact doing something stupid. In the past I've found that
nine times out of ten if I find myself wanting to rewrite or add
something to a Python module it's an indication that I'm doing something
wrong.

rg

Eddie Corns · Sep 12, 2006

Ron Garret said:
It's not necessary to re-engineer cgi, just cutting and pasting and
editing the code as I've done would seem to suffice.

But all I'm really looking for here at this point is confirmation that
I'm not in fact doing something stupid. In the past I've found that
nine times out of ten if I find myself wanting to rewrite or add
something to a Python module it's an indication that I'm doing something
wrong.

Well, if it's any consolation, that's exactly what I did: cut about 7 lines
from CGIHTTPServer into my do_POST method. Maybe we're both stoopid. This
was at least 3 years ago before I moved on to Quixote and then
CherryPy/TurboGears but I recall thinking at the time that it was probably
just one of those little cracks that show up from time to time in the library
(there aren't so very many of them).

Eddie
