cgi.FieldStorage() is slow

Nehal

i wanted to create a simple CGI script for uploading files to my
http server, and decided to use python for it. it seems to work
fine except for one issue: there is a lot of CPU overhead.

after profiling, it seems like:
self.read_lines_to_outerboundary()
which is called from
cgi.FieldStorage()
is the reason why it's so slow.
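For anyone who wants to reproduce this kind of measurement, here is a minimal sketch using the standard-library cProfile and pstats modules. `parse_upload` and `scan_lines` are hypothetical stand-ins for a real CGI handler (they are not part of the cgi module); in a real script the profiled call would be whatever constructs cgi.FieldStorage().

```python
import cProfile
import io
import pstats


def scan_lines(stream, boundary):
    """Hypothetical stand-in for a slow line-by-line boundary scan."""
    count = 0
    for line in stream:
        if line.startswith(boundary):
            break
        count += 1
    return count


def parse_upload(body, boundary):
    """Hypothetical handler; a real CGI script would parse the form here."""
    return scan_lines(io.BytesIO(body), boundary)


def profile_call(func, *args):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the expensive call chain is listed first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


if __name__ == "__main__":
    body = b"some data\n" * 50000 + b"--XYZ--\n"
    result, report = profile_call(parse_upload, body, b"--XYZ")
    print(report)
```

The report lists each function with its call count and cumulative time, which is the sort of output that would point at a single hot function.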

when uploading small files, you won't notice a difference, but if
you upload files larger than 2 megs, you can notice it. this
happens on both win2k and freebsd

is there some other way to process CGI, excluding doing it all
manually? will this code be improved in the future?
-- thx, Nehal
 
Steve Holden

Nehal said:
i wanted to create a simple CGI script for uploading files to my
http server, and decided to use python for it. it seems to work
fine except for one issue: there is a lot of CPU overhead.

after profiling, it seems like:
self.read_lines_to_outerboundary()
which is called from
cgi.FieldStorage()
is the reason why it's so slow.

when uploading small files, you won't notice a difference, but if
you upload files larger than 2 megs, you can notice it. this
happens on both win2k and freebsd

is there some other way to process CGI, excluding doing it all
manually? will this code be improved in the future?
-- thx, Nehal

The cgi module is a confusing mess of code munged, over the years, by
many hands. It would take a brave programmer to plunge in and do what's
necessary.

not-brave-enough-ly y'rs - steve
 
Nehal

Steve Holden said:
The cgi module is a confusing mess of code munged, over the
years, by many hands. It would take a brave programmer to plunge
in and do what's necessary.

not-brave-enough-ly y'rs - steve

if i were to do it, could i rewrite the functions in the cgi
module, or would i need to keep the same functions for backward
compatibility?
-- Nehal
 
Steve Holden

Nehal said:
if i were to do it, could i rewrite the functions in the cgi
module, or would i need to keep the same functions for backward
compatibility?
-- Nehal

Well, code breakage is regarded as pretty bad, so you would need to
retain the same interfaces that the module supports now.

Of course you could implement them completely differently, as long as
they worked the same, and you could add new interfaces as well.
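A minimal sketch of that idea, assuming a hypothetical new parser class `NewForm` and a hypothetical adapter `CompatFieldStorage` (neither name comes from the cgi module or from any existing library). The adapter keeps a few FieldStorage-style methods (`getvalue`, `getfirst`, `getlist`, `keys`) while delegating to a different implementation underneath. Only URL-encoded query strings are handled here, via urllib.parse.parse_qs; file uploads are left out.

```python
from urllib.parse import parse_qs


class NewForm:
    """Hypothetical new implementation with its own interface."""

    def __init__(self, query_string):
        # parse_qs maps each field name to a list of its values.
        self.fields = parse_qs(query_string, keep_blank_values=True)

    def values(self, name):
        return list(self.fields.get(name, []))


class CompatFieldStorage:
    """Hypothetical adapter that mimics a few FieldStorage-style methods."""

    def __init__(self, query_string):
        self._form = NewForm(query_string)

    def keys(self):
        return list(self._form.fields)

    def __contains__(self, name):
        return name in self._form.fields

    def getlist(self, name):
        return self._form.values(name)

    def getfirst(self, name, default=None):
        values = self._form.values(name)
        return values[0] if values else default

    def getvalue(self, name, default=None):
        # Like FieldStorage.getvalue: one string, or a list for repeated fields.
        values = self._form.values(name)
        if not values:
            return default
        return values[0] if len(values) == 1 else values
```

Old callers keep using `getvalue` and friends, while new code can talk to `NewForm` directly.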

regards
Steve
 
Andrew Clover

Nehal said:
when uploading small files, you won't notice a difference, but if
you upload files larger than 2 megs, you can notice it.

Yep. Large file upload in cgi.py is slow. I don't immediately see a
way to speed it up without re-architecting some of its internals.
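To illustrate the kind of change involved, here is a rough sketch (not cgi.py's actual code, and not form.py's either) contrasting a line-at-a-time scan for a multipart boundary with a chunked scan that uses bytes.find. The function names, the 64 KB default chunk size and the simplified boundary handling are all assumptions for illustration; real multipart parsing also has to deal with headers, nested parts and the CRLF that precedes the delimiter.

```python
import io


def copy_by_lines(src, dst, boundary):
    """Copy src to dst one line at a time until a line starts with boundary."""
    copied = 0
    while True:
        line = src.readline()
        if not line or line.startswith(boundary):
            break
        dst.write(line)
        copied += len(line)
    return copied


def copy_by_chunks(src, dst, boundary, chunk_size=65536):
    """Copy src to dst in large chunks, locating the boundary with bytes.find."""
    marker = b"\n" + boundary
    keep = len(marker) - 1   # tail held back in case the marker straddles two chunks
    buf = b"\n"              # virtual newline so a boundary on the first line matches
    skip = 1                 # the virtual newline must never be written out
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        buf += chunk
        pos = buf.find(marker)
        if pos != -1:
            out = buf[skip:pos + 1]   # keep the newline that ends the last data line
            dst.write(out)
            return copied + len(out)
        if not chunk:                 # end of input without a boundary
            out = buf[skip:]
            dst.write(out)
            return copied + len(out)
        cut = len(buf) - keep
        if cut > skip:                # flush everything except the held-back tail
            dst.write(buf[skip:cut])
            copied += cut - skip
            buf = buf[cut:]
            skip = 0
```

Both functions write the same bytes. The chunked version makes far fewer Python-level calls per megabyte, which tends to matter in CPython because the search itself runs in C.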

In any case I dislike(*) the cgi module's interface too, so I rewrote
the lot:

http://www.doxdesk.com/software/py/form.html

This isn't drop-in compatible, and is getting a bit crusty (I'm
expecting to rewrite most of it soon to be more objecty/threadable,
and support WSGI), but in my experience it's considerably faster than
cgi for very large files. (We were commonly using files in the 10-50MB
range.)

(* - more then than now; cgi's interface has got slightly better since
Python 1.5.2's time.)
 
Nehal

when uploading small files, you won't notice a difference, but
if you upload files larger than 2 megs, you can notice it.

Yep. Large file upload in cgi.py is slow. I don't immediately
see a way to speed it up without re-architecting some of its
internals.

In any case I dislike(*) the cgi module's interface too, so I
rewrote the lot:

http://www.doxdesk.com/software/py/form.html

This isn't drop-in compatible, and is getting a bit crusty (I'm
expecting to rewrite most of it soon to be more
objecty/threadable, and support WSGI), but in my experience it's
considerably faster than cgi for very large files. (We were
commonly using files in the 10-50MB range.)

(* - more then than now; cgi's interface has got slightly better
since Python 1.5.2's time.)

I have tested Andrew's 'form.py' module, and also upload CGI
scripts written in other languages. for the benchmark, i uploaded
a 6 meg file to localhost and wrote it to an output file
on apache 2.0.52 win32. here are the results (note: i checked the
error log to make sure all scripts were working and processing the
data as expected):

ruby: 2 sec
Andrew Clover's form.py: 2.5 sec
perl: 2.5 sec
tcl (3rd party module): 4.5 sec
python: 8 sec

of course in practice, most people won't be receiving data at 3
megs/sec, so you won't have to process data at such a speed.
nevertheless, the slower parsing will put a greater load on the CPU,
which may be an issue for many servers.
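For comparing handlers without a web server in the loop, a small timing harness along these lines may help. It is only a sketch: `time_handler`, `copy_whole` and `copy_lines` are hypothetical names, and it measures in-process CPU work on an in-memory payload, not the full HTTP round trip timed above.

```python
import io
import time


def time_handler(handler, payload, repeats=3):
    """Return the best wall-clock time in seconds over several runs."""
    best = None
    for _ in range(repeats):
        src = io.BytesIO(payload)
        dst = io.BytesIO()
        start = time.perf_counter()
        handler(src, dst)
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


def copy_whole(src, dst):
    """Hypothetical handler: copy the upload with a single read."""
    dst.write(src.read())


def copy_lines(src, dst):
    """Hypothetical handler: copy the upload one line at a time."""
    for line in src:
        dst.write(line)


if __name__ == "__main__":
    # Roughly 6 MB of 80-byte lines, similar in size to the test upload.
    payload = (b"x" * 79 + b"\n") * (6 * 1024 * 1024 // 80)
    for handler in (copy_whole, copy_lines):
        print(handler.__name__, time_handler(handler, payload))
```

Taking the best of several runs reduces noise from other processes; the absolute numbers will differ from machine to machine.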

it would not be a good idea to put Andrew's form.py in the
official python distribution and have 2 different modules for
processing CGI. either it would have to be merged somehow into the
cgi module while keeping backward compatibility, or the existing cgi
module must be optimized; maybe the above benchmark data will
motivate someone to do so ;). until then, i'll stick to form.py
-- thx, Nehal
 
Andrew Clover

Nehal Mistry said:
nevertheless, it will put a greater load on the CPU, which may be
an issue for many servers.

It certainly was for the rickety old SPARC box we were trying to
upload umpteen-MB files to! Probably not so critical for most people
these days though.

Nehal Mistry said:
it would not be a good idea to put Andrew's form.py in the
official python distribution and have 2 different modules for
processing CGI.

Fully agree. In any case the current interface is not general enough
for std lib use.

Nehal Mistry said:
either it would have to somehow merged into CGI
module, and keeping backward compatibility

If there is interest I could certainly look at providing a cgi-alike
interface to the new version. Personally I was not aiming at the
standard library.

cheers,
 
