urllib equivalent for HTTP requests

K · Oct 8, 2008

Hello everyone,

I understand that urllib and urllib2 serve as really simple page
request libraries. I was wondering if there is a library out there
that can get the HTTP requests for a given page.

Example:
URL: http://www.google.com/test.html

Something like: urllib.urlopen('http://www.google.com/test.html').files()

Lists HTTP Requests attached to that URL:
=> http://www.google.com/test.html
=> http://www.google.com/css/google.css
=> http://www.google.com/js/js.css

The other fun part is the inclusion of JS within <script> tags, e.g.
the new Google Analytics script
=> http://www.google-analytics.com/ga.js

or css, @imports
=> http://www.google.com/css/import.css

I would like to keep track of that, but I realize that Python does not
have a JS engine.

Does anyone have ideas on how to track these items, or am I
out of luck?

Thanks,
K

Diez B. Roggisch · Oct 8, 2008

K said:
Hello everyone,

I understand that urllib and urllib2 serve as really simple page
request libraries. I was wondering if there is a library out there
that can get the HTTP requests for a given page.

Example:
URL: http://www.google.com/test.html

Something like: urllib.urlopen('http://www.google.com/test.html').files()

Lists HTTP Requests attached to that URL:
=> http://www.google.com/test.html
=> http://www.google.com/css/google.css
=> http://www.google.com/js/js.css

There are no "Requests attached" to a URL. There is an HTML document
behind it that might contain further external references.

The other fun part is the inclusion of JS within <script> tags, e.g.
the new Google Analytics script
=> http://www.google-analytics.com/ga.js

or css, @imports
=> http://www.google.com/css/import.css

I would like to keep track of that, but I realize that Python does not
have a JS engine. Does anyone have ideas on how to track these items, or
am I out of luck?

You can use e.g. BeautifulSoup to extract all links from the site.

What you can't do, though, is get the requests that are issued by
JavaScript that is *running*.
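Diez's BeautifulSoup suggestion can be sketched as follows. This is a minimal sketch in modern Python 3 (where urllib2's functionality lives in `urllib.request`), and it swaps in the standard library's `html.parser` instead of BeautifulSoup so it runs without a third-party install. It collects URLs written literally in `<script src>`, `<link href>`, `<img src>` and `<iframe src>` attributes, plus `@import` rules inside inline `<style>` blocks, and resolves them against the page URL. The names `ResourceCollector`, `find_resources`, `IMPORT_RE` and `RESOURCE_ATTRS` are hypothetical, and the tag list is an assumption, not an exhaustive catalog. As Diez says, anything that running JavaScript builds at runtime is invisible to it.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Matches @import "x.css"; @import 'x.css'; and @import url(x.css);
IMPORT_RE = re.compile(r"""@import\s+(?:url\(\s*)?["']?([^"')\s;]+)""", re.IGNORECASE)

# (tag, attribute) pairs whose value names a resource the browser fetches
# automatically (an assumed, non-exhaustive list)
RESOURCE_ATTRS = {("script", "src"), ("link", "href"), ("img", "src"), ("iframe", "src")}


class ResourceCollector(HTMLParser):
    """Collect statically referenced resource URLs from an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []
        self._in_style = False

    def _add(self, url):
        absolute = urljoin(self.base_url, url)
        if absolute not in self.urls:  # keep first-seen order, drop duplicates
            self.urls.append(absolute)

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._in_style = True
        for name, value in attrs:
            if value and (tag, name) in RESOURCE_ATTRS:
                self._add(value)

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        # Inline <style> blocks may pull in more stylesheets via @import
        if self._in_style:
            for url in IMPORT_RE.findall(data):
                self._add(url)


def find_resources(html, base_url):
    """Return the page URL followed by every resource it references statically."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    collector.close()
    return [base_url] + [u for u in collector.urls if u != base_url]


if __name__ == "__main__":
    page = """<html><head>
<link rel="stylesheet" href="/css/google.css">
<script src="/js/js.js"></script>
<script src="http://www.google-analytics.com/ga.js"></script>
<style>@import url("/css/import.css");</style>
</head><body></body></html>"""
    for url in find_resources(page, "http://www.google.com/test.html"):
        print(url)
```

In practice `html` would come from something like `urllib.request.urlopen(url).read().decode("utf-8")`. Stylesheets found this way can contain further `@import` rules of their own; following those would mean fetching each one and running `IMPORT_RE` over its text, which this sketch does not do.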

Diez

david.lyon · Oct 8, 2008

Hi All,

I have chosen to use a Django app for a customer site and wish to put
it up on the net.

Before I waste all day trying it myself (and probably getting it
wrong) I thought I would ask the experts here.

My questions are:

- can most everyday vanilla Linux web hosts run a Django site?

- can most everyday vanilla Linux web hosts run Python web scripts?

Thanks

David

Tim Chase · Oct 8, 2008

My questions are:

- can most everyday vanilla Linux web hosts run a Django site?

- can most everyday vanilla Linux web hosts run Python web scripts?

Depends on your definition of "most everyday vanilla linux web
hosts".

The bottom-of-the-barrel hosts will often (but not always) offer
Python CGI. Django "can" run in a CGI (google for "django
cgi"[0]), but it's an unpleasant experience because the entire
Django framework gets reloaded for *every* request.
Doable/tolerable for a private development/family page, but it
will likely flounder under the slightest load.

This is like strapping a jet engine (Django) on a bicycle (CGI).
[1] Doable, but more for the macho-factor of "I got it
working" rather than the practical aspects.

Your lowest-end hosting services won't offer mod_python or WSGI
(either Apache with mod_wsgi, or others like lighttpd with a WSGI
interface), though WSGI is becoming more popular. There are still
some shared-hosting solutions that facilitate using Django[2]
pretty well. They're not super-cheap, but they're affordable.
The canonical catalog of Django-friendly & Django-capable hosting
services can be found at [3]. If you're just starting out with
Django, it might help to pay a bit more for one of the click-n-go
hosts; with others you'll have to do some of the heavy lifting
(installing Django, as well as possibly other components,
assembling your WSGI startup script, etc.) yourself.
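To make the CGI-versus-WSGI cost concrete, here is a minimal sketch using only the standard library's `wsgiref`. The same WSGI callable can be served by `wsgiref.handlers.CGIHandler`, where the web server starts a fresh Python process (re-importing everything) for every request, or by a long-running server such as `wsgiref.simple_server.make_server`, where module-level setup runs once. The names `application`, `STARTED_AT`, `REQUESTS_SERVED` and the `--serve` flag are hypothetical; `STARTED_AT` merely stands in for expensive framework start-up. A real Django deployment would instead point mod_wsgi at the WSGI application object Django provides (in current Django, `django.core.wsgi.get_wsgi_application()`).

```python
import os
import sys
import time
from wsgiref.handlers import CGIHandler
from wsgiref.simple_server import make_server

# Module-level code runs once per *process*. Under CGI the web server starts
# a new process for every request, so this (and any framework imported here)
# is reloaded each time; under a persistent WSGI server it runs once.
# STARTED_AT is a hypothetical stand-in for expensive start-up work.
STARTED_AT = time.time()
REQUESTS_SERVED = 0


def application(environ, start_response):
    """A minimal WSGI application that reports how long its process has lived."""
    global REQUESTS_SERVED
    REQUESTS_SERVED += 1
    body = (
        f"pid={os.getpid()} "
        f"process_age={time.time() - STARTED_AT:.1f}s "
        f"requests_served={REQUESTS_SERVED}\n"
    ).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    if "GATEWAY_INTERFACE" in os.environ:
        # Launched by the web server as a CGI script: serve one request, exit.
        CGIHandler().run(application)
    elif "--serve" in sys.argv:
        # Long-running server: the same process handles every request.
        with make_server("127.0.0.1", 8000, application) as httpd:
            httpd.serve_forever()
```

Run as CGI, every response reports `requests_served=1` because each request gets a new process; run with `python app.py --serve` (the file name is hypothetical) and the counter keeps climbing in one process, which is the difference between the "jet engine on a bicycle" setup and a proper WSGI deployment.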

Hope this helps,

-tkc

[0]
http://www.google.com/search?q=django cgi

[1]

[2]
http://groups.google.com/group/django-users/browse_thread/thread/cfaf8c04f3dfb56e/

[3]
http://code.djangoproject.com/wiki/DjangoFriendlyWebHosts

Steve Holden · Oct 9, 2008

Tim Chase wrote:
[In response to David Lyon]

My questions are:

- can most everyday vanilla Linux web hosts run a Django site?

- can most everyday vanilla Linux web hosts run Python web scripts?

Depends on your definition of "most everyday vanilla linux web hosts".

The bottom-of-the-barrel hosts will often (but not always) offer Python
CGI. Django "can" run in a CGI (google for "django cgi"[0]), but it's
an unpleasant experience because the entire Django framework gets
reloaded for *every* request. Doable/tolerable for a private
development/family page, but it will likely flounder under the slightest
load.

This is like strapping a jet engine (Django) on a bicycle (CGI). [1]
Doable, but more for the macho-factor of "I got it working" rather than
the practical aspects.

Your lowest-end hosting services won't offer mod_python or WSGI (either
Apache with mod_wsgi, or others like lighttpd with a WSGI interface),
though WSGI is becoming more popular. There are still some
shared-hosting solutions that facilitate using Django[2] pretty well.
They're not super-cheap, but they're affordable. The canonical catalog
of Django-friendly & Django-capable hosting services can be found at
[3]. If you're just starting out with Django, it might help to pay a
bit more for one of the click-n-go hosts; with others you'll have to do
some of the heavy lifting (installing Django, as well as possibly other
components, assembling your WSGI startup script, etc.) yourself.

There's recently been a discussion about hosting on the django-users
list, which I recommend you think about joining. Both WebFaction and
SliceHost got high marks from many users. I personally use OpenHosting,
who are very Python-friendly and mostly just let you get on with what you
want to do, which is great if you are comfortable managing your own
email and web services.

regards
Steve

lkcl · Oct 13, 2008

yep.

What you can't do though is to get the requests that are issued by Javascript that is *running*.

yes you can. see recent post i made just a few minutes ago giving
some insights into each of the available options.

look up, to name but a few:
- pyv8
- pykhtml
- spidermonkey
- webkit with the python bindings to its glib bindings (pywebkitgtk):
  use my patched version, and see http://pyjd.sf.net to get
  precompiled versions
- pyxpcomext and pydom on developer.mozilla.org
- webkit's DumpRenderTree with the --html option

there are _tons_ of options. they're just an absolute bastard to
track down: javascript is such a popular search keyword that the
results are almost useless.

l.

Similar threads:
- urllib and parsing · 0 · Oct 4, 2011
- How to keep cookies when making http requests (Python 2.7) · 8 · Aug 20, 2013
- charset problems with urllib/urllib2 · 0 · Feb 23, 2009
- Making HTTP requests using Twisted · 4 · Jul 11, 2006
- Count of http requests · 0 · Oct 3, 2007
- help on HTTP 400 Bad Request syntax error on urllib2.urlopen · 0 · Jan 10, 2012
- Using a python web client behind a proxy (urllib and twisted.web) · 1 · Apr 13, 2005
- HTTP GET requests in a Ruby CGI Script · 5 · Mar 29, 2008
