urllib equivalent for HTTP requests

K

Hello everyone,

I understand that urllib and urllib2 serve as really simple page
request libraries. I was wondering if there is a library out there
that can get the HTTP requests for a given page.

Example:
URL: http://www.google.com/test.html

Something like: urllib.urlopen('http://www.google.com/test.html').files()

Lists HTTP Requests attached to that URL:
=> http://www.google.com/test.html
=> http://www.google.com/css/google.css
=> http://www.google.com/js/js.css

The other fun part is the inclusion of JS within <script> tags, i.e.
the new Google Analytics script
=> http://www.google-analytics.com/ga.js

or css, @imports
=> http://www.google.com/css/import.css

I would like to keep track of that but I realize that py does not have
a JS engine. :( Anyone with ideas on how to track these items or am I
out of luck.

Thanks,
K
 
Diez B. Roggisch

K said:
Hello everyone,

I understand that urllib and urllib2 serve as really simple page
request libraries. I was wondering if there is a library out there
that can get the HTTP requests for a given page.

Example:
URL: http://www.google.com/test.html

Something like: urllib.urlopen('http://www.google.com/test.html').files()

Lists HTTP Requests attached to that URL:
=> http://www.google.com/test.html
=> http://www.google.com/css/google.css
=> http://www.google.com/js/js.css


There are no "requests attached" to a URL. There is an HTML document
behind it, which might contain further external references.

The other fun part is the inclusion of JS within <script> tags, i.e.
the new Google Analytics script
=> http://www.google-analytics.com/ga.js

or css, @imports
=> http://www.google.com/css/import.css

I would like to keep track of that but I realize that py does not have
a JS engine. :( Anyone with ideas on how to track these items or am I
out of luck.

You can use e.g. BeautifulSoup to extract all links from the site.

What you can't do though is to get the requests that are issued by
Javascript that is *running*.
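
As a rough illustration of the static-extraction idea, here is a minimal sketch. It swaps BeautifulSoup for the standard library's html.parser so that it runs without extra installs; the names ResourceCollector, list_static_resources and SAMPLE_HTML are made up for this example, and the sample page is invented. It only sees references written in the markup (script/img src, link href, @import inside <style>), not anything requested by running Javascript.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Matches @import "x.css"; and @import url("x.css"); inside <style> blocks.
IMPORT_RE = re.compile(r"""@import\s+(?:url\()?\s*["']?([^"')\s;]+)""")


class ResourceCollector(HTMLParser):
    """Collect URLs of resources referenced statically in the markup."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []
        self._in_style = False

    def _add(self, url):
        # Resolve relative references against the page URL, skip repeats.
        absolute = urljoin(self.base_url, url)
        if absolute not in self.resources:
            self.resources.append(absolute)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "img") and attrs.get("src"):
            self._add(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            self._add(attrs["href"])
        elif tag == "style":
            self._in_style = True

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        # Inline stylesheets may pull in further files via @import.
        if self._in_style:
            for url in IMPORT_RE.findall(data):
                self._add(url)


def list_static_resources(html, base_url):
    """Return the page URL followed by the resources its markup references."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    collector.close()
    return [base_url] + collector.resources


# Invented sample page, loosely modelled on the URLs in the question.
SAMPLE_HTML = """
<html><head>
<link rel="stylesheet" href="/css/google.css">
<style>@import url("/css/import.css");</style>
<script src="http://www.google-analytics.com/ga.js"></script>
</head><body><img src="logo.png"></body></html>
"""

if __name__ == "__main__":
    for url in list_static_resources(SAMPLE_HTML, "http://www.google.com/test.html"):
        print("=>", url)
```

To use it on a live page, the HTML would first have to be fetched (for instance with urllib) and decoded to text. @import rules inside external stylesheets are not followed here; that would need a second fetch per stylesheet.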

Diez
 
david.lyon

Hi All,

I have chosen to use a Django app for a customer site and wish to put
it up on the net.

Before I waste all day trying it myself (and probably getting it
wrong) I thought I would ask the experts here.

My questions are:

- can most everyday vanilla linux web hosts run a django site ?

- can most everyday vanilla linux web hosts run python web scripts?

Thanks

David
 
Tim Chase

My questions are:
- can most everyday vanilla linux web hosts run a django site ?

- can most everyday vanilla linux web hosts run python web scripts?

Depends on your definition of "most everyday vanilla linux web
hosts". :)

The bottom-of-the-barrel hosts will often (but not always) offer
Python CGI. Django "can" run in a CGI (google for "django
cgi"[0]), but it's an unpleasant experience because the entire
Django framework gets reloaded for *every* request.
Doable/tolerable for a private development/family page, but it
will likely flounder under the slightest load.

This is like strapping a jet engine (Django) on a bicycle (CGI).
[1] Doable, but more for the macho-factor of "I got it
working" rather than the practical aspects.

Your lowest-end hosting services won't offer mod_python or WSGI
(either Apache with mod_wsgi, or others like lighttpd with a wsgi
interface) though WSGI is becoming more popular. There are still
some shared-hosting solutions that facilitate using Django[2]
pretty well. They're not super-cheap, but they're affordable.
The canonical catalog of Django-friendly & Django-capable hosting
services can be found at [3]. If you're just starting out with
Django, it might help to pay a bit more for one of the click-n-go
hosts, while others you'll have to do some of the heavy lifting
(installing Django, as well as possibly other components,
assembling your wsgi startup script, etc) yourself.
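
To make the CGI-versus-WSGI distinction a bit more concrete, here is a small sketch using only the standard library's wsgiref. The names application and run_as_cgi are hypothetical, and the app is a stand-in: a real Django project would expose Django's own WSGI handler instead, which is not shown here. The point is only that the same WSGI callable can be served by a long-running WSGI server or, far less efficiently, wrapped for CGI, where the whole process (and everything it imports) starts afresh for each request.

```python
from wsgiref.handlers import CGIHandler


def application(environ, start_response):
    """Minimal WSGI callable; a stand-in for a framework's WSGI handler."""
    body = ("Hello from %s\n" % environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def run_as_cgi():
    """Serve exactly one request via CGI; the web server launches a new
    process for every request, so all imports are repeated each time."""
    CGIHandler().run(application)
```

Under a WSGI-capable host (mod_wsgi and similar), the server imports the module once and calls application for each request, which is why that setup copes with load so much better than the CGI route.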

Hope this helps,

-tkc


[0]
http://www.google.com/search?q=django+cgi

[1]

[2]
http://groups.google.com/group/django-users/browse_thread/thread/cfaf8c04f3dfb56e/

[3]
http://code.djangoproject.com/wiki/DjangoFriendlyWebHosts
 
Steve Holden

Tim Chase wrote:
[In response to David Lyon]
My questions are:

- can most everyday vanilla linux web hosts run a django site ?

- can most everyday vanilla linux web hosts run python web scripts?

Depends on your definition of "most everyday vanilla linux web hosts". :)

The bottom-of-the-barrel hosts will often (but not always) offer Python
CGI. Django "can" run in a CGI (google for "django cgi"[0]), but it's
an unpleasant experience because the entire Django framework gets
reloaded for *every* request. Doable/tolerable for a private
development/family page, but it will likely flounder under the slightest
load.

This is like strapping a jet engine (Django) on a bicycle (CGI). [1]
Doable, but more for the macho-factor of "I got it working" rather than
the practical aspects.

Your lowest-end hosting services won't offer mod_python or WSGI (either
Apache with mod_wsgi, or others like lighttpd with a wsgi interface)
though WSGI is becoming more popular. There are still some
shared-hosting solutions that facilitate using Django[2] pretty well.
They're not super-cheap, but they're affordable. The canonical catalog
of Django-friendly & Django-capable hosting services can be found at
[3]. If you're just starting out with Django, it might help to pay a
bit more for one of the click-n-go hosts, while others you'll have to do
some of the heavy lifting (installing Django, as well as possibly other
components, assembling your wsgi startup script, etc) yourself.

There's recently been a discussion about hosting on the django-users
list, which I recommend you think about joining. Both WebFaction and
SliceHost got high marks from many users. I personally use OpenHosting,
who are very Python-friendly and mostly just let you get on with what you
want to do, which is great if you are comfortable managing your own
email and web services.

regards
Steve
 
lkcl

yep.
What you can't do though is to get the requests that are issued by Javascript that is *running*.

yes you can. see recent post i made just a few minutes ago giving
some insights into each of the available options.

look up pyv8; pykhtml; spidermonkey; webkit with the python bindings
to its glib bindings - pywebkitgtk - use my patched version and see
http://pyjd.sf.net to get precompiled versions; pyxpcomext and pydom
on developer.mozilla.org; webkit's DumpRenderTree with the --html
option, to name but a few.

there are _tons_ of options. they're just an absolute bastard to
track down, because javascript is such a popular keyword to search
for, the results are almost useless.

l.
 
