Problem with urllib.py

Volker M.

Hey,

I want to open a list of URLs automatically with Python's urllib and the
function open(URL). It is important that the program opens ONLY normal
http sites and not https sites with a user/password request. Is there a
way to cancel all site requests that come with user/password dialogues?

Thx
 
Thomas Guettler

On Thu, 22 Jul 2004 10:43:38 +0200, Volker M. wrote:
Hey,

I want to open a list of URLs automatically with Python's urllib and the
function open(URL). It is important that the program opens ONLY normal
http sites and not https sites with a user/password request. Is there a
way to cancel all site requests that come with user/password dialogues?

Hi,

urllib is not interactive. If you don't send a login and password,
you get a "not authorized" response with the corresponding HTTP
error code. You can check this return code in your script.
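
A minimal sketch of such a check (a sketch only, for the Python 2 of this
thread; urllib2 is used here because it surfaces the status code as an
exception, and the URL is just a placeholder):

import urllib2

url = "http://www.example.com/protected/"  # placeholder URL

try:
    response = urllib2.urlopen(url)
except urllib2.HTTPError, e:
    # e.code holds the HTTP status code, e.g. 401 for "not authorized"
    print "got status", e.code, "for", url
else:
    data = response.read()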

By the way, the user/password request (the pop-up in the browser) is
HTTP Basic Authentication; it can be used with http or https.

HTH,
Thomas
 
John J. Lee

Volker M. said:
I want to open a list of URLs automatically with Python's urllib and the
function open(URL). It is important that the program opens ONLY normal
http sites and not https sites with a user/password request. Is there a
way to cancel all site requests that come with user/password dialogues?

Assuming you mean you don't want to handle Basic HTTP Authentication
(and you don't care whether it's http or https), you can use
urllib2.urlopen() instead of urllib.urlopen(). You will then get a
urllib2.HTTPError with a .code of 401 when a site wants Basic
Authentication.
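
For the original use case of walking a list of URLs, that might look
roughly like this (only a sketch; the list of URLs here is made up):

import urllib2

urls = ["http://www.example.com/", "http://www.example.org/private/"]  # made-up list

for url in urls:
    try:
        response = urllib2.urlopen(url)
    except urllib2.HTTPError, e:
        if e.code == 401:
            # this site wants Basic Authentication -- skip it
            print "skipping", url
            continue
        raise
    # process the page here
    data = response.read()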

If you do mean https, though, again with urllib2:

import urllib2

class NullHTTPSHandler(urllib2.HTTPSHandler):
    def https_open(self, request):
        # decline to handle https requests
        return None

o = urllib2.build_opener(NullHTTPSHandler())

response = o.open(url)   # url is whatever URL you want to fetch
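
Continuing from that snippet, my understanding (not tested here) is that,
with no handler willing to take https requests, opening an https URL
should fail with a urllib2.URLError, which your script can catch and
treat as "skip this one" (the URL below is a placeholder):

try:
    response = o.open("https://secure.example.com/")  # placeholder https URL
except urllib2.URLError, e:
    # no handler accepted the https request, so urllib2 gives up
    print "skipped https URL:", e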


In general, urllib2 splits up the job of opening URLs into handlers,
so it's more 'turn-off-and-on-able' than urllib.

Since you're writing a robot, one other thing: the alpha version of my
ClientCookie package (urllib2-replacement with addons) contains code
for obeying robots.txt files (albeit not yet well tested, IIRC):

import ClientCookie
o = ClientCookie.build_opener(ClientCookie.HTTPRobotRulesProcessor())

response = o.open(url)


Some time soon I'll have to make a distribution of this stuff that
works properly with Python 2.4 (which includes changes to urllib2 taken
from ClientCookie)...


John
 
