Python Scripts to logon to websites

BartlebyScrivener

New to Python and Programming. Trying to make scripts that will open
sites and automatically log me on.

The following example is from the urllib2 module.

What are "realm" and "host" in this example?

import urllib2
# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib2.HTTPBasicAuthHandler()
auth_handler.add_password('realm', 'host', 'username', 'password')
opener = urllib2.build_opener(auth_handler)
# ...and install it globally so it can be used with urlopen.
urllib2.install_opener(opener)
urllib2.urlopen('http://www.example.com/login.html')

Does anyone have a simple example of a script that opens, say, gmail or
some other commonly accessed site that requires a username and password
so that I can see how one is made?

Thanks very much for any help.

rpd
 
Peter Hansen

BartlebyScrivener said:
New to Python and Programming. Trying to make scripts that will open
sites and automatically log me on.

The following example is from the urllib2 module.

What are "realm" and "host" in this example?

http://www.ietf.org/rfc/rfc2617.txt probably provides more background
than you want on that topic, but googling for "basic authentication" and
maybe "realm" and/or "host" will find you other sites with less
technically detailed material. The first hit has a little summary
amidst some Apache-specific detail.
Does anyone have a simple example of a script that opens, say, gmail or
some other commonly accessed site that requires a username and password
so that I can see how one is made?

"realm" and "host" are associated with "basic authentication" and not
all sites use that. If the browser pops up a little dialog box of its
own (i.e. not some Javascript-triggered thing) and you have to enter your
username and password there, that's probably a "basic auth" (or "digest
auth") site. If you fill that info into a form (as on gmail.com) you
don't want any of that "realm/host" stuff.

I'll leave it to others more expert in this to provide a more directly
useful answer.

-Peter
 
Mike Meyer

BartlebyScrivener said:
New to Python and Programming. Trying to make scripts that will open
sites and automatically log me on.

A common enough thing to want to do.
The following example is from the urllib2 module.

What are "realm" and "host" in this example?

Host is a domain name that can be mapped to an IP address. Realm is
from HTTP authentication schemes. When the server asks for
authentication, it gives out a "realm" name as well, so that different
parts of the host can use different authentication systems.
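
To make the two arguments concrete, here is a minimal sketch in Python 3, where the urllib2 classes live in urllib.request. The URL and credentials are placeholders invented for the example. It assumes you do not know the realm name in advance, so it uses HTTPPasswordMgrWithDefaultRealm and passes None for the realm; the URL then plays the part of "host" and covers every page below it.

```python
import urllib.request


def make_basic_auth_opener(top_level_url, username, password):
    """Return (opener, password_manager) set up for HTTP basic auth.

    All three arguments are placeholders for this sketch, not values
    taken from any real site.
    """
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None means "use these credentials whatever realm name the
    # server announces"; top_level_url limits them to that host/path.
    password_mgr.add_password(None, top_level_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler), password_mgr


opener, password_mgr = make_basic_auth_opener(
    "http://www.example.com/", "username", "password")
# opener.open("http://www.example.com/private/") would now answer a
# 401 "WWW-Authenticate: Basic realm=..." challenge automatically.
```

If you do know the realm string the server sends, you can pass it instead of None, which is what the 'realm' argument in the original example stands for.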
Does anyone have a simple example of a script that opens, say, gmail or
some other commonly accessed site that requires a username and password
so that I can see how one is made?

Yes, but it's not clear how much good it'll do you. As Peter indicated,
not everyone uses HTTP based authentication. In fact, pretty much
anyone who wants to control how the authentication boxes look (which
seems to be 99% of the people writing web apps, never mind that they
can't really do that) use something other than HTTP-based
authentication. How you go about dealing with such sites depends on
where they put the user name/login information, and how they encode the
fact that you've authenticated as user "xxxx".
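
As a rough illustration of the form-based case, the sketch below (Python 3) posts a login form and keeps whatever cookies come back, since a session cookie is the most common way a site records that you have authenticated. The login URL and the field names "username" and "password" are assumptions made up for the example; a real site's form will use its own names and may need extra hidden fields.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session_opener():
    """Build an opener that stores and resends cookies between requests."""
    cookie_jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar))
    return opener, cookie_jar


def build_login_request(login_url, form_fields):
    """Encode the form fields as a POST body, as a browser would."""
    body = urllib.parse.urlencode(form_fields).encode("ascii")
    # Supplying data makes urllib send a POST instead of a GET.
    return urllib.request.Request(login_url, data=body)


opener, cookie_jar = make_session_opener()
request = build_login_request(
    "https://www.example.com/login",            # hypothetical login URL
    {"username": "alice", "password": "secret"})  # hypothetical field names
# opener.open(request) would submit the form; later opener.open(...) calls
# would send back any session cookie the site set in cookie_jar.
```

To adapt it, view the login page's HTML source and copy the form's action URL and input names.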

So I could show you my script for accessing yahoo. However, it
probably won't work on another site without changes to accommodate the
other site.

<mike
 
Tomi Kyöstilä

BartlebyScrivener said:
New to Python and Programming. Trying to make scripts that will open
sites and automatically log me on. [snip]
Does anyone have a simple example of a script that opens, say, gmail or
some other commonly accessed site that requires a username and password
so that I can see how one is made?

I see your example uses HTTP authentication, but I still recommend
checking out mechanoid [1] if you want to access a site with a
form-based login system. The source contains an example that retrieves
and sends email through Yahoo.

[1] http://cheeseshop.python.org/pypi/mechanoid/
 
Peter Hansen

BartlebyScrivener said:
This looks promising, but it'll take me a week to understand it :)

http://www.voidspace.org.uk/python/articles/authentication.shtm

(Minor typo... needs an extra "l" on the end:

http://www.voidspace.org.uk/python/articles/authentication.shtml
)

By the way, note that neither basic auth nor digest auth provide any
real security, and in fact with basic auth the userid and password are
sent *in cleartext*. For any serious production site these techniques
should probably not be used without additional security measures in
place, such as HTTPS encryption.

-Peter
 
BartlebyScrivener

Thanks, Peter.

Peter said:
(Minor typo... needs an extra "l" on the end:

http://www.voidspace.org.uk/python/articles/authentication.shtml
)

By the way, note that neither basic auth nor digest auth provide any
real security, and in fact with basic auth the userid and password are
sent *in cleartext*. For any serious production site these techniques
should probably not be used without additional security measures in
place, such as HTTPS encryption.

-Peter
 
Mike Meyer

Peter Hansen said:
By the way, note that neither basic auth nor digest auth provide any
real security, and in fact with basic auth the userid and password are
sent *in cleartext*. For any serious production site these techniques
should probably not be used without additional security measures in
place, such as HTTPS encryption.

To be clear, the HTTP authentication schemes don't provide any
security for the *content* that gets passed back and forth, and they
don't claim to. If someone can intercept that content, they can read
it. For some applications, this is really important. For others, it
doesn't matter at all.

Basic auth doesn't (quite) pass the user name and password in
cleartext. It uses rot-13. For all the protection it provides, it
might as well be cleartext.

Digest passes around md5 sums of various bits and pieces. While md5 has
been compromised, I don't believe that's happened in a way that
compromises the security of digest auth. The password and username
that pass over the wire are about as secure as they're going to get
without noticeably heavier mechanisms than digest auth requires. On the
downside, the server has to have the clear text password available.

<mike
 
Peter Hansen

Mike said:
To be clear, the HTTP authentication schemes don't provide any
security for the *content* that gets passed back and forth, and they
don't claim to. If someone can intercept that content, they can read
it. For some applications, this is really important. For others, it
doesn't matter at all.

If someone can see the content, they can also see the userid and
password. If they can see the password, they will (with how most people
operate) now have a userid and password that will work on many other
sites, including possibly someone's banking site, no matter how secure
even the content might be for that site.

Most people on the web are simply too ignorant of security issues for
those of us building systems that require passwords to ignore this
issue. To do so is to endanger the security and privacy of the very
people you are hoping to have as users and customers, which is lazy and
careless (and perhaps in some countries even criminal these days).
Basic auth doesn't (quite) pass the user name and password in
cleartext. It uses rot-13. For all the protection it provides, it
might as well be cleartext.

It's actually base64 encoding, but it amounts to the same thing, as you
say, as cleartext, since it's trivially reversible. The protection is
useless against all but honest people who might otherwise accidentally
see it while looking at packet monitoring dumps or such.
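
To show how little the encoding hides, here is a short Python 3 sketch that builds a basic auth Authorization header value and then reverses it. The function names are made up for the example; the credentials are the sample pair used in RFC 2617.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an 'Authorization' header for basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic_auth_header(header_value):
    """Recover (username, password) from a sniffed header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    # The userid cannot contain ':', so split on the first one only.
    username, _, password = decoded.partition(":")
    return username, password


header = basic_auth_header("Aladdin", "open sesame")
recovered = decode_basic_auth_header(header)
```

No key or secret is involved at any point, so anyone who can read the header can read the password.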
Digest passes around md5 sums of various bits and pieces. While md5 has
been compromised, I don't believe that's happened in a way that
compromises the security of digest auth. The password and username
that pass over the wire are about as secure as they're going to get
without noticeably heavier mechanisms than digest auth requires. On the
downside, the server has to have the clear text password available.

My information about digest was either obsolete or simply wrong, as I
didn't realize it had all the nonce and anti-replay support it appears
to have. (I may have been remembering articles about how much of that
wasn't supported widely at some time in the past, meaning replays were
still quite possible in most cases. No longer sure.) Thanks for the
correction.

In my own opinion, however, requiring that passwords be stored in clear
text on the server is still quite a bad thing to do. I don't think even
system administrators should ever have access to user passwords. But
many people don't seem to agree (or at least, are more than happy to be
lazy rather than diligent in protecting their users' privacy).

-Peter
 
Paul Rubin

Peter Hansen said:
My information about digest was either obsolete or simply wrong, as I
didn't realize it had all the nonce and anti-replay support it appears
to have. (I may have been remembering articles about how much of that
wasn't supported widely at some time in the past, meaning replays were
still quite possible in most cases. No longer sure.) Thanks for the
correction.

Digest is actually rarely used, since sites with enough security
requirements to make it worthwhile generally use SSL/TLS with either
basic auth, or with some login mechanism implemented by the
application. Actually, HTTP authentication (basic or digest) is not
used all that much in general these days, since nontrivial web apps
generally prefer to do their own authentication. It was more common
in the early days of the web when most pages were static.
In my own opinion, however, requiring that passwords be stored in
clear text on the server is still quite a bad thing to do.

Digest auth, like basic auth, doesn't require storing the cleartext
password; only a hash of the password needs to be stored. See RFC
2617 for details.
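
A small Python 3 sketch of why that works, following the RFC 2617 formulas for qop="auth": the server keeps only MD5(username:realm:password), usually called HA1, and can still check the client's response. The helper names are invented for the example, and the sample values are borrowed from the RFC's own worked example.

```python
import hashlib


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def compute_ha1(username, realm, password):
    """The only password-derived value the server needs to store."""
    return md5_hex(f"{username}:{realm}:{password}")


def digest_response(ha1, nonce, nc, cnonce, qop, method, uri):
    """Response value for qop='auth', built from HA1 and the challenge."""
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


# Server side, done once when the account is created.
stored_ha1 = compute_ha1("Mufasa", "testrealm@host.com", "Circle Of Life")

challenge = dict(nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093", nc="00000001",
                 cnonce="0a4f113b", qop="auth", method="GET",
                 uri="/dir/index.html")

# Client side: starts from the password the user typed.
client_response = digest_response(
    compute_ha1("Mufasa", "testrealm@host.com", "Circle Of Life"), **challenge)
# Server side: starts from the stored hash, never the cleartext.
server_response = digest_response(stored_ha1, **challenge)
```

One caveat: the stored HA1 is itself enough to log in with digest auth for that realm, so it still needs protecting like a password.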
 
Mike Meyer

Peter Hansen said:
If someone can see the content, they can also see the userid and
password.

Only if the userid and password are part of the content. If you're
doing the usual form-based authentication, then they are. If you're
doing an HTTP-based authentication, then they aren't - the
authentication information is in the headers, and can be protected
however the protocol designers want it to be.
Most people on the web are simply too ignorant of security issues for
those of us building systems that require passwords to ignore this
issue. To do so is to endanger the security and privacy of the very
people you are hoping to have as users and customers, which is lazy
and careless (and perhaps in some countries even criminal these days).

Most of the people building systems that require passwords on the web
are too ignorant of security issues for me to trust anything crucial
to them. I don't bank online, because the banking systems I've looked
at don't meet *my* minimal requirements for security.
My information about digest was either obsolete or simply wrong, as I
didn't realize it had all the nonce and anti-replay support it appears
to have. (I may have been remembering articles about how much of that
wasn't supported widely at some time in the past, meaning replays were
still quite possible in most cases. No longer sure.) Thanks for the
correction.

Back when I was dealing with this on a regular basis, the major
browser and server vendors were all pushing encrypted session
mechanisms of various kinds. Given that, a secure authentication
mechanism is a waste of time, and would provide competition for their
product in some application domains. So those vendors typically didn't
implement digest authentication. This sucked if you were exchanging
content that didn't need security, but wanted to authenticate
identity.
In my own opinion, however, requiring that passwords be stored in
clear text on the server is still quite a bad thing to do. I don't
think even system administrators should ever have access to user
passwords. But many people don't seem to agree (or at least, are more
than happy to be lazy rather than diligent in protecting their users'
privacy).

Paul Rubin indicates that this isn't required - so my information is
out of date as well.

<mike
 
Paul Rubin

Mike Meyer said:
Only if the userid and password are part of the content. If you're
doing the usual form-based authentication, then they are. If you're
doing an HTTP-based authentication, then they aren't - the
authentication information is in the headers, and can be protected
however the protocol designers want it to be.

Well, HTTP Basic and HTTP Digest authentication both send the userid
in the clear. Basic also sends the password in the clear, while
Digest sends a hash of the (salted) password in the clear. Digest is
better than Basic, but since the attacker can see both the salt and
the password hash, he can still run a dictionary attack. Therefore,
using form-based authentication over SSL is more secure than using
HTTP Digest without SSL. (Special tip from Paranoid Pete: have the
downloaded page include some javascript that inserts some padding
chars into a hidden form field, making the form post have constant
length and thereby prevent leaking the password length).
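
The dictionary attack mentioned above can be sketched in a few lines of Python 3. This assumes the simpler RFC 2617 response form used when the server sends no qop directive, and every value in it (user, realm, nonce, word list) is made up for the demonstration.

```python
import hashlib


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, nonce, method, uri):
    """Digest response without qop: MD5(HA1:nonce:HA2)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def dictionary_attack(sniffed, candidates):
    """Try each candidate password against one captured exchange."""
    for guess in candidates:
        attempt = digest_response(
            sniffed["username"], sniffed["realm"], guess,
            sniffed["nonce"], sniffed["method"], sniffed["uri"])
        if attempt == sniffed["response"]:
            return guess
    return None


# Everything an eavesdropper sees in one request (hypothetical values).
sniffed = {"username": "alice", "realm": "members", "nonce": "abc123",
           "method": "GET", "uri": "/private/"}
sniffed["response"] = digest_response(password="letmein", **sniffed)

found = dictionary_attack(sniffed, ["123456", "password", "letmein"])
```

The attack runs entirely offline, so no server-side lockout can slow it down; only a password that is not in the attacker's word list survives.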
Most of the people building systems that require passwords on the web
are too ignorant of security issues for me to trust anything crucial
to them. I don't bank online, because the banking systems I've looked
at don't meet *my* minimal requirements for security.

Worse than that, the user agreements typically make security failures
the customer's problem even if they're the bank's fault.
Back when I was dealing with this on a regular basis, the major
browser and server vendors were all pushing encrypted session
mechanisms of various kinds. Given that, a secure authentication
mechanism is a waste of time, and would provide competition for their
product in some application domains. So those vendors typically didn't
implement digest authentication. This sucked if you were exchanging
content that didn't need security, but wanted to authenticate
identity.

I don't have the impression that it was that nefarious. It took a
while for the standards for both encryption and digest authentication
to settle. By the time digest authentication was ready for prime
time, SSL was also widely deployed, and anyone doing anything serious
used SSL. So digest authentication was simply not needed.
 
Steve Holden

Peter said:
(Minor typo... needs an extra "l" on the end:

http://www.voidspace.org.uk/python/articles/authentication.shtml
)

By the way, note that neither basic auth nor digest auth provide any
real security, and in fact with basic auth the userid and password are
sent *in cleartext*. For any serious production site these techniques
should probably not be used without additional security measures in
place, such as HTTPS encryption.
Underlining your point, the difference between the two is that digest
offers *strong* authentication (i.e. is not subject to replay attacks)
while basic doesn't (anyone can capture the traffic and use the same
tokens to authorize against the site).

Sometimes strong authentication without confidentiality is a legitimate
requirement.

regards
Steve
 
Paul Rubin

Steve Holden said:
Underlining your point, the difference between the two is that digest
offers *strong* authentication (i.e. is not subject to replay attacks)

As I mentioned in another post, that's really not enough, since digest
still exposes the password hash to offline dictionary attacks, which
are sure to nab some passwords if you have a lot of users being
sniffed and you don't impose severe amounts of password discipline on
them. There's also usually no way to log out from an http
authenticated session except by completely closing the browser. All
in all, if you have nontrivial security requirements there's not much
point in using Digest. Use form-based authentication over SSL/TLS
instead. Make sure that the application locks out the user account
(at least temporarily) after too many failed login attempts, something
http authentication implementations that I know of don't bother to do.
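
For the lockout suggestion, a minimal in-memory sketch in Python 3 might look like the following. The class name, the thresholds and the injectable clock are all choices made for this example; a real application would keep the counters in its database and would probably also throttle by source address.

```python
import time


class LoginThrottle:
    """Temporarily lock an account after too many failed logins."""

    def __init__(self, max_failures=5, lockout_seconds=300,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.lockout_seconds = lockout_seconds
        self.clock = clock            # injectable so it can be tested
        self._failures = {}           # username -> consecutive failures
        self._locked_until = {}       # username -> clock value of expiry

    def is_locked(self, username):
        return self.clock() < self._locked_until.get(username, float("-inf"))

    def record_failure(self, username):
        count = self._failures.get(username, 0) + 1
        if count >= self.max_failures:
            # Start the lockout and reset the counter for the next round.
            self._locked_until[username] = self.clock() + self.lockout_seconds
            count = 0
        self._failures[username] = count

    def record_success(self, username):
        self._failures.pop(username, None)
        self._locked_until.pop(username, None)
```

The login handler would call is_locked() before checking the password, then record_failure() or record_success() depending on the outcome.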

For higher security applications (e.g. extranets, admin interfaces,
etc), use client certificates on hardware tokens.
 
