How to apply the user's HTML environment in a Python programme?

BobAalsma

I'd like to write a programme that will be offered as a web service (Django), in which the user will point to a specific URL and the programme will be used to read the text of that URL.

This text can be behind a username/password, but for several reasons, I don't want to know those.

So I would like to set up a situation where the user logs in (if/when appropriate), points out the URL to my programme and my programme would then be able to read that particular text.

I'm aware this may sound fishy. It should not be: I want the user to be fully aware and in control of this process.

Any thoughts on how to approach this?

Best regards,
Bob
 
Joel Goldstick

BobAalsma said:
I'd like to write a programme that will be offered as a web service (Django), in which the user will point to a specific URL and the programme will be used to read the text of that URL.

This text can be behind a username/password, but for several reasons, I don't want to know those.

So I would like to set up a situation where the user logs in (if/when appropriate), points out the URL to my programme and my programme would then be able to read that particular text.

I'm aware this may sound fishy. It should not be: I want the user to be fully aware and in control of this process.

Any thoughts on how to approach this?

There are several Python modules to get web pages: urllib, urllib2
and another called requests
(http://kennethreitz.com/requests-python-http-module.html). Check
those out.
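As a minimal sketch of the unprotected case, assuming Python 3 (where urllib2 was folded into urllib.request), fetching a page's text with only the standard library could look like this. The function name `fetch_text` and the User-Agent string are illustrative choices, not part of any of the libraries mentioned.

```python
import urllib.request


def fetch_text(url, timeout=10.0):
    """Fetch a URL and return its body decoded as text.

    Suitable only for pages that need no login: no credentials or cookies are sent.
    """
    req = urllib.request.Request(url, headers={"User-Agent": "page-reader-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Use the charset the server declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

For example, `fetch_text("https://www.python.org/")` returns that page's HTML as a string. Because nothing here sends credentials or cookies, a protected page will typically come back as a login page or as an HTTP error (raised as `urllib.error.HTTPError`).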
 
Jerry Hill

BobAalsma said:
Thanks, Joel, yes, but as far as I'm aware these would all require the Python programme to have the user's username and password (or "credentials"), which I wanted to avoid.

No matter what you do, your web service is going to have to
authenticate with the remote web site. The details of that
authentication are going to vary with each remote web site you want to
connect to.
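HTTP Basic authentication is one of the schemes a remote site might use. A sketch, assuming the site accepts it (many sites use HTML login forms or OAuth instead), shows how small the per-request part is. The helper `basic_auth_request` is hypothetical, and since Basic credentials are only base64-encoded, not encrypted, they should only ever be sent over HTTPS.

```python
import base64
import urllib.request


def basic_auth_request(url, username, password):
    """Build a Request carrying HTTP Basic credentials (RFC 7617).

    The credentials live only in this one Request object; nothing is stored.
    This does not handle form-based logins or OAuth.
    """
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
```

The request would then be sent with `urllib.request.urlopen(basic_auth_request(url, username, password))`. Because the credentials exist only inside that one request object, a service built this way can use them once and discard them, although the user still has to trust the service with them.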
 
BobAalsma

On Friday, 21 September 2012 15:36:11 UTC+2, Jerry Hill wrote:
No matter what you do, your web service is going to have to
authenticate with the remote web site. The details of that
authentication are going to vary with each remote web site you want to
connect to.

Hmm, from the previous posts I get the impression that I could best solve this by asking the user for the specific combination of username, password and URL + promising not to keep any of that...

OK, that does sound doable - thank you all

Bob
 
Joel Goldstick

BobAalsma said:

Hmm, from the previous posts I get the impression that I could best solve this by asking the user for the specific combination of username, password and URL + promising not to keep any of that...

OK, that does sound doable - thank you all


I recommend that you write your program to read pages that are not
protected. Once you get that working, you can go back and figure out
how you want to get the username/password from your 'friends' and add
that in. Also look up Beautiful Soup (version 4), a great library
for parsing the pages that you retrieve.
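A minimal sketch of the parsing step with Beautiful Soup 4 (installed as the `beautifulsoup4` package), assuming the HTML has already been fetched as a string: it drops `<script>` and `<style>` elements and returns the remaining text. The name `visible_text` is hypothetical; `"html.parser"` selects Python's built-in parser so no extra parser package is needed.

```python
from bs4 import BeautifulSoup


def visible_text(html):
    """Return the human-readable text of an HTML page as one string."""
    soup = BeautifulSoup(html, "html.parser")
    # Script and style contents are not visible text, so remove them first.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Join the remaining text fragments with single spaces, trimming each one.
    return soup.get_text(" ", strip=True)
```

Getting this working on unprotected pages first, as suggested above, keeps the parsing problem separate from the login problem.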
 
Peter Otten

BobAalsma said:
Hmm, from the previous posts I get the impression that I could best solve
this by asking the user for the specific combination of username, password
and URL + promising not to keep any of that...

OK, that does sound doable - thank you all

Hmm, promising seems doable, but keeping?
 
David Smith

BobAalsma said:
This text can be behind a username/password, but for several reasons, I don't want to know those.

So I would like to set up a situation where the user logs in (if/when appropriate), points out the URL to my programme and my programme would then be able to read that particular text.

I do this from a bat file that I will later translate to Python.
I tell my work wiki which file I want. I use Chrome, so for every new
session I'm asked for my credentials. However, that is all transparent
to my bat file.

For that matter, when I download a new build as part of another bat
file, I use Firefox and never see the credential exchange.

I wouldn't expect any different behavior using Python.
 
Dennis Lee Bieber

Jerry Hill said:
No matter what you do, your web service is going to have to
authenticate with the remote web site. The details of that
authentication are going to vary with each remote web site you want to
connect to.

Hmmm, convoluted, but presuming the "login" third-party site uses
cookies... Would it be possible to use JavaScript on the client to "copy"
the HTML from the third-party site and then transmit it to the application,
rather than having the application try to do a direct fetch given
just the URL?

This should keep the authentication local to the client machine.
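On the server side, this idea means the application receives page content rather than a URL plus credentials. A rough sketch as a plain WSGI app (a Django view would play the same role), assuming the client-side script sends a form-encoded POST with hypothetical fields `url` and `html`. One caveat: browsers' same-origin policy normally stops a script on one site from reading another site's pages, so the copying script would likely have to run on the third-party page itself, for example as a bookmarklet or browser extension.

```python
import json
from urllib.parse import parse_qs


def receive_page(environ, start_response):
    """Minimal WSGI app: accept page HTML that the user's browser posts to us.

    Assumes a form-encoded POST with hypothetical fields 'url' and 'html'.
    The user's credentials stay in their browser; only page content arrives.
    """
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed",
                       [("Content-Type", "text/plain"), ("Allow", "POST")])
        return [b"POST only\n"]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length).decode("utf-8")
    fields = parse_qs(body)
    url = fields.get("url", [""])[0]
    html = fields.get("html", [""])[0]
    # A real service would parse and process `html` here; this just reports its size.
    payload = json.dumps({"url": url, "received_chars": len(html)}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(payload)))])
    return [payload]
```

For local testing it could be served with `wsgiref.simple_server.make_server("localhost", 8000, receive_page).serve_forever()`.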
 
BobAalsma

On Friday, 21 September 2012 17:28:02 UTC+2, David Smith wrote:
I do this from a bat file that I will later translate to Python.
I tell my work wiki which file I want. I use Chrome, so for every new
session I'm asked for my credentials. However, that is all transparent
to my bat file.

For that matter, when I download a new build as part of another bat
file, I use Firefox and never see the credential exchange.

I wouldn't expect any different behavior using Python.

Umm, David, sorry, you've lost me but I think this could be a good solution - at least the division in client side/server side sounds like what I'm looking for. Could you please elaborate?

Bob
 
BobAalsma

On Friday, 21 September 2012 22:10:04 UTC+2, Dennis Lee Bieber wrote:
Hmmm, convoluted, but presuming the "login" third-party site uses
cookies... Would it be possible to use JavaScript on the client to "copy"
the HTML from the third-party site and then transmit it to the application,
rather than having the application try to do a direct fetch given
just the URL?

This should keep the authentication local to the client machine.

Wulfraed, yes, as with David's proposal: this sounds good, but I wouldn't know the first thing about Javascript...
I'm also concerned that both solutions would seem to imply distributing software (or "software") to the clients' systems.
Hmm.

Bob
 
Thomas Jollans

BobAalsma said:
I'd like to write a programme that will be offered as a web service (Django), in which the user will point to a specific URL and the programme will be used to read the text of that URL.

This text can be behind a username/password, but for several reasons, I don't want to know those.

So I would like to set up a situation where the user logs in (if/when appropriate), points out the URL to my programme and my programme would then be able to read that particular text.

I'm aware this may sound fishy. It should not be: I want the user to be fully aware and in control of this process.

Any thoughts on how to approach this?

What services are you planning to interface with? Many services (Twitter
being a notable pioneer) have systems for external (web) applications to
log in without being given a user's username & password.
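Such systems are typically OAuth flows. A minimal sketch of the first step of the OAuth 2.0 authorization-code flow (RFC 6749, section 4.1.1): the application sends the user to the provider's own login page with a URL like the one built below. Twitter at the time used the older OAuth 1.0a, which signs requests differently; the endpoint, client ID, redirect URI and scope used here are placeholders, and each provider documents its own.

```python
import secrets
from urllib.parse import urlencode


def authorization_url(authorize_endpoint, client_id, redirect_uri, scope):
    """Build an OAuth 2.0 authorization-code request URL.

    The user logs in on the provider's own page; the application later receives
    a short-lived code to exchange for an access token, never the password.
    Returns the URL and the random `state` value to verify on the callback.
    """
    state = secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return f"{authorize_endpoint}?{query}", state
```

After the user approves, the provider redirects to `redirect_uri` with a `code` and the same `state`; the application checks `state` and then exchanges the code for an access token in a server-to-server request, so the password never reaches it.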

I think it's possible to load a page in an iframe and access it using
JavaScript/DOM from the parent page. This is probably what you'll want
to do.

You say you don't know the first thing about JavaScript. Well, my
friend, if you're developing for the web, learn JavaScript, or,
depending on your situation, hire a front end developer who knows
JavaScript. You can only do so much on the web without using JavaScript.
I recently discovered this guide to learning JS; it sounds reasonable:
http://javascriptissexy.com/how-to-learn-javascript-properly/

http://pyjs.org/ may be worth a look too.


-- Thomas

PS: Most of your messages appear to be both To: and Cc: this list.
Please stop sending each message twice, it's rather distracting.
 
Dennis Lee Bieber

BobAalsma said:
Wulfraed, yes, as with David's proposal: this sounds good, but I wouldn't know the first thing about Javascript...
I'm also concerned that both solutions would seem to imply distributing software (or "software") to the clients systems.
Hmm.

Unless your clients are running some ancient text-only browser, they
already have the JavaScript interpreter running. We aren't talking about
downloading a Java program that then runs as a process on the client's
machine. If your clients ever visit (since I have it up at the moment)
the Amazon forum pages, they are already running JavaScript pages.

Here's the start of the page source as an example (don't ask me what
it does):

<html>
<head>
<script type="text/javascript">/* <![CDATA[ */var ue_t0=ue_t0||+new Date();/* ]]> */</script>
<script type='text/javascript'>/* <![CDATA[ */
var ue_wl_jserr = 1,
ue_csm = window;
(function(a){a.ue_err={ec:0,pec:0,ts:0,erl:[],mxe:50,startTimer:function(){a.ue_err.ts++;setInterval(function(){a.ue&&(a.ue_err.pec<a.ue_err.ec)&&a.uex("at");a.ue_err.pec=a.ue_err.ec},10000)}};a.ueLogError=(function(){function b(c,e,d){if(a.ue_err.ec>a.ue_err.mxe){return}a.ue_err.ec++;a.ue.log({m:c,f:e,l:d,s:""},"jserr");return false}if(a.ue_wl_jserr){window.onerror=b}return function(c){if(a.ue_err.ec>a.ue_err.mxe){return}a.ue_err.ec++;a.ue_err.erl.push(c)}})()})(ue_csm);

/* ]]> */</script>
 
BobAalsma

On Friday, 21 September 2012 16:15:30 UTC+2, Joel Goldstick wrote:
I recommend that you write your program to read pages that are not
protected. Once you get that working, you can go back and figure out
how you want to get the username/password from your 'friends' and add
that in. Also look up Beautiful Soup (version 4), a great library
for parsing the pages that you retrieve.

Joel,

I've spent some time with this but don't really understand my results - some help would be appreciated.
I've built a tester that will read my LinkedIn home page, which is password protected.
When I use that method for reading other people's pages, the program is redirected to the LinkedIn login page.
When I paste the URLs for the other people's pages in any browser, the requested pages are shown.

Bob
 
Ramchandra Apte

BobAalsma said:
Joel,

I've spent some time with this but don't really understand my results - some help would be appreciated.
I've built a tester that will read my LinkedIn home page, which is password protected.
When I use that method for reading other people's pages, the program is redirected to the LinkedIn login page.
When I paste the URLs for the other people's pages in any browser, the requested pages are shown.

Bob

Not all the authentication information is in the URL.
Some of it is in cookies in the browser.
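This explains the redirect: the browser sends its session cookies with each request, while a fresh Python request sends none, so the site treats it as logged out. A sketch using only the standard library: a cookie-aware opener keeps any cookies it receives across requests (much as `requests.Session` does), and a small hypothetical helper `landed_elsewhere` flags when redirects ended somewhere other than the requested page. Note that this jar starts empty; it does not see the cookies stored in the user's own browser.

```python
import http.cookiejar
import urllib.request
from urllib.parse import urlsplit


def make_cookie_opener():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def landed_elsewhere(requested_url, final_url):
    """True if redirects ended on a different host or path, e.g. a login page."""
    a, b = urlsplit(requested_url), urlsplit(final_url)
    return (a.netloc, a.path) != (b.netloc, b.path)
```

With `opener, jar = make_cookie_opener()`, a call such as `opener.open(url)` returns a response whose `geturl()` gives the final URL after redirects; passing that to `landed_elsewhere(url, ...)` shows whether the page was actually served or the request was bounced to a login page.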
 
