download robot


larryzhang

Hi,
As a Python newbie, I am trying to write code that can act as a
download robot.

The website provides information about companies. Manually, I can search
by company name and then click the “download” button to get the data
in Excel or Word format, before saving the file in a local directory.
I want the program to do this automatically.

I have run into several problems while writing the code:
1. The website requires a user ID and password. Is there a way to
pass my ID and password to the server from my Python code?
2. Can Python click the “download” button automatically and choose the
file format, as I can do manually?
3. The URL of each download page is not unique (pages that point
to different data files may share the same URL), which prevents me from
using the URL directly as the address of a particular file.
Is there any solution for this? Does this mean I have to work directly
with the database stored on the server rather than with the web page
displayed?

Thank you very much for any comments and suggestions.

Larry
 

Kushal Kumaran

> Hi,
> As a Python newbie, I am trying to write code that can act as a
> download robot.

This might be useful: http://wwwsearch.sourceforge.net/mechanize/.
I've only skimmed the page, not actually used it. If you prefer,
you can also use the urllib2 module from the standard library to do
all the work yourself. Notes on that approach are below.

> The website provides information about companies. Manually, I can search
> by company name and then click the “download” button to get the data
> in Excel or Word format, before saving the file in a local directory.
> I want the program to do this automatically.

> I have run into several problems while writing the code:
> 1. The website requires a user ID and password. Is there a way to
> pass my ID and password to the server from my Python code?

If the site uses HTTP Basic authentication, see the examples in the
urllib2 documentation for how to send a username and password. If the
login is done through an HTML form instead, you'll need to send the
form data with your request. The website might then use cookies to
track your session, so your code will need to store them and send them
back with later requests.
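A minimal sketch of the urllib2 approach, written for Python 3, where urllib2's classes now live in `urllib.request`. It assumes the site uses HTTP Basic authentication; `make_opener` is a hypothetical helper name, and the URL, ID and password you pass in are placeholders. The attached cookie jar stores any session cookie the server sets and sends it back on later requests made through the same opener.

```python
import http.cookiejar
import urllib.request


def make_opener(base_url, user, password):
    """Return (opener, cookie_jar) for a site behind HTTP Basic auth.

    Hypothetical helper: base_url, user and password are placeholders
    for your own values.
    """
    # Offer these credentials for any realm under base_url.
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, base_url, user, password)
    auth_handler = urllib.request.HTTPBasicAuthHandler(passwords)

    # Remember cookies the server sets (e.g. a session ID) and resend them.
    jar = http.cookiejar.CookieJar()
    cookie_handler = urllib.request.HTTPCookieProcessor(jar)

    opener = urllib.request.build_opener(auth_handler, cookie_handler)
    return opener, jar
```

Fetching a page is then `opener.open(url)`. If the site uses a login form rather than Basic authentication, submitting that form through the same opener lets the cookie jar capture the session cookie for the requests that follow.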

> 2. Can Python click the “download” button automatically and choose the
> file format, as I can do manually?

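The download button probably just submits a form, which the browser
sends as an ordinary GET or POST request; your code can send the same
request itself, most likely with the file format as one of the form
fields. You'll need to be familiar with HTML forms to do this.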
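One way to replay the download button, sketched in Python 3 under the assumption that you have already read the form's `action` URL, method and field names from the page's HTML. The helper names and the example fields `company` and `format` are hypothetical; use whatever names the real form has.

```python
import shutil
import urllib.parse
import urllib.request


def build_form_request(action_url, fields, method="POST"):
    """Build the request a browser would send when the form is submitted.

    fields maps form field names to values, e.g.
    {"company": "ACME Corp", "format": "xls"}; these names are
    hypothetical and must match the names in the site's real form.
    """
    encoded = urllib.parse.urlencode(fields)
    if method.upper() == "GET":
        # GET forms put the fields in the query string.
        separator = "&" if "?" in action_url else "?"
        return urllib.request.Request(action_url + separator + encoded)
    # POST forms send the same encoding in the request body.
    return urllib.request.Request(
        action_url, data=encoded.encode("ascii"), method="POST"
    )


def save_response(response, path):
    """Stream a response (any binary file-like object) to a local file."""
    with open(path, "wb") as out:
        shutil.copyfileobj(response, out)
```

Passing the request to `urllib.request.urlopen()` (or to an opener's `open()` method, if you need cookies or authentication) and handing the response to `save_response` would fetch and save one file.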

> 3. The URL of each download page is not unique (pages that point
> to different data files may share the same URL), which prevents me from
> using the URL directly as the address of a particular file.
> Is there any solution for this? Does this mean I have to work directly
> with the database stored on the server rather than with the web page
> displayed?

This simply means that the identifiers for the file to download are
passed by some means other than the URL, most likely as POST data.
Look at the HTML source of the page to see how.
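To see what a form actually sends, including hidden fields that often carry a record ID, you can parse the page with the standard library's `html.parser`. This is a simplified sketch: `FormFieldParser` is a hypothetical name, it only looks at `<input>` and `<select>` elements, and it cannot see fields that JavaScript adds or changes before submission.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect each <form>'s action, method and default field values.

    Hypothetical helper, simplified: it reads <input> and <select>
    elements only, takes an option's value attribute (not its text),
    does not treat checkboxes/radios specially, and ignores JavaScript.
    """

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per <form> seen
        self._select = None   # name of the <select> currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "GET").upper(),
                "fields": {},
            })
        elif not self.forms:
            return  # control before the first <form>: ignore it
        elif tag == "input" and attrs.get("name"):
            # Hidden inputs often carry the identifier of the record.
            self.forms[-1]["fields"][attrs["name"]] = attrs.get("value") or ""
        elif tag == "select":
            self._select = attrs.get("name")
        elif tag == "option" and self._select:
            fields = self.forms[-1]["fields"]
            # Default to the first option unless a later one is selected.
            if self._select not in fields or "selected" in attrs:
                fields[self._select] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None
```

Feeding it the page source with `parser.feed(html_text)` and printing `parser.forms` shows the field names and values you would need to reproduce in your own request.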

> Thank you very much for any comments and suggestions.

Tools that let you observe the traffic between your browser and the
web server will be useful here. If you use Mozilla Firefox, the
HttpFox extension might help.
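From the script side, urllib can print its own HTTP traffic, which you can compare with what the browser sends for the same action. A sketch, where `make_debug_opener` is a hypothetical helper: with `debuglevel=1`, `http.client` prints the raw request it sends and the status line and headers it receives.

```python
import urllib.request


def make_debug_opener(*extra_handlers):
    """Build an opener that prints each HTTP/HTTPS exchange to stdout.

    Hypothetical helper: extra_handlers are any other handlers you
    need (e.g. for cookies or authentication).
    """
    return urllib.request.build_opener(
        urllib.request.HTTPHandler(debuglevel=1),   # plain HTTP
        urllib.request.HTTPSHandler(debuglevel=1),  # HTTPS
        *extra_handlers,
    )
```

Opening a URL with the returned opener prints `send:`, `reply:` and `header:` lines for each request.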
 
