download robot

larryzhang · Apr 13, 2009

Hi,
Being new to Python, I am trying to write a program that can act as
a download robot.

The website provides information for companies. Manually, I can search
by company name and then click the “download” button to get the data
in excel or word format, before saving the file in a local directory.
The program is to do this automatically.

I have run into several problems while writing the code:
1. The website requires a user ID and password. Is there a way to
pass my ID and password to the server from my Python code?
2. Can Python hit the “download” button automatically and choose the
type of file format as I can do manually?
3. The URL of each download page is not unique (pages that point
to different data files may share the same URL), which prevents me from
using the URL directly as the address of a particular file.
Is there any solution for this? Does this mean I have to work directly
with the database stored in the server rather than with the webpage
displayed?

Thank you very much for any comments and suggestions.

Larry

Kushal Kumaran · Apr 13, 2009

Hi,
Being new to Python, I am trying to write a program that can act as
a download robot.

This might be useful: http://wwwsearch.sourceforge.net/mechanize/.
I have only skimmed the page and have not actually used it. If you
prefer, you can also use urllib2 from the standard library and do
all the work yourself. Notes for that route are below.

The website provides information for companies. Manually, I can search
by company name and then click the “download” button to get the data
in excel or word format, before saving the file in a local directory.
The program is to do this automatically.

I have run into several problems while writing the code:
1. The website requires a user ID and password. Is there a way to
pass my ID and password to the server from my Python code?

See the examples in the urllib2 documentation for how to send a
username and password for Basic authentication. If the site logs you
in through an HTML form instead, you'll need to send the form fields
with your request. The website might then use cookies to track your
session, so your code will need to store them and send them back.
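
As a rough sketch of both cases, written for Python 3 (where urllib2
became urllib.request): the helper names, the example URLs and the form
field names "username" and "password" are assumptions for illustration,
not taken from the actual site. The real field names have to be read
from the login page's HTML.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_basic_auth_opener(top_level_url, username, password):
    """Return an opener that answers HTTP Basic auth challenges."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # None means "use these credentials for any realm at this URL".
    password_mgr.add_password(None, top_level_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)


def build_form_login(login_url, username, password):
    """Return (opener, request, cookie_jar) for a form-based login."""
    cookie_jar = http.cookiejar.CookieJar()
    # The cookie processor stores cookies from responses and sends
    # them back on later requests made through the same opener.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar)
    )
    # Hypothetical field names; check the <form> on the login page.
    form_data = urllib.parse.urlencode(
        {"username": username, "password": password}
    ).encode("utf-8")
    # Supplying data makes this a POST request.
    request = urllib.request.Request(login_url, data=form_data)
    return opener, request, cookie_jar


# Usage (not run here, since it needs the real site):
# opener, request, jar = build_form_login(
#     "http://example.com/login", "larry", "secret")
# opener.open(request)           # log in, cookies land in jar
# opener.open("http://example.com/data")  # sent with the cookies
```

Every later request should go through the same opener, otherwise the
session cookies will not be sent.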

2. Can Python hit the “download” button automatically and choose the
type of file format as I can do manually?

Clicking the download button probably just sends an ordinary GET or
POST request, with the file format as one of its parameters. You'll
need to be familiar with HTML forms to reproduce that request.
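
A minimal sketch of what reproducing that request could look like in
Python 3 with urllib.request (the successor of urllib2). The function
names, the URL and the field names "company" and "format" are
hypothetical; the real ones come from the form's HTML.

```python
import shutil
import urllib.parse
import urllib.request


def build_download_request(action_url, fields, method="POST"):
    """Build the request a form submission would send."""
    encoded = urllib.parse.urlencode(fields)
    if method.upper() == "GET":
        # GET forms put their fields in the query string.
        separator = "&" if "?" in action_url else "?"
        return urllib.request.Request(action_url + separator + encoded)
    # POST forms send their fields in the request body.
    return urllib.request.Request(action_url, data=encoded.encode("utf-8"))


def save_response(response, path):
    """Copy a file-like response body to a local file."""
    with open(path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)


# Usage (not run here, since it needs the real site and a logged-in
# opener such as one built with urllib.request.build_opener):
# request = build_download_request(
#     "http://example.com/export",
#     {"company": "Acme", "format": "xls"})
# with opener.open(request) as response:
#     save_response(response, "acme.xls")
```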

3. The URL of each download page is not unique (pages that point
to different data files may share the same URL), which prevents me from
using the URL directly as the address of a particular file.
Is there any solution for this? Does this mean I have to work directly
with the database stored in the server rather than with the webpage
displayed?

This simply means that the identifiers for the file to download are
being passed in using means other than the URL, most likely as POST
data. Look at the HTML for the page to see how.
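
One way to look at the HTML from Python is to list each form's action,
method and named fields (hidden inputs included), since those are what
the browser sends. The sketch below uses the standard html.parser
module in Python 3; the class and function names and the sample markup
are made up for illustration. It records only the value attribute of
each field, so the options of a select element are not collected.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect action, method and named fields of every form."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({
                "action": attrs.get("action") or "",
                # HTML forms default to GET when no method is given.
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            })
        elif tag in ("input", "select", "textarea") and self.forms:
            name = attrs.get("name")
            if name:
                # Attach the field to the most recently opened form.
                self.forms[-1]["fields"][name] = attrs.get("value") or ""


def find_forms(html_text):
    """Return a list of dicts describing the forms in html_text."""
    collector = FormFieldCollector()
    collector.feed(html_text)
    return collector.forms


# Hypothetical page fragment, not taken from the real site.
sample = (
    '<form action="/export" method="post">'
    '<input type="hidden" name="file_id" value="123">'
    '<select name="format"></select>'
    '<input type="submit" value="Download">'
    '</form>'
)
for form in find_forms(sample):
    print(form["method"], form["action"], form["fields"])
```

A hidden field such as the made-up "file_id" here is the kind of
identifier that would distinguish two files sharing one URL.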

Thank you very much for any comments and suggestions.

You'll find it useful to have a tool that lets you observe the traffic
between your browser and the web server. If you use Mozilla Firefox,
the HttpFox extension might help.
