Python script help

C

cool1574

Hello, I am looking for a script that will be able to search an online document (by giving the script the URL) and find all the downloadable links in the document and then download them automatically.
I appreciate your help,
Thank you.
 
U

Ulrich Eckhardt

Am 30.07.2013 16:49, schrieb (e-mail address removed):
Hello, I am looking for a script that will be able to search an
online document (by giving the script the URL) and find all the
downloadable links in the document and then download them
automatically.

Well, that's actually pretty simple. Using the URL, download the
document. Then, parse it in order to extract embedded URLs and finally
download the resulting URLs.

If you have specific problems, please provide more info which part
exactly you're having problems with, along with what you already tried
etc. In short, show some effort yourself. In the meantime, I'd suggest
reading a Python tutorial and Eric Raymonds essay on asking smart questions.

Greetings!

Uli
 
C

cool1574

I know but I think using Python in this situation is good...is that the full script?
 
C

Chris Angelico

I know but I think using Python in this situation is good...is that the full script?

That script just drops out to the system and lets wget do it. So don't
bother with it.

ChrisA
 
C

Chris Angelico

What if I want to use only Python? is that possible? using lib and lib2?

Sure, anything's possible. And a lot easier if you quote context in
your posts. But why do it? wget is exactly what you need.

ChrisA
 
C

Cameron Simpson

| ** urlib, urlib2

Sure. And I'd use BeautifulSoup to do the parse. You'll need to fetch that.
So: urllib[2] to fetch the document and BS to parse it for links,
then urllib[2] to fetch the links you want.

http://www.crummy.com/software/BeautifulSoup/bs4/download/4.0/

Cheers,
--
Cameron Simpson <[email protected]>

You can be psychotic and still be competent.
- John L. Young, American Academy of Psychiatry and the Law on Ted
Kaczynski, and probably most internet users
 
D

Denis McMahon

Hello, I am looking for a script that will be able to search an online
document (by giving the script the URL) and find all the downloadable
links in the document and then download them automatically.
I appreciate your help,

Why use Python? Just:

wget -m url
 
C

cool1574

Here are some scripts, how do I put them together to create the script I want? (to search a online document and download all the links in it)
p.s: can I set a destination folder for the downloads?

urllib.urlopen("http://....")

possible_urls = re.findall(r'\S+:\S+', text)

import urllib2
response = urllib2.urlopen('http://www.example.com/')
html = response.read()
 
A

alex23

Here are some scripts, how do I put them together to create the script I want? (to search a online document and download all the links in it)

1. Think about the requirements.
2. Write some code.
3. Test it.
4. Repeat until requirements are met.
p.s: can I set a destination folder for the downloads?

Yes.

Show us you're actively trying to solve this yourself rather than just
asking us to write the code for you.
 
C

cool1574

I know I should be testing out the script myself but I did, I tried and since I am new in python and I work for a security firm that ask me to scan hundreds of documents a day for unsafe links (by opening them) I thought writing a script will be much easier. I do not know how to combine those three scripts together (the ones I wrote in my previous replay) that is why I camto here for help. please help me build a working script that will do the job.
Thanks in advance.
you can contact me at (e-mail address removed)
 
U

Ulrich Eckhardt

Am 01.08.2013 18:02, schrieb (e-mail address removed):
I know I should be testing out the script myself but I did, I tried
and since I am new in python and I work for a security firm that ask
me to scan hundreds of documents a day for unsafe links (by opening
them) I thought writing a script will be much easier. I do not know
how to combine those three scripts together (the ones I wrote in my
previous replay) that is why I cam to here for help. please help me
build a working script that will do the job.

This first option is to hire a programmer, which should give you the
quickest results. If the most important thing is getting the job done,
then this should be your #1 approach.

Now, if you really want to do it yourself, you will have to do some
learning yourself. Start with http://docs.python.org, which includes
tutorials, references and a bunch of other links, in particular go
through the tutorials. Make sure you pick the documentation
corresponding to your Python version though, versions 2 and 3 are subtly
different!

Then, read http://www.catb.org/esr/faqs/smart-questions.html. This is a
a bit metatopical but still important, and while this doesn't make you a
programmer in an afternoon, it will help you understand various
reactions you received here.

hope that gets you started

Uli
 
C

cool1574

I do know some Python programming, I just dont know enough to put together the various scripts I need...I would really really appreciate if some one can help me with that...
 
C

Chris Angelico

I do know some Python programming, I just dont know enough to put together the various scripts I need...I would really really appreciate if some one can help me with that...

Be aware that you might be paying money for that. If you know "some"
carpentry but not enough to put together a bookcase, and you ask a
professional carpenter to make you a bookcase, you'll have to pay him.
The same is true in programming, except that there are more people
willing to work for nothing, hence the vague "might be" rather than
the inevitable "shall" or the mighty "must" [1]. To get people to work
for you for free, you have to make them (us) want to, which in the
geeky arts generally means making it an interesting problem. Achieving
this is described well in esr's essay on asking smart questions [2],
which Ulrich also just pointed you to. We do this sort of thing for
fun, for love, so if you make your problem appeal to us, there's a
high chance that someone will provide you with code.

[1]
[2] http://www.catb.org/esr/faqs/smart-questions.html

ChrisA
 
C

cool1574

I understand I did not ask the question correctly, but is there any chance you can help me put together this code? I know that you all do this for funand enjoy it and that is why I asked you guys instead of asking some one who will charge me for a very simple line of code.
I would appreciate it, Thank you.
 
C

Chris Angelico

I understand I did not ask the question correctly, but is there any chance you can help me put together this code? I know that you all do this for fun and enjoy it and that is why I asked you guys instead of asking some onewho will charge me for a very simple line of code.
I would appreciate it, Thank you.

There are a million and one projects out there that I could do for
fun. Why should I do yours rather than one of theirs? The key is to
make your problem look more fun, or more useful, than the others. At
the moment, it looks fairly un-fun (just recreating wget with less
features), and not particularly useful (you could just use wget). So
at the moment, I don't feel inclined to put in several hours of unpaid
work for you.

I'll give you a few examples of things I *have* put hours of unpaid
work into, over the past few weeks:

* The Savoynet Performing Group production of The Yeomen of the Guard.
It's fun because the music's great and I'm working with awesome
people. (Also because the director has come up with an interpretation
of the finale that works better than any I've yet seen.) The lead
soprano is very close to going insane, the tragic comic sends a shiver
up my spine with the way he says "Elsie", and we have chocolate at
rehearsal (which I provide at my own expense). Fun and useful.

* The professional company performing Pirates of Penzance and Iolanthe
needs help moving costumes in and out. Again, useful, and working with
the best people. When the organizers of an international festival say
you're invaluable, that's pretty high praise.

* The Gilbert & Sullivan Society back home needs someone to manage its
domain, web hosting, internal Mailman list, etc, etc, etc. Most of it
is fairly mundane and unexciting, but it's useful.

* Gypsum is my designated successor to my somewhat popular MUD client
RosMud, achieving many of the things that I can't do with RosMud. As a
gamer, I like my game clients. Very fun and very useful.

* Related to the above, digging through the uncharted waters of mixed
metaphors and the Pike programming language, discovering language bugs
that probably nobody had ever run into before; and then submitting
patches and, again, seeing the approval and appreciation from people I
respect highly.

* Reading Alice in Wonderland to my eleven-year-old sister who'd never
heard it before. (Also to the rest of the family, who frequently 'just
happened' to hang around as I was reading.)

* Telling people about the Alice: Otherlands Kickstarter campaign [1],
which I'd really like to see succeed (if it reaches $250,000 within
the next few hours, the original voices of Alice and the Cheshire Cat
will be brought in!).

These are all projects that tie in with one of my interests or hobbies
(Gilbert and Sullivan, MUDding, and Alice in Wonderland). That gives
them a huge head-start in the "fun" and "interesting" categories.
You're trying to get me to donate my time and effort to your project;
to do that, you have to make your project look as interesting as one
of those. Okay, maybe not quite; each of the above has had MANY dev
hours donated to it, and you're just looking for maybe 1-2 hours. But
still, that's worth maybe a hundred dollars, so think of your request
as soliciting a donation of that amount. How are you going to pitch
that?

By the way, I am right now donating time towards a meta-cause: your
ability to handle yourself on an internet mailing list. I consider
that cause to be *extremely* useful, because it empowers the world and
you in ways that will make life easier for everyone, most notably
people on this list who I respect quite highly. So I'm happy to donate
ten or fifteen minutes to explaining exactly what it takes to get
something done, because - unless I've completely misread you - you,
and the whole world, will benefit that many times over.

[1] http://www.kickstarter.com/projects/spicyhorse/alice-otherlands

ChrisA
 
M

Michael Torrie

I do know some Python programming, I just dont know enough to put
together the various scripts I need...I would really really
appreciate if some one can help me with that...

Seems like your first task, then, is to become proficient at python so
that you can read the scripts you find and understand how they work so
that you can then take that as inspiration for your own project. We're
happy to answer questions about python programming in general.

Good luck. Python is a really fun language and if you read the docs and
tutorials, and start actually messing around with code (there are lots
of examples of using urllib2 out there, and also parsing libraries),
you'll make good progress.
 
P

Piet van Oostrum

Here are some scripts, how do I put them together to create the script
I want? (to search a online document and download all the links in it)
p.s: can I set a destination folder for the downloads?

You can use os.chdir to go to the desired folder.
urllib.urlopen("http://....")

possible_urls = re.findall(r'\S+:\S+', text)

import urllib2
response = urllib2.urlopen('http://www.example.com/')
html = response.read()

If you insist on not using wget, here is a simple script with
BeautifulSoup (v4):

########################################################################
from bs4 import BeautifulSoup
from urllib2 import urlopen
from urlparse import urljoin
import os
import re

os.chdir('OUT')

def generate_filename(url):
url = re.sub('^[a-zA-Z0-9+.-]+:/*', '', url)
return url.replace('/', '_')

URL = "http://www.example.com/"
soup = BeautifulSoup(urlopen(URL).read())

links = soup.select('a[href]')
for link in links:
url = urljoin(URL, link['href'])
print url
html = urlopen(url).read()
fn = generate_filename(url)
with open(fn, 'wb') as outfile:
outfile.write(html)
########################################################################

You should add a more intelligent filename generator, filter out mail:
urls and possibly others and add exception handling for HTTP errors.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,219
Messages
2,571,125
Members
47,731
Latest member
PasqualePf

Latest Threads

Top