Podcast catcher in Python

Chuck

Hi all,

I would like to code a simple podcast catcher in Python, purely as an
exercise in internet programming. I am a CS student and new to
Python, but I understand Java fairly well. I understand how to connect
to a server with urlopen, but I don't understand how to download
the podcast itself (an mp3 or whatever format it is in). Do I need to
somehow parse the XML document? I really don't know. Any ideas?

Thanks!

Chuck
 
Chuck

Also, if anyone could recommend some books that cover this type of
programming, I would greatly appreciate it.

Thanks!
 
Falcolas

You will first have to download the RSS XML file, then parse that file
for the URL for the audio file itself. Something like eTree will help
immensely in this part. You'll also have to keep track of what you've
already downloaded.

I'd recommend taking a look at the RSS XML yourself, so you know what
it is you have to parse out, and where to find it. From there, it
should be fairly easy to come up with the proper query to pull it
automatically out of the XML.
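
To make the parsing step concrete, here is a minimal sketch. It assumes "eTree" means the standard library's xml.etree.ElementTree, and that the feed stores each episode's audio URL in the url attribute of an <enclosure> element, as RSS 2.0 podcast feeds commonly do. The sample feed, its example.com URLs, and the function name enclosure_urls are made up for illustration; check the real feed to see what it actually contains.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down feed; a real feed has more elements per <item>.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 2</title>
      <enclosure url="http://example.com/ep2.mp3" length="456" type="audio/mpeg"/>
    </item>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.com/ep1.mp3" length="123" type="audio/mpeg"/>
    </item>
  </channel>
</rss>"""


def enclosure_urls(rss_text):
    """Return the url attribute of every <enclosure> element in the feed."""
    root = ET.fromstring(rss_text)
    return [enc.get("url") for enc in root.iter("enclosure") if enc.get("url")]


print(enclosure_urls(SAMPLE_RSS))
```

The same function should work on text read from a saved copy of the real feed, provided that feed uses <enclosure> elements in the same way.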

As a kindness to the provider, I would recommend a fairly lengthy
sleep between GETs, particularly if you want to scrape their back
catalog.
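
A rough sketch of the two housekeeping points above: skipping episodes that were already downloaded and sleeping between requests. Everything here is an assumption for illustration: the function name fetch_new, the 30-second default delay, and the idea of passing in the download step as a callable (for example, a small wrapper around urllib.request.urlretrieve).

```python
import time


def fetch_new(urls, seen, download, delay_seconds=30, sleep=time.sleep):
    """Download each URL not already in `seen`, pausing between requests.

    `seen` is a set of URLs fetched on earlier runs; it is updated in place.
    `download` is any callable that takes a URL and saves it somewhere.
    """
    fetched = []
    for url in urls:
        if url in seen:
            continue  # already downloaded on an earlier run
        if fetched:
            sleep(delay_seconds)  # pause before every request after the first
        download(url)
        seen.add(url)
        fetched.append(url)
    return fetched
```

To remember downloads across runs, the seen set could be loaded from and written back to a plain text file with one URL per line.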

Unfortunately, I no longer have the script I created to do just such a
thing in the past, but the process is rather straightforward, once you
know where to look.

~G
 
Chuck

Thanks! I will see what I can do.
 
Chuck

I am not sure how eTree fits in.  Is that eTree.org?

Can I just use x.read() to download the mp3 file and use x.write() to
write it to a file? Or, do I have to worry about encoding/decoding
etc...? I am under the impression that I can just read the mp3 and
write to a file, then play it in a media player. Is this too
simplified?
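
On the read/write question, here is a small sketch, assuming Python 3. An mp3 is binary data, so the bytes can be copied unchanged to a file opened in binary mode ('wb'), with no encoding or decoding involved. The function name save_stream is made up, and io.BytesIO stands in for the object that urlopen would return so the example runs offline.

```python
import io
import os
import shutil
import tempfile


def save_stream(response, path):
    """Copy a binary file-like object to `path` without decoding it."""
    with open(path, "wb") as out:          # 'wb': write raw bytes, no text encoding
        shutil.copyfileobj(response, out)  # copies in chunks rather than one big read()


# Stand-in for urllib.request.urlopen(mp3_url); the bytes are fake audio data.
fake_response = io.BytesIO(b"ID3\x03\x00fake mp3 bytes")
target = os.path.join(tempfile.mkdtemp(), "example_episode.mp3")
save_stream(fake_response, target)
```

Opening the output file in text mode ('w') instead is a likely source of trouble, since Python 3 would then expect str rather than bytes.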
 
Chuck

Brilliant!  I will give that a try.
Cheers!

Does anyone know how I should read/download the mp3 file, and how I
should write/save it so that I can play it on a media player such as
Windoze media player? Excuse my ignorance, but I am a complete noob
at this. I downloaded the mp3, and I got a ton of hex, I think, but
it could've been unicode.
 
Chuck

Thanks Chris!  I will play around with this.

I am using Python 3.1, but I can't figure out why I can't use
xml.dom.minidom. Here is my code:

from xml.dom.minidom import parse, parseString
url = 'http://minnesota.publicradio.org/tools/podcasts/grammar_grater.xml'  # just for test purposes

doc = parse(url)  # I have also tried parseString(url), not to mention a million
                  # other methods from xml.etree, xml.sax etc... all to no avail


What the heck am I doing wrong? How can I get this XML file and use
the toprettyxml() method, or something similar, so I can parse it? I don't
have any books and the documentation for Python kind of sucks. I am a
complete noob to Python and internet programming. (I'm sure that is
obvious :) )

Thanks!

Charlie
 
Chris Rebert

Impossible to say, considering *you didn't actually state what
problem/error you're encountering*.

Cheers,
Chris
 
Dave Angel

Wrong? You didn't specify your OS environment, you didn't show the
error message (and traceback), you posted an apparently unrelated
question in the same thread (there's no XML inside an mp3 file).

xml.dom.minidom.parse() takes a filename or a 'file' object as its first
argument. You gave it a URL, so it complained. You can fix that either
by using urllib.request.urlopen() (urllib.urlopen() in Python 2) or by
separately copying the data to a local file and using its filename here.
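
A minimal sketch of the first fix, assuming Python 3, where urlopen lives in urllib.request. The function name parse_feed_url is made up and is not called here because it needs a network connection. The offline part uses io.BytesIO with a tiny invented document to show that parse() accepts a binary file-like object.

```python
import io
import urllib.request
from xml.dom.minidom import parse


def parse_feed_url(url):
    """Open the URL and hand the resulting file-like object to minidom."""
    with urllib.request.urlopen(url) as response:
        return parse(response)


# Offline demonstration: parse() accepts any binary file-like object.
SAMPLE = b'<rss version="2.0"><channel><title>Example</title></channel></rss>'
doc = parse(io.BytesIO(SAMPLE))
print(doc.toprettyxml(indent="  "))
```

Passing a local filename instead, as in parse('grammar_grater.xml'), covers the second fix once a copy of the feed has been saved under that name.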

In general, I'd recommend against testing new code live against the
internet, since errors can occur from the vagaries of the internet as
well as from bugs in your code. Sometimes it's hard to tell the
difference when the symptoms change each time you run.

So I'd download the xml data that you want to test with to a local file,
and test out your parsing logic against that copy. In fact, first
testing will probably be against a simplified version of that copy.

How do you download the file? Well, if you're using Firefox, you can
browse to that page, and do View->Source. Then copy/paste that text
into a text editor, and save it locally. Something similar probably
works in other browsers, maybe even IE.

Or you can use urlretrieve, as suggested earlier in this thread. But
I'd make that a separate script, so that you can separate the bugs in
downloading from the bugs in parsing. After everything mostly works,
you can think about combining them.
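
A sketch of what such a separate download script might look like, assuming Python 3, where urlretrieve lives in urllib.request. The function name save_copy and the file names are made up. The demonstration uses a file:// URL in place of the real feed address so that it runs without a network connection.

```python
import pathlib
import tempfile
import urllib.request


def save_copy(url, local_path):
    """Download `url` to `local_path` and return the path that was written."""
    filename, _headers = urllib.request.urlretrieve(url, local_path)
    return filename


# Offline demonstration using a file:// URL in place of the real feed address.
workdir = pathlib.Path(tempfile.mkdtemp())
source = workdir / "feed_on_server.xml"
source.write_text("<rss><channel><title>Example</title></channel></rss>")
saved = save_copy(source.as_uri(), str(workdir / "feed_local.xml"))
```

The parsing script can then be pointed at the saved file and rerun as often as needed without touching the network.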

DaveA
 
Chuck

Okay, that makes sense. I will try that. Essentially, what I am
trying to learn by just reading articles on the web is how to access
info on the web using Python, i.e. weather data, podcasts, etc... I
have never done it before in any language. So, I am trying to do
something that I've never done before with a language I barely know.
Can be very frustrating. :(

Thanks for all the help!
 
Chuck

Oh yeah! I am using Windows XP.
 
Chuck

Okay, I have tried to use urllib.request.urlretrieve() to download an
mp3, but my cursor just sits and blinks at me. This is a small file, so
it shouldn't take more than a few minutes. Hmmm....?

Here is my code:

url = 'http://download.publicradio.org/podcast/minnesota/news/programs/2009/09/10/grammar_20090910_64.mp3'
file = 'C:\\Documents and Settings\\Compaq_Owner\\Desktop\\GramGrate.mp3'

import urllib.request
b4 = urllib.request.urlretrieve(url, file)

At this point, the cursor just sits and blinks forever.

Any ideas? I really appreciate this, guys.
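
One way to see whether bytes are arriving at all is urlretrieve's reporthook parameter, combined with a socket timeout so that a stalled connection raises an error instead of waiting indefinitely. This is only a sketch and does not diagnose why the call above hangs. The names show_progress and download_with_progress and the 30-second timeout are assumptions, and the demonstration copies a fake local file through a file:// URL so that it runs offline.

```python
import pathlib
import socket
import tempfile
import urllib.request


def show_progress(block_num, block_size, total_size):
    """reporthook: called once at the start and then after each block is read."""
    done = block_num * block_size
    if total_size > 0:
        print("%d of %d bytes" % (min(done, total_size), total_size))
    else:
        print("%d bytes so far (total size unknown)" % done)


def download_with_progress(url, path, timeout_seconds=30):
    # Sets a process-wide default, so a stalled connection eventually raises an error.
    socket.setdefaulttimeout(timeout_seconds)
    return urllib.request.urlretrieve(url, path, reporthook=show_progress)


# Offline demonstration: a fake "mp3" served through a file:// URL.
workdir = pathlib.Path(tempfile.mkdtemp())
source = workdir / "source.mp3"
source.write_bytes(b"\x00" * 20000)
target = workdir / "copy.mp3"
download_with_progress(source.as_uri(), str(target))
```

If progress lines appear for the real URL, the download is simply slow. If none appear before the timeout, the problem is more likely in the connection than in the code.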
 
