Podcast catcher in Python

Chuck · Sep 11, 2009

Hi all,

I would like to code a simple podcast catcher in Python merely as an
exercise in internet programming. I am a CS student and new to
Python, but understand Java fairly well. I understand how to connect
to a server with urlopen, but then I don't understand how to download
the podcast itself (the mp3, or whatever format it is in). Do I need
to somehow parse the XML document? I really don't know. Any ideas?

Thanks!

Chuck

Chuck · Sep 11, 2009

Also, if anyone could recommend some books that cover this type of
programming, I would greatly appreciate it.

Thanks!

Falcolas · Sep 11, 2009

You will first have to download the RSS XML file, then parse that file
for the URL for the audio file itself. Something like eTree will help
immensely in this part. You'll also have to keep track of what you've
already downloaded.
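To keep track of what has already been downloaded, something as simple as a JSON file of fetched enclosure URLs may be enough. A minimal sketch under that assumption; the history file and the helper names `load_seen`, `save_seen` and `new_urls` are hypothetical:

```python
import json
import os


def load_seen(path):
    """Return the set of enclosure URLs already downloaded (empty if no history file yet)."""
    if not os.path.exists(path):
        return set()
    with open(path, "r", encoding="utf-8") as f:
        return set(json.load(f))


def save_seen(path, seen):
    """Write the download history back out as a sorted JSON list of URLs."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sorted(seen), f, indent=2)


def new_urls(feed_urls, seen):
    """Keep only the URLs not downloaded yet, preserving feed order."""
    return [url for url in feed_urls if url not in seen]
```

After each successful download you would add the URL to the set and call `save_seen()` again, so an interrupted run doesn't re-fetch episodes it already has.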

I'd recommend taking a look at the RSS XML yourself, so you know what
it is you have to parse out, and where to find it. From there, it
should be fairly easy to come up with the proper query to pull it
automatically out of the XML.
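In an RSS 2.0 podcast feed, the audio link usually sits in the `url` attribute of an `<enclosure>` element inside each `<item>`. Once you have seen that layout in the real feed, the query can be a short path expression. A sketch using the standard `xml.etree.ElementTree` module, run against a made-up sample feed (`SAMPLE` and `enclosure_urls` are illustrative names):

```python
import xml.etree.ElementTree as ET


def enclosure_urls(rss_text):
    """Return (title, url) pairs for each <item> that has an <enclosure>."""
    root = ET.fromstring(rss_text)  # root is the <rss> element
    results = []
    # RSS 2.0 layout: <rss><channel><item>...<enclosure url="..."/></item></channel></rss>
    for item in root.iterfind("./channel/item"):
        enclosure = item.find("enclosure")
        if enclosure is not None and enclosure.get("url"):
            results.append((item.findtext("title", default=""), enclosure.get("url")))
    return results


SAMPLE = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.com/ep1.mp3" length="1234" type="audio/mpeg"/>
    </item>
    <item>
      <title>Text-only post</title>
    </item>
  </channel>
</rss>"""
```

Items without an enclosure (plain text posts) are skipped rather than raising an error.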

As a kindness to the provider, I would recommend a fairly lengthy
sleep between GETs, particularly if you want to scrape their back
catalog.
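The pause between GETs can be a plain `time.sleep()` in the download loop. A sketch, assuming a hypothetical `fetch(url)` function that does the actual download; the 30-second default is an arbitrary stand-in for "fairly lengthy", and `sleep` is a parameter only so the loop can be tested without waiting:

```python
import time


def download_all(urls, fetch, delay_seconds=30.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    fetched = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # be kind to the server between GETs
        fetch(url)
        fetched.append(url)
    return fetched
```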

Unfortunately, I no longer have the script I created to do just such a
thing in the past, but the process is rather straightforward, once you
know where to look.

~G

Chuck · Sep 11, 2009

Thanks! I will see what I can do.

Chuck · Sep 11, 2009

I am not sure how eTree fits in. Is that eTree.org?

Chuck · Sep 11, 2009

Can I just use x.read() to download the mp3 file and use x.write() to
write it to a file? Or, do I have to worry about encoding/decoding
etc...? I am under the impression that I can just read the mp3 and
write to a file, then play it in a media player. Is this too
simplified?
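For what it's worth: an mp3 is just bytes, so no decoding is needed as long as the output file is opened in binary mode (`'wb'`). A minimal Python 3 sketch, assuming the source is a file-like object such as the response returned by `urllib.request.urlopen()`; `save_binary` is a hypothetical helper name:

```python
import shutil


def save_binary(response, path, chunk_size=64 * 1024):
    """Copy raw bytes from a file-like object to disk, chunk by chunk, without decoding."""
    # Binary mode writes the bytes untouched. In Python 3 a text-mode file
    # would reject bytes outright; on Windows, Python 2's text mode would
    # also rewrite newline bytes and corrupt the audio.
    with open(path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
```

Usage would look like `with urllib.request.urlopen(url) as response: save_binary(response, "episode.mp3")`.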

Chris Rebert · Sep 12, 2009

> I am not sure how eTree fits in. Is that eTree.org?

No, he's referring to the `xml.etree.ElementTree` standard module:
http://docs.python.org/library/xml.etree.elementtree.html#module-xml.etree.ElementTree

Although since you're dealing with feeds, you might be able to use
Universal Feed Parser, which is specifically for RSS/Atom:
http://www.feedparser.org/

Cheers,
Chris

Chuck · Sep 12, 2009

Brilliant! I will give that a try.
Cheers!

Chuck · Sep 12, 2009

Does anyone know how I should read/download the mp3 file, and how I
should write/save it so that I can play it on a media player such as
Windoze media player? Excuse my ignorance, but I am a complete noob
at this. I downloaded the mp3, and I got a ton of hex, I think, but
it could've been unicode.

Chris Rebert · Sep 12, 2009

urllib.urlretrieve():
http://docs.python.org/library/urllib.html#urllib.urlretrieve
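That link is for Python 2. In Python 3 (which Chuck says later in the thread he is using) the same function lives in `urllib.request`, where the documentation lists it under the legacy interface. A minimal sketch; `fetch_to_file` is just an illustrative wrapper:

```python
import urllib.request


def fetch_to_file(url, path):
    """Save the resource at url to the local file path (Python 3 spelling of urllib.urlretrieve).

    Returns the local filename and the response headers.
    """
    filename, headers = urllib.request.urlretrieve(url, path)
    return filename, headers
```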

Cheers,
Chris

Chuck · Sep 12, 2009

Thanks Chris! I will play around with this.

Chuck · Sep 19, 2009

I am using Python 3.1, but I can't figure out why I can't use
xml.dom.minidom. Here is my code:

from xml.dom.minidom import parse, parseString
url = 'http://minnesota.publicradio.org/tools/podcasts/grammar_grater.xml'  # just for test purposes

doc = parse(url)  # I have also tried parseString(url), not to mention
                  # a million other methods from xml.Etree, xml.sax etc...
                  # all to no avail

What the heck am I doing wrong? How can I get this xml file and use
the toprettyxml() method, or something, so I can parse it? I don't
have any books and the documentation for Python kind of sucks. I am a
complete noob to Python and internet programming. (I'm sure that is
obvious.)

Thanks!

Charlie

Chris Rebert · Sep 19, 2009

Impossible to say, considering *you didn't actually state what
problem/error you're encountering*.

Cheers,
Chris

Dave Angel · Sep 19, 2009

Wrong? You didn't specify your OS environment, you didn't show the
error message (and traceback), you posted an apparently unrelated
question in the same thread (there's no XML inside an mp3 file).

xml.dom.minidom.parse() takes a filename or a 'file' object as its first
argument. You gave it a URL, so it complained. You can fix that either
by opening the URL with urllib.request.urlopen() (urllib.urlopen() in
Python 2) and passing the result, or by separately copying the data to a
local file and using its filename here.
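The first option might look like this in Python 3: open the URL, then pass the file-like response to `minidom.parse()`. A sketch; `parse_feed_url` is a hypothetical name:

```python
import urllib.request
from xml.dom import minidom


def parse_feed_url(url):
    """Fetch url and parse the body as XML.

    minidom.parse() treats a plain string as a local filename, so the URL
    is opened first and the file-like response is passed instead.
    """
    with urllib.request.urlopen(url) as response:
        return minidom.parse(response)
```

Calling `parse_feed_url('http://minnesota.publicradio.org/tools/podcasts/grammar_grater.xml')` and then `print(doc.toprettyxml())` on the result would give the pretty-printed view Chuck was after, assuming the feed is still online.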

In general, I'd recommend against testing new code live against the
internet, since errors can occur from the vagaries of the internet as
well as from bugs in your code. Sometimes it's hard to tell the
difference when the symptoms change each time you run.

So I'd download the xml data that you want to test with to a local file,
and test out your parsing logic against that copy. In fact, your first
tests will probably be against a simplified version of that copy.
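Working against a local, trimmed-down copy might look like the sketch below: a tiny hand-written feed with a single enclosure, parsed with `minidom`, plus a variant that reads a saved file by name. The feed contents and function names are invented for illustration:

```python
from xml.dom import minidom

# A hand-trimmed stand-in for the real feed: one item, one enclosure.
SIMPLIFIED_FEED = """<rss version="2.0"><channel>
<item><title>Test episode</title>
<enclosure url="http://example.com/test.mp3" type="audio/mpeg" length="0"/></item>
</channel></rss>"""


def enclosure_urls_from_dom(doc):
    """Collect the url attribute of every <enclosure> element in a minidom document."""
    return [node.getAttribute("url") for node in doc.getElementsByTagName("enclosure")]


def enclosure_urls_from_file(path):
    """Same thing for a saved local copy of the feed (minidom.parse accepts a filename)."""
    return enclosure_urls_from_dom(minidom.parse(path))
```

Once this works on the simplified copy, the same functions can be pointed at the full saved feed.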

How do you download the file? Well, if you're using Firefox, you can
browse to that page, and do View->Source. Then copy/paste that text
into a text editor, and save it locally. Something similar probably
works in other browsers, maybe even IE.

Or you can use urlretrieve, as suggested earlier in this thread. But
I'd make that a separate script, so that you can separate the bugs in
downloading from the bugs in parsing. After everything mostly works,
you can think about combining them.

DaveA

Chuck · Sep 19, 2009

Okay, that makes sense. I will try that. Essentially, what I am
trying to learn by just reading articles on the web is how to access
info on the web using Python, i.e. weather data, podcasts, etc... I
have never done it before in any language. So, I am trying to do
something that I've never done before with a language I barely know.
Can be very frustrating.

Thanks for all the help!

Chuck · Sep 19, 2009

Oh yeah! I am using Windows XP.

Chuck · Sep 19, 2009

Okay, I have tried to use urllib.request.urlretrieve() to download an
mp3, but my cursor just sits and blinks at me. This is a small file, so
it shouldn't take more than a few minutes. Hmmm....?

Here is my code:

url = 'http://download.publicradio.org/podcast/minnesota/news/programs/2009/09/10/grammar_20090910_64.mp3'
file = 'C:\\Documents and Settings\\Compaq_Owner\\Desktop\\GramGrate.mp3'

import urllib.request
b4 = urllib.request.urlretrieve(url, file)

At this point, the cursor just sits and blinks forever.

Any ideas? I really appreciate this, guys.
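One way to tell a slow download from one that is genuinely stuck is to stream it in chunks and report progress, with a timeout so a stalled connection raises an error instead of waiting forever. A Python 3 sketch, not a diagnosis of Chuck's problem; `download_with_progress` and the 30-second timeout are assumptions:

```python
import urllib.request


def download_with_progress(url, path, timeout=30, chunk_size=64 * 1024, report=print):
    """Stream url to path in chunks, reporting the running byte count after each chunk."""
    received = 0
    # timeout (seconds) applies to each blocking socket operation (connect,
    # each read), not to the whole download, so a stalled connection raises
    # an error instead of blocking forever.
    with urllib.request.urlopen(url, timeout=timeout) as response, open(path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # empty bytes means end of stream
                break
            out.write(chunk)
            received += len(chunk)
            report("%d bytes received" % received)
    return received
```

With the values from the post, that would be `download_with_progress(url, r'C:\Documents and Settings\Compaq_Owner\Desktop\GramGrate.mp3')`; the raw string avoids doubling every backslash.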

Chuck · Sep 19, 2009

Never mind, guys, I finally got things working. Woo hoo!!!!