John Salerno
I'm using Beautiful Soup to extract some song information from a radio
station's website that lists the songs it plays as it plays them.
Getting the time that the song is played is easy, because the time is
wrapped in a <div> tag all by itself with a class attribute that has a
specific value I can search for. But the actual song title and artist
information is harder, because the HTML isn't quite as precise. Here's
a sample:
<div class="cmPlaylistContent">
<strong>
<a href="/lsp/t2995/">
Love Without End, Amen
</a>
</strong>
<br/>
<a href="/lsp/a436/">
George Strait
</a>
<br/>
<span class="sprite iconDownload">
</span>
Download Song:
<a href="http://itunes.apple.com/us/album/love-without-end-amen/
id71416?i=71404&uo=4">
iTunes
</a>
|
<a href="http://www.amazon.com/Love-Without-End-Amen/dp/B000V638BQ?
SubscriptionId=1NXYFBZST44V8CCDK182&tag=coxradiointer-20&linkCode=xm2&camp=2025&creative=165953&creativeASIN=B000V638BQ">
Amazon MP3
</a>
<br/>
<span class="sprite iconComments">
Comments (1)
</span>
<span class="sprite iconVoteUp">
Votes (1)
</span>
</div>
This is about as far as I can drill down without getting TOO specific.
I simply find the <div> tags with the "cmPlaylistContent" class. This
tag contains both the song title and the artist name, and sometimes
miscellaneous other information as well, like a way to vote for the
song or links to purchase it from iTunes or Amazon.
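For concreteness, here is a minimal sketch of that lookup. It assumes Beautiful Soup 4 (the bs4 package) with Python's built-in "html.parser"; the trimmed sample markup and the variable names are made up for illustration and are not the station's real page.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical stand-in for the real page.
SAMPLE_HTML = """
<div class="cmPlaylistContent">
<strong><a href="/lsp/t2995/">Love Without End, Amen</a></strong>
<br/>
<a href="/lsp/a436/">George Strait</a>
</div>
<div class="somethingElse">not a playlist entry</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# class_ has a trailing underscore because "class" is a reserved word in Python.
playlist_divs = soup.find_all("div", class_="cmPlaylistContent")

for div in playlist_divs:
    print(div["class"])
```

Only the first <div> in the sample matches; the second one is skipped because its class differs.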
So my question is, given the above HTML, how can I best extract the
song title and artist name? It SEEMS like they are always the first
two pieces of information in the tag, such that:
for item in div.stripped_strings: print(item)
Love Without End, Amen
George Strait
Download Song:
iTunes
|
Amazon MP3
Comments (1)
Votes (1)
and I could simply get the first two items returned by that generator.
It's not quite as clean as I'd like, because I have no idea if
anything could ever be inserted before either of these items, thus
messing it all up.
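Roughly, the "first two items" idea would look like the sketch below. It assumes Beautiful Soup 4 and the built-in "html.parser"; the helper name title_and_artist and the shortened sample markup are hypothetical. It still depends entirely on the title and artist coming first, so it does nothing to solve the ordering worry.

```python
from itertools import islice

from bs4 import BeautifulSoup

# Shortened, hypothetical version of the sample markup.
SAMPLE_HTML = """
<div class="cmPlaylistContent">
<strong>
<a href="/lsp/t2995/">
Love Without End, Amen
</a>
</strong>
<br/>
<a href="/lsp/a436/">
George Strait
</a>
<br/>
Download Song:
<span class="sprite iconVoteUp">
Votes (1)
</span>
</div>
"""


def title_and_artist(div):
    """Return (title, artist) from the first two non-empty strings, or None."""
    # islice stops after two items, so the rest of the generator is never consumed.
    first_two = list(islice(div.stripped_strings, 2))
    if len(first_two) < 2:
        return None  # fewer than two strings: the layout is not what we expected
    return tuple(first_two)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
div = soup.find("div", class_="cmPlaylistContent")
result = title_and_artist(div)
print(result)
```

Returning None when fewer than two strings are present at least makes an unexpected layout visible instead of silently mislabeling data.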
I also don't want to rely on the <strong> tag, which makes me shudder,
or the <a> tag, because I don't know if they will always have an href.
Ideally, the <a> tag would have also had an attribute that labeled the
title as the title, and the artist as the artist, but alas.....
Therefore, I appeal to your greater wisdom in these matters. Given
this HTML, is there a "best practice" for how to refer to the song
title and artist?
Thanks!