retrieving Atom/RSS feeds

_spitFIRE

I'm using the feedparser library to parse Atom/RSS feeds. However, I don't get the entire post, only summaries! How do I retrieve the full content? I believe the parser library should have support for doing that, or the specification should detail how it can be done. Or should I simply get the article links from the feed and do HTML scraping?
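Here is roughly what I'm doing, as an untested sketch (the feed URL is a placeholder):

import feedparser

d = feedparser.parse("http://example.com/feed.atom")  # placeholder URL
entry = d.entries[0]
print("summary" in entry)  # True: the summary is there
print("content" in entry)  # False: no full content in this feed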



Lawrence Oluyede

_spitFIRE said:
I'm using the feedparser library to parse Atom/RSS feeds. However, I don't get the entire post, only summaries! How do I retrieve the full content? I believe the parser library should have support for doing that, or the specification should detail how it can be done. Or should I simply get the article links from the feed and do HTML scraping?

If the content producer doesn't provide the full article via RSS/Atom,
there's no way you can get it from the feed. Search for a full-content
feed if one exists; otherwise, get the article URL and feed the page to
BeautifulSoup to scrape the content.
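Something along these lines might work; this is an untested sketch (it assumes the bs4 package is installed, the feed URL is a placeholder, and the div class is hypothetical and site-specific):

import feedparser
from urllib.request import urlopen
from bs4 import BeautifulSoup

d = feedparser.parse("http://example.com/feed.atom")  # placeholder URL
for entry in d.entries:
    if "content" in entry:
        # The feed carries the full post (Atom <content> or RSS <content:encoded>)
        html = entry.content[0].value
    else:
        # Only a summary was published: fetch the article page and scrape it
        soup = BeautifulSoup(urlopen(entry.link).read(), "html.parser")
        post = soup.find("div", {"class": "post-content"})  # hypothetical class
        html = str(post) if post is not None else entry.get("summary", "")
    print(entry.title, len(html))

The scraping part is necessarily per-site: you have to look at each blog's markup to pick the right selector.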

_spitFIRE

Lawrence said:
If the content producer doesn't provide the full article via RSS/ATOM
there's no way you can get it from there. Search for full content feeds
if any, otherwise get the article URL and feed it to BeautifulSoup to
scrape the content.

For the same feed (where the content producer supposedly doesn't provide the full article!) I was able to see the complete post in other RSS aggregators (like Blam). I wanted to know how they were able to collect the full content!

I was sure that you can't do screen scraping separately for each and every blog, and that there has to be a standard way, or at least that blogs maintain a standard template for rendering posts. I mean, if each site offered only partial content and the rest had to be scraped from a page with a non-standard structure, which is more likely, then IMHO it would become impossible for any aggregator to aggregate feeds!

I shall for now try BeautifulSoup, though I'm still doubtful about it.
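If there really is no standard template, the only generic approach I can imagine is a crude heuristic; here's an untested sketch (bs4 assumed) that just guesses that the <div> with the most text is the post body:

from urllib.request import urlopen
from bs4 import BeautifulSoup

def guess_article(url):
    soup = BeautifulSoup(urlopen(url).read(), "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()  # drop non-content noise before measuring text
    divs = soup.find_all("div")
    if not divs:
        return soup.get_text()
    # Guess: the div with the longest text is the article body
    return max(divs, key=lambda d: len(d.get_text())).get_text()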


Lawrence Oluyede

_spitFIRE said:
For the same feed (where the content producer supposedly doesn't provide the full article!) I was able to see the complete post in other RSS aggregators (like Blam). I wanted to know how they were able to collect the full content!

Perhaps the feed itself contains a link to a full-content version of the feed.
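You can check for that with something like this untested snippet (placeholder feed URL); feedparser exposes each entry's <link> elements as entry.links, a list of dicts with "rel", "type", and "href" keys:

import feedparser

d = feedparser.parse("http://example.com/feed.atom")  # placeholder URL
for entry in d.entries:
    for link in entry.links:
        # rel="alternate" is usually the HTML permalink; other rel values
        # may point at a richer, full-content resource
        print(link.get("rel"), link.get("type"), link.get("href"))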

Diez B. Roggisch

_spitFIRE said:
For the same feed (where the content producer supposedly doesn't provide the full article!) I was able to see the complete post in other RSS aggregators (like Blam). I wanted to know how they were able to collect the full content!

Either it is referred to somewhere in the feed data, or it might be that they
have affiliate deals with certain partners who supply the full content.

Diez