RSS feed issues, or how to read each item exactly once

John Nagle

I've been using the "feedparser" module, and it turns out that
some RSS feeds don't quite do RSS right.

For the Reuters RSS feed, the "ETag" changes about once every fifteen
minutes, even if there are no new stories. I've been logging this in
a program of mine:

WARNING: Feed "http://feeds.reuters.com/reuters/topNews?format=xml": Etag
changed from "YH2PzNGiblDEe3z0hw2T2PLelCs"
to "uGI/GLFvX9zQ+o4cdU2pFAetbEE" but no new content.

ETags are just an optimization, so that's not too serious. But
there are worse problems.
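
A warning like the one logged above could come from a check along these lines. This is only a sketch: etag_warning and its parameters are hypothetical names, not part of feedparser, and it assumes the caller has already counted the new items by some means of its own.

```python
def etag_warning(url, old_etag, new_etag, new_item_count):
    """Return a warning string if the ETag changed without new content, else None."""
    # First poll, or the ETag is unchanged: nothing to report.
    if old_etag is None or old_etag == new_etag:
        return None
    # The ETag changed and there really are new stories: expected behavior.
    if new_item_count > 0:
        return None
    # The ETag changed but the feed content did not.
    return ('WARNING: Feed "%s": Etag changed from "%s" to "%s" but no new content.'
            % (url, old_etag, new_etag))
```

Since the ETag only saves bandwidth, a spurious change costs one extra download and nothing more, as long as the items themselves are checked separately.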

Sometimes the item ID for a story changes even though the story text
did not. When a story stays on the Reuters feed for more than a day, it gets
a new ID each day.

Then, sometimes a higher-priority story pushes an older story out of the
ten stories returned in the feed. But the higher-priority story may disappear
in a later feed cycle, and the older story may come back.

So you can't actually trust the ETag or the item ID, and have to back them
up with checks of your own if you want exactly one copy of each item. It's
something that "feedparser" should perhaps do.
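
One way to do such a check is to fingerprint each story by its text rather than by its ID. The following is a minimal sketch, not something feedparser provides: fingerprint and new_entries are hypothetical helpers, and the entries are assumed to be dict-like with "title" and "summary" keys.

```python
import hashlib


def fingerprint(entry):
    """Hash the story text, ignoring the feed-supplied ID."""
    # Assumption: entries are dict-like with "title" and "summary" keys.
    text = entry.get("title", "") + "\n" + entry.get("summary", "")
    # Collapse whitespace so trivial reformatting does not look like a new story.
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def new_entries(entries, seen):
    """Return entries whose content has not been seen; update `seen` in place."""
    fresh = []
    for entry in entries:
        key = fingerprint(entry)
        if key not in seen:
            seen.add(key)
            fresh.append(entry)
    return fresh
```

If the seen set is kept across polls (and saved between runs), a story that gets a new ID the next day, or drops out of the top ten and later comes back, is still recognized as a repeat. A story whose text is edited will show up as new, which may or may not be what you want.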

John Nagle
 
Gabriel Genellina

> I've been using the "feedparser" module, and it turns out that
> some RSS feeds don't quite do RSS right. [...]
> It's something that "feedparser" should perhaps do.

It would be better to ask the author than to post here, I think. And even
if feedparser were a standard module, it would be better to file a feature
request in the tracker (http://bugs.python.org).
 
