RSS feed clarification?

Ed Flecko

Hi folks,
I'm trying to figure out this whole RSS feed thing.

I've created my .xml file to use for my feed, and my browsers
"recognize" that I have an RSS feed, and you can subscribe, etc., etc.

Here's why I "think" I want to use an RSS feed, and what I'm confused
about.

I have one file (and one file only) on my web site that changes
frequently (weekly), but the file name is always the same. I want to
alert people who subscribe to the feed that this file has changed.

Here are my questions:

1.) Will an RSS feed "work" (automatically notify the subscribers) for
a single file whose name is always the same (although the body content
of the file changes)?
2.) I don't understand how RSS feeds actually work, from the client's
perspective, i.e., how does the subscriber's RSS client (Internet
Explorer, Firefox, etc.) actually know that the RSS feed has changed
and download it, etc.? Is it just simply a scheduled task, and the
client checks the feed automatically on a schedule?
3.) Since my feed isn't "news", per se, I don't need to bother with
"syndicating" my feed, do I?...or would this somehow benefit me.

Thank you,
Ed
 
Andy Dingley

I've created my .xml file to use for my feed, and my browsers
"recognize" that I have an RSS feed, and you can subscribe, etc., etc.

It would help if you told us the URL.

It's also better practice (if you can arrange this with your hosting)
to give your RSS file a ".rss" extension and, most importantly, to serve
it with a correct content-type for RSS, not just for XML. RSS is
robust against not doing this (most publishers can't get it right), but
it's still good practice if you're hosted on Apache.

I have one file (and one file only) on my web site that changes
frequently (weekly), but the file name is always the same. I want to
alert people who subscribe to the feed that this file has changed.
1.) Will an RSS feed "work" (automatically notify the subscribers) for
a single file whose name is always the same (although the body content
of the file changes)?

Yes. RSS is all about embedding metadata, and that includes the update
timestamps. The names themselves are just one piece of the metadata --
so long as _something_ reflects the change, then you can make it all
work.

You might not be able to use the "permalink" feature of some RSS
versions. This is useful, so if you can, then you should use it. As to
whether it's relevant, then that depends on your particular
application and so we don't know that yet.

It's good practice to offer a "permalink" as a URL that will always
retrieve a particular version of the content, even some time after
this was first served. "Last week's news" is still interesting to many
consumers. You can purge these over time if you wish, but it's still
good practice to make the URL namespace a consumed resource that isn't
re-used.

If you can, then keep "last week's news" and "this week's news" stored
as separate files on the web server and make the web server respond to
specific requests for each one appropriately. Just giving a filename
with a datestamp in it can be enough to do this. For the "latest
news" URL, send a 302 redirect to the URL for the current file. This
redirect's value will need to be changed as each new file is uploaded.
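
A minimal sketch of the dated-file scheme above, in Python. The naming pattern (rates-YYYY-MM-DD.pdf), the /documents/ path and both function names are assumptions made up for illustration; on a real server the redirect would more likely live in the web server configuration.

```python
from datetime import date

# Hypothetical base path; the real site layout may differ.
BASE = "/documents/"

def dated_name(d: date) -> str:
    """Build a filename that is never re-used, e.g. rates-2007-05-15.pdf."""
    return f"rates-{d.isoformat()}.pdf"

def latest_redirect(published: list[date]) -> tuple[int, dict[str, str]]:
    """Return status and headers for a request to the "latest" URL.

    302 (temporary) is used because the target changes with each upload.
    """
    newest = max(published)
    return 302, {"Location": BASE + dated_name(newest)}
```

Each upload adds one date to the list, so the "latest" URL moves on while the older dated URLs keep pointing at what they always pointed at.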

Possibly it's just not appropriate to serve "last week's news" in your
application (I don't, and can't, know). If so, then just have the
simple one file, one filename, one URL situation. However make sure
that any URL you publish in RSS is _not_ labelled incorrectly as a
"permalink".

2.) I don't understand how RSS feeds actually work, from the client's
perspective,

Largely you don't and can't know this. You publish the stuff, what
happens next is up to whoever uses it. Don't try to pre-judge what
they can and (especially) what they can't do with it.

how does the subscriber's RSS client (Internet
Explorer, Firefox, etc.) actually know that the RSS feed has changed
and download it, etc.?

They'll usually poll it regularly to see (i.e., the client decides).
HTTP polling should be efficient, i.e. a GET or HEAD request should
quickly return a suitable HTTP 304 Not Modified if needs be, or at
least an HTTP 200 with appropriate timestamps. Good clients adjust this
polling time so as not to be a nuisance, to respect any hints you
embed in the syndication information you include inside the RSS
document, and also to fine-tune this on the basis of how often you
actually make changes to the content.

It's important that an RSS server can efficiently serve polling
clients when the content _hasn't_ changed, otherwise it can soon be
overloaded, even when it's not serving any content. This is a real
problem for dumb-coded servers with database-generated content. If
your RSS content is coming from static files, then Apache will get it
right automatically. If you're generating it dynamically, then make
sure your "last updated" timestamps are calculated and returned
quickly, and also that they represent the "last change" not the "last
request" timestamps.
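
For the dynamically generated case, the decision between 304 and 200 can be sketched roughly as below, in Python. This is an illustration under assumptions, not how any particular server does it: the function name is invented, and it assumes the caller already knows the "last change" time as a timezone-aware UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional

def respond(last_changed: datetime, if_modified_since: Optional[str]):
    """Return (status, headers) for a feed request.

    last_changed must be the time the content last changed (UTC, aware),
    not the time of the current request.
    """
    last_changed = last_changed.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_changed, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall through to a full response
        if since is not None and since.tzinfo is not None and last_changed <= since:
            return 304, headers  # nothing changed, so no body needs to be sent
    return 200, headers
```

The cheap part is the comparison; the point of the advice above is that looking up last_changed must itself be cheap, or the 304 path saves nothing.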

3.) Since my feed isn't "news", per se, I don't need to bother with
"syndicating" my feed, do I?.

You don't really ever syndicate your own feed, you offer it up for
syndication and some aggregator might decide to syndicate it elsewhere
if it wishes to. Or else it might not; I can't know which.
 
Ed Flecko

Hi Andy,
Hey, thanks for the reply. I'll take all the suggestions and help I
can get! :)

O.K., I've changed the name of my basic RSS file so it has an .rss
extension.

The site is: www.fivestarbank.com, and the specific file is our CD
rates that I know customers would like to keep current on...that's why
I think the RSS feed would be a smart idea.

Comments? Further suggestions?

Thank you!
 
Andy Dingley

O.K., I've changed the name of my basic RSS file so it has an .rss
extension.

It's now served under a content-type of text/plain when it ought to
be application/rss+xml. Fix that if you can (Apache and .htaccess),
otherwise it _might_ be better as .xml and at least served as text/xml
or application/xml. Don't sweat this though: it's good practice, but
RSS is deliberately robust against it being mis-configured.
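
On Apache the usual fix is an AddType directive for the .rss extension. If the feed were instead served from your own code, the same mapping might look like this Python sketch; the function name is an assumption and the fallback type is just a conventional default.

```python
import mimetypes

# Register the RSS type explicitly; not every platform's default table has it.
mimetypes.add_type("application/rss+xml", ".rss")

def content_type_for(filename: str) -> str:
    """Return the Content-Type to send for a file, with a generic fallback."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"
```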

Also validate it with feed validator
http://feedvalidator.org/check.cgi?url=http://www.fivestarbank.com/fsb.rss

As it stands, it's valid but still needs a couple of tweaks.

You're using RSS 2.0, which is probably the best choice for you,
although the spec is unfortunately badly written and ambiguous. Worth
reading anyway though:
<http://cyber.law.harvard.edu/rss/rss.html>


Line by line:

<title>Welcome to the Five Star Bank RSS feed</title>
Don't welcome people, tell them what it is. It's not a web site, it's
an RSS feed. They don't "visit" this, they have it delivered to them.
Remember that they might be reading this on their fridge screen
display, along with the morning's news and last night's baseball
result.


<link>http://www.fivestarbank.com</link>
Good. This should be to the human-readable website, not any part of
the feed.


<description>Where Excellence Exceeds Expectations</description>
Lose the marketing flannel. Put some content here. Try "Five Star Bank
CD rates at 15th May 2007, valid for the next 5 days" or similar


<item>
One item. It's all you need. Not common practice, but entirely valid
in your application.

<title>Current CD Rates</title>
Be careful with words like "current" in any syndicatable protocol (it
might not still be current when your reader gets to see it). Only use
them with items that are clearly timestamped, otherwise you will
confuse users.

<link>http://www.fivestarbank.com/documents/Current_Rates.pdf</link>

I would still be happier if this pointed to a series of files called
"rates at 2007-05-15" etc. Delete them as soon as they're obsolete if
you wish, but at least it avoids confusion of mapping an old
"current" onto a new file with a changed rate. If you don't do this
then you are losing most of the advantages of RSS.

You can still make "current" 302 redirect to this week's file.

There's a separate commercial decision to be made as to whether you
want to have your historical rate history visible so easily (by
leaving the old files available). It's your call (but if you ever
make this information publicly visible even temporarily, someone
will make a business out of recording it and selling histories of it).
Obviously a single filename kills this anyway.

<description></description>
Put something in there. Probably (for this one-item case) a
restatement of the channel's description.

There are several elements missing from <channel>. Some are important.

<pubDate>
This is vital, because it's how an aggregator identifies the channel /
item as having been updated. If you don't have it, and you don't
change the item link URL, then most correct aggregators will simply
see your content as stale and unchanging, even if the PDF contents
themselves are changing. Put this on both channel and item -- channel
is just the latest pubDate across all <item>s, so in your case they're
currently the same.
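
RSS 2.0 expects these dates in RFC 822 form. A small Python sketch of producing them follows; the function names are invented for illustration, and it assumes you have each item's change time as a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def rss_pub_date(changed: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 822 date for <pubDate>."""
    return format_datetime(changed.astimezone(timezone.utc), usegmt=True)

def channel_pub_date(item_dates: list[datetime]) -> str:
    """The channel's pubDate is the latest pubDate across its items."""
    return rss_pub_date(max(item_dates))
```

With a single item the two values are identical, which matches the one-item feed discussed here.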


<skipHours> & <ttl>
This is poorly done in RSS 2.0, but you should still use it. It's part
of how they hint at the update schedule for the channel. Personally
I'd use the RSS 1.0 syndication module instead, or as well.
<http://web.resource.org/rss/1.0/modules/syndication/>
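
As a rough illustration of these hints, here is a Python sketch that adds <ttl> (in minutes) and <skipHours> (hours 0 to 23, GMT) to a channel element. The function name and the example values are assumptions; pick numbers that suit your own update schedule.

```python
import xml.etree.ElementTree as ET

def add_schedule_hints(channel: ET.Element, ttl_minutes: int, skip_hours: list[int]) -> None:
    """Append <ttl> and <skipHours> children to an RSS 2.0 <channel> element."""
    ET.SubElement(channel, "ttl").text = str(ttl_minutes)
    skip = ET.SubElement(channel, "skipHours")
    for hour in sorted(set(skip_hours)):
        if not 0 <= hour <= 23:
            raise ValueError(f"hour out of range: {hour}")
        ET.SubElement(skip, "hour").text = str(hour)
```

For a weekly update, a ttl of 10080 (7 x 24 x 60 minutes) would be one possible choice, though clients are free to ignore it.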

<copyright>
This can be important, particularly if you wish to indicate that
financial information brokers can't republish your content. I suggest
reading the Creative Commons site for advice on indicating this.

<managingEditor>
It's now a legal requirement for UK commercial feeds to include this
(with some wiggle room for the technical details of "how"), so as to
identify the legal entity publishing this business communication. I'm
sure US retail banking laws have similar requirements.


There are also elements missing from <item>. Some are already
described, some important.

Remember that many syndication / aggregation environments syndicate
_items_, not _channels_. They'll strip out the items they want from
several sources of channel, then republish them as an aggregation. If
you want to swim in this world, make sure that your <item>s carry the
appropriate metadata, don't just stick it once on the overall channel
and hope.

<guid>
This is essential if you expect any syndication to work. It's how they
recognise <item>s that are different or (in conjunction with pubDate)
have been updated. Don't use isPermaLink=true though unless you've
disambiguated between each week's set of rates (as I suggest anyway).
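
To make the two cases concrete, here is a Python sketch that builds an <item> with a <guid>. Everything named here is an assumption for illustration: the function, the dated_url flag, and the scheme of appending the pubDate to the URL to get a unique guid when the URL itself is re-used.

```python
import xml.etree.ElementTree as ET

def build_item(title: str, url: str, pub_date: str, dated_url: bool) -> ET.Element:
    """Build an RSS 2.0 <item> carrying its own pubDate and guid."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = url
    ET.SubElement(item, "pubDate").text = pub_date
    guid = ET.SubElement(item, "guid")
    if dated_url:
        # The URL identifies exactly one version, so it can be a permalink.
        guid.set("isPermaLink", "true")
        guid.text = url
    else:
        # Re-used URL: make the guid unique per version and say it is not a link.
        guid.set("isPermaLink", "false")
        guid.text = f"{url}#{pub_date}"
    return item
```

Either way the guid changes whenever the rates change, which is what lets an aggregator tell one week's item from the next.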

<enclosure>
Your linked content is a PDF, so it's unclear as to whether it ought
to be addressed via a <link> or via <enclosure>. It's possible to use
either. It's better to not use a PDF at all, but to use HTML (with my
Semantic Web pointy hat on). In that case you'd clearly use a <link>
and we'd all start building a world of automatically machine-readable
smart content, intelligent agents and all the rest of it.

However you probably have a corporate brand manager who forces you to
use a PDF so that they can control the exact choice of corporate
typeface. This is a Bad and Wrong policy and the sooner these
dinosaurs are put out to grass the better, but I appreciate that it
happens. So is a PDF a piece of "web content" (use <link>) or is it a
monstrous great piece of opaque brochureware that's only fit to be
downloaded and printed, with no hope of ever being automatically read
and used by agents (use <enclosure>)?
 
Ed Flecko

Thank you, Andy.

I'll try your suggestions!

:)
 
Joseph Kesselman

Quick reminder, not directed only at Ed: Please remember to trim quotes!
Reposting a hundred lines of text just to add seven words of thanks is
not a very good use of Internet resources (or of readers' time).

In general, your new text should be larger than what you're quoting,
with a *bit* of leeway allowed when the quote itself is also short.
 
