RSS and namespaces


Pete

I've been playing with writing my own RSS reader (in Ruby -- yes, I
know there are lots out there, but it's mostly for my own experience
and amusement). I tailored it to read the XML from the BBC feeds,
but it's been happy with others as well (occasionally needing minor
tweaking because of missing elements).

For interest, though, I downloaded a 'Have Your Say' forum RSS feed,
again from the BBC, and initially all I got in the HTML conversion
was a bunch of little titles, all the same... [That similarity was
expected because they're just the title of the thread.]

When I looked at the source in detail, I saw first that the 'description'
element was in CDATA form, which my reader wasn't set up for. I suppose
this makes sense, because some of the descriptions included HTML tags.
(Though looking at the 'XML FAQ', I see that the contents of a CDATA
section are *not* guaranteed to be protected from translation into
entities, so I gather that if I were using, say, XSLT for the conversion,
it wouldn't work right anyway.)
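
As a rough illustration of the CDATA point, here is a minimal sketch using Ruby's REXML library. The two `<item>` fragments and the `description_text` helper are made up for the example, not taken from the BBC feed. It suggests that a parser-based reader sees the same text whether the HTML in a description is wrapped in CDATA or entity-escaped:

```ruby
require 'rexml/document'

# Hypothetical <item> fragments carrying the same HTML two ways.
CDATA_XML   = '<item><description><![CDATA[<p>Hello <b>world</b></p>]]></description></item>'
ESCAPED_XML = '<item><description>&lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;</description></item>'

# Return the decoded text of the <description> element.
# REXML treats a CDATA section as a kind of text node, so Element#text
# covers both forms.
def description_text(xml)
  doc = REXML::Document.new(xml)
  doc.root.elements['description'].text
end

puts description_text(CDATA_XML)
puts description_text(ESCAPED_XML)
```

A reader that pulls descriptions out with regular expressions would have to handle the two forms separately, which may be why a CDATA feed trips up a hand-rolled parser.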

However, the other strangeness was that important items like the
author and creationDate were in the 'jf:' namespace ("JiveSoftware").
This meant that I had to specifically add these tags to my reader
to be able to see this data. (The <item>s didn't have any <pubDate>
element at all.)
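
A minimal sketch of that kind of tweak, using REXML: prefer the standard `<pubDate>` and fall back to the namespaced element only when it is missing. The element names `jf:author` and `jf:creationDate` come from the feed described above; the namespace URI, the sample values, and the helper names `first_text` and `item_dates` are placeholders invented for the example.

```ruby
require 'rexml/document'

# Hypothetical feed fragment. The jf namespace URI is a placeholder,
# not the real JiveSoftware one.
FEED_XML = <<~XML
  <rss version="2.0" xmlns:jf="http://example.com/jf-placeholder">
    <channel>
      <title>Have Your Say</title>
      <item>
        <title>Thread title</title>
        <jf:author>someone</jf:author>
        <jf:creationDate>2007-05-01T12:00:00Z</jf:creationDate>
      </item>
      <item>
        <title>Another item</title>
        <pubDate>Tue, 01 May 2007 12:00:00 GMT</pubDate>
      </item>
    </channel>
  </rss>
XML

# Return the text of the first child element found among the candidate
# names (tried in order), or nil if none of them is present.
def first_text(item, *names)
  names.each do |name|
    element = item.elements[name]
    return element.text if element
  end
  nil
end

# Collect one date string per <item>, preferring the standard element.
def item_dates(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//item').map do |item|
    first_text(item, 'pubDate', 'jf:creationDate')
  end
end

p item_dates(FEED_XML)
```

The two date formats differ, so a real reader would still need to normalise them before sorting or display.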

Isn't this sort of against the intent of RSS? Is this a case of
proprietary selfishness, or is there some good reason for using
a namespace to handle important generic parts of the content?
I see that some other feeds use 'itunes:' or other spaces, but
they seem to be for more specific data and don't really impact
reading by an app that doesn't know about them.

-- Pete --
 

Andy Dingley

Pete said:
When I looked in detail at the source, I saw that, first, the 'description'
element was in CDATA form,

There's a famously excellent article on DiveIntoMark.com that talks
about HTML encoding in RSS and the multiplicity of RSS versioning.

Pete said:
Isn't this sort of against the intent of RSS?

Sort of. It's not against RSS to do this (it's just adding a new
element), but it is against RSS to drop the important and useful
element in the default namespace. The fallacy is to think that they're
the same thing, or to produce a feed with <jf:pubDate> in it and think
that by doing this you're also implementing <pubDate>.

Then again, there's never any shortage of Stupid in the RSS world 8-(

You might be interested to read some background on Dublin Core, in
particular the old "dumbing down" principle. DC has de-evolved a bit in
recent years but it used to have a very clear and very good strategy
for ad hoc extensions like this. You were allowed (and encouraged) to
add anything you liked, with the proviso that the qualification
mechanism (how you refine a standard general-purpose property into a
more specific one) will be unreliable for clients who don't understand
the new more-specific form. Provided that the new more-specific form
was still "meaningful" (or at least not misleading) in the general
context, then you were encouraged to add it anyway. The core element is
"creator" and refined forms might be "author", "editor" or "translator"
-- all of whom can still validly be regarded as creators. Less capable
clients would only see the "dumbed-down" general form, but it's still
useful to them, for as great a value of "useful" as they're capable of
grasping.
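
A minimal sketch of the dumb-down idea in Ruby. The `REFINES` table, the `dumb_down` helper, and the sample record are illustrative assumptions, not any official Dublin Core API; only "creator" and its refinements "author", "editor" and "translator" come from the description above.

```ruby
# Hypothetical refinement table: each specific property names the
# general-purpose property it refines.
REFINES = {
  'author'     => 'creator',
  'editor'     => 'creator',
  'translator' => 'creator'
}.freeze

# Reduce a record to the properties a client understands. A property the
# client doesn't know is replaced by the general form it refines; if that
# is unknown too, the property is dropped.
def dumb_down(record, understood)
  result = Hash.new { |hash, key| hash[key] = [] }
  record.each do |property, value|
    name = understood.include?(property) ? property : REFINES[property]
    result[name] << value if name && understood.include?(name)
  end
  result
end

# Made-up sample record.
RECORD = {
  'author'     => 'A. Writer',
  'translator' => 'T. Lator',
  'title'      => 'Some Book'
}.freeze

# A simple client only knows the core elements.
p dumb_down(RECORD, ['creator', 'title'])
# A more capable client also understands "author".
p dumb_down(RECORD, ['creator', 'author', 'title'])
```

The simple client loses the author/translator distinction but still gets two valid creators, which is the sense in which the general form stays useful.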
 

Pete

Andy Dingley said:
There's a famously excellent article on DiveIntoMark.com that talks
about HTML encoding in RSS and the multiplicity of RSS versioning.

Thanks for the pointer. It's clear that there's a lot more disarray
in the field than I anticipated! :)-/)

And more specifically I see in several places that HTML within a
description is supposed to be entity-escaped. No mention of CDATA
anywhere... Eek. (And then there's the danger of arbitrary HTML...)

Andy Dingley said:
Sort of. It's not against RSS to do this (it's just adding a new
element), but it is against RSS to drop the important and useful
element in the default namespace. The fallacy is to think that they're
the same thing, or to produce a feed with <jf:pubDate> in it and think
that by doing this you're also implementing <pubDate>

Then again, there's never any shortage of Stupid in the RSS world 8-(

Looks like it. I'm sort of glad that I tend to write my own apps for
things like this -- and take a minimalist approach. Ignore everything
I'm not specifically interested in. I may have to keep tweaking every
time I find a new feed, though...

[Dublin Core snipped]

Skimmed some of the stuff on that. As you say, extending, rather than
replacing, seems the obvious way to go.

Cheers,
-- Pete --
 
