How to stuff HTML into RSS?

lkrubner

Some friends and I are working on some PHP-based templates for web
pages. We have templates that look like this (simplified):

<html>
<head>
<title>
The green and blue design for carpentry companies
</title>
</head>
<body>
<?php showMainContent(); ?>
<div style="width:200px; float:right">
<?php showLinkArea(3); ?>
</div>
</body>
</html>


I'd like to publish all the templates in our database in an RSS feed so
it will be easier to import them on other sites. Does it screw things
up if I stuff HTML into the DESCRIPTION tag on an RSS .91 feed?
 
Andy Dingley

Does it screw things
up if I stuff HTML into the DESCRIPTION tag on an RSS .91 feed?

It's not what you stuff, it's how you stuff it.

You should encode HTML, so that

<description><p>Some <b>HTML</b> in RSS</p></description>

becomes this

<description>&lt;p&gt;Some &lt;b&gt;HTML&lt;/b&gt; in RSS&lt;/p&gt;</description>
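A minimal Python sketch of this encoding step, using the standard library's `xml.sax.saxutils.escape` (which replaces `&`, `<` and `>`). The function name `encode_description` is hypothetical, not part of any RSS toolkit:

```python
from xml.sax.saxutils import escape


def encode_description(html_fragment):
    """Wrap an HTML fragment in <description>, escaping &, < and > for XML."""
    # escape() replaces & first, so any entity already in the HTML becomes
    # &amp;... and survives the round trip through the XML parser intact.
    return "<description>" + escape(html_fragment) + "</description>"


print(encode_description("<p>Some <b>HTML</b> in RSS</p>"))
# <description>&lt;p&gt;Some &lt;b&gt;HTML&lt;/b&gt; in RSS&lt;/p&gt;</description>
```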

Watch out as well for & (which becomes &amp;) and for named entities
such as &eacute; (turn them into the equivalent numeric entity, &#233;).
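One way to apply the entity advice, sketched in Python under the assumption that the names you use appear in the standard library's HTML 4 entity table (`html.entities.name2codepoint`), is to rewrite named references to numeric ones before escaping. `named_to_numeric` is a hypothetical helper:

```python
import re
from html.entities import name2codepoint

# Entities XML defines itself; these can safely stay as named references.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(html_text):
    """Replace HTML named entities such as &eacute; with numeric ones (&#233;)."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML built-ins and unknown names alone
        return "&#%d;" % name2codepoint[name]

    return ENTITY_RE.sub(replace, html_text)


print(named_to_numeric("Caf&eacute; &amp; bar"))  # Caf&#233; &amp; bar
```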

I'd also suggest that you make your HTML fragments into well-formed,
balanced XHTML fragments before you embed them (lower case element
names, close open elements). Although this isn't required, it can make
life easier with XML toolsets.

This stuff isn't hard to do, but it's very poorly documented. There
are many RSS versions, and few of them describe it fully. This is a
useful read
http://diveintomark.org/archives/2004/02/04/incompatible-rss


I'd also avoid the obsolete RSS 0.91 in favour of RSS 1.0 (far
better), or you might prefer the more popular RSS 2.0.
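For RSS 2.0, a minimal Python sketch with `xml.etree.ElementTree` could look like this. The dict keys `name`, `url` and `html` and the channel values are placeholders standing in for a template database; ElementTree escapes text content on output, so the HTML ends up encoded without a manual escaping step:

```python
import xml.etree.ElementTree as ET


def build_feed(templates):
    """Build a minimal RSS 2.0 document from dicts with hypothetical
    keys 'name', 'url' and 'html'."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Design templates"     # placeholder
    ET.SubElement(channel, "link").text = "http://example.com/"   # placeholder
    ET.SubElement(channel, "description").text = "Template feed"  # placeholder
    for t in templates:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = t["name"]
        ET.SubElement(item, "link").text = t["url"]
        # Raw HTML assigned to .text is escaped (&, <, >) on serialization,
        # giving the &lt;p&gt;... form shown earlier in this reply.
        ET.SubElement(item, "description").text = t["html"]
    return ET.tostring(rss, encoding="unicode")
```

Calling `build_feed([{"name": "Green and blue", "url": "http://example.com/t/1", "html": "<p>Hi</p>"}])` yields an item whose description is `&lt;p&gt;Hi&lt;/p&gt;`.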
 
Joris Gillis

Hi,

I don't know anything about RSS, but wouldn't it be easier and more logical to insert the XHTML as elements using namespaces? And if that wouldn't be possible yet, shouldn't it become possible?

regards,
 
Andy Dingley

I don't know anything about RSS,

I suggest you read the Dive Into Mark article. It gives some of the
background to this and explains it well.
http://diveintomark.org/archives/2004/02/04/incompatible-rss

RSS has suffered because of too many standards, and especially because
these standards have generally been poorly specified. In particular
there is no clear guidance on how to embed HTML content within an RSS
item.

A problem with RSS, and all such protocols that try to become an open
publication medium, is that many creators will make content and many
consumers will try to read it. Where the spec isn't exhaustive on how
it _must_ be done, then a situation soon develops of de facto
behaviour for how it _is_ done. Readers become dependent on this, and
you diverge from it at your peril.
but wouldn'it be easier and more logical to insert the XHTML as elements using namespaces?

That's an attractive option. However it's not a viable one.
There are several reasons:

Namespacing relies on using XHTML, and you may wish to include HTML
_as_HTML_, not XHTML. Some consumers may be confused if they receive
XHTML.

Namespacing relies on including a balanced fragment (i.e. one that can
be well-formed as an XML fragment). This wasn't a requirement on the
original RSS/HTML enclosure, so it is hard to re-impose in some
cases (<a name="..." > is one of the more awkward cases to deal
with).

RSS is not an XML protocol. Successive versions of badly-written specs
have clouded this. There are all sorts of references to "ASCII" when
it should really be CDATA. It's commonplace to include HTML entities,
even when these aren't valid outside the HTML DTD. Reliable parsing
of RSS from external sources is a mess, and it often relies on
knife-and-fork parsing with non-XML tools. It's not reliable to
assume good support for standard XML features if you're working with
external feeds, even though you "should" be able to do this.
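A hedged Python sketch of a consumer that tries a proper XML parse first and only falls back to regular expressions when the feed is not well-formed. The function name and patterns are illustrative assumptions; the fallback ignores CDATA sections, and the XML path does not handle namespaced (RSS 1.0) items:

```python
import html
import re
import xml.etree.ElementTree as ET

# Crude fallback patterns; they ignore CDATA sections and namespaces.
ITEM_RE = re.compile(r"<item\b.*?>(.*?)</item>", re.DOTALL)
DESC_RE = re.compile(r"<description>(.*?)</description>", re.DOTALL)


def item_descriptions(feed_text):
    """Return the decoded HTML from each item's <description>."""
    try:
        root = ET.fromstring(feed_text)
        # The XML parser has already turned &lt;p&gt; back into <p>.
        return [item.findtext("description", default="")
                for item in root.iter("item")]
    except ET.ParseError:
        # Not well-formed (e.g. a bare &eacute;, which XML does not define):
        # fall back to knife-and-fork regex extraction and decode entities
        # with the HTML entity table instead.
        results = []
        for body in ITEM_RE.findall(feed_text):
            match = DESC_RE.search(body)
            results.append(html.unescape(match.group(1)) if match else "")
        return results
```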
And if that wouldn't be possible yet, shouldn't it become possible?

RSS is old. It's post-XML, but pre-XHTML and (arguably)
pre-namespacing. So even if a namespaced approach became widespread,
consumers should (strongly) keep supporting the old way if they still
want to accept content supplied that way.

I use namespaced content for internal RSS feeds within my projects,
where I always use RSS 1.0. For external work though, I encode plain
HTML. I use balanced fragments, so I close elements like <p>...</p>,
but I don't use the <br /> form for <br>.
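For the internal, namespaced case, here is a hedged Python sketch of attaching a well-formed fragment to an item as real XHTML elements rather than encoded text. Wrapping the content in an XHTML `div` and using the `xhtml` prefix are illustrative assumptions, not a convention taken from any RSS spec or from the poster's own code; the parse raises `ET.ParseError` if the fragment is not balanced, which is the restriction described in this reply:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("xhtml", XHTML_NS)  # prefix used when serializing


def add_xhtml_content(item, fragment):
    """Parse a well-formed fragment and attach it to item as XHTML elements."""
    # Declaring the default namespace on the wrapper puts every element
    # of the fragment into the XHTML namespace.
    wrapper = ET.fromstring('<div xmlns="%s">%s</div>' % (XHTML_NS, fragment))
    item.append(wrapper)
    return item


item = ET.Element("item")
ET.SubElement(item, "title").text = "Green and blue"
add_xhtml_content(item, "<p>Some <b>XHTML</b> in RSS</p>")
print(ET.tostring(item, encoding="unicode"))
```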
 
Joris Gillis

Now that's what I call a valuable reply :)
Thank you very much.
 
Peter Flynn

I'd like to publish all the templates in our database in an RSS feed so
it will be easier to import them on other sites. Does it screw things
up if I stuff HTML into the DESCRIPTION tag on an RSS .91 feed?

Yes. Implementations of RSS readers are almost all hopelessly broken and
non-conformant, and the RSS "spec" -- such as it is -- has been so kicked
about and bastardised as to be virtually worthless except as a carrier
format like HTML. There were plans to make a newer, better version, but
like HTML it has now become so fossilised that it's not worth changing.

///Peter
 
lkrubner

Thank you for your in-depth reply. I've already read Mark's article, and
one thing I got from it was that it didn't matter much which version of
RSS you used; they were all broken.

For now I'm in the lucky position of being the consumer of my own
output. We have some HTML templates we'd like to publish, but we are
publishing them for people who have our software, so we control the
source and the point of consumption. I'd love to eventually use a richer
RSS, but I'm short on time this month, so I'd like to reuse the PHP
code we have already written and tested. The code we have puts out
valid RSS .91.

To publish an HTML template in the description tag of RSS, should I
just wrap it in a CDATA section? Or should I escape it, as someone above suggested?
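The thread doesn't settle this, but for what it's worth: to a conforming XML parser, a CDATA section and escaped text yield the same string, so either should work for a consumer you control. The one trap with CDATA is that the content must not contain `]]>`. A hedged Python sketch (hypothetical helper name) that splits that sequence across two sections:

```python
import xml.etree.ElementTree as ET


def cdata_description(html_fragment):
    """Wrap HTML in a CDATA section inside <description>."""
    # "]]>" would end the CDATA section early, so split it across two sections.
    safe = html_fragment.replace("]]>", "]]]]><![CDATA[>")
    return "<description><![CDATA[" + safe + "]]></description>"


wrapped = cdata_description("<p>Some <b>HTML</b> in RSS</p>")
# The parser hands back the original HTML, just as it would for escaped text.
print(ET.fromstring(wrapped).text)  # <p>Some <b>HTML</b> in RSS</p>
```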
 
