Basic info needed on RSS feeds

danieldryhurst

I'm trying to create my own RSS feed that grabs headlines from external
sites and collects them into one XML document.
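If the external sites already publish RSS, collecting their headlines into one document can be as simple as copying every `<item>` into a new channel. The sketch below does that with Python's standard `xml.etree.ElementTree`. It assumes each source is well-formed RSS 2.0 that has already been downloaded as a string or bytes, which is often not true in practice. The function name `merge_rss` and the channel values are hypothetical.

```python
import xml.etree.ElementTree as ET


def merge_rss(feed_docs, title, link, description):
    """Copy the <item> elements of several RSS 2.0 documents into one new feed.

    feed_docs: list of RSS 2.0 documents (str or bytes) that are assumed to be
    well-formed; title, link and description describe the combined channel.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description

    for doc in feed_docs:
        source = ET.fromstring(doc)  # raises ET.ParseError if not well-formed
        # Append every <item> of the source channel, in its original order.
        for item in source.iterfind("./channel/item"):
            channel.append(item)
    return ET.tostring(rss, encoding="unicode")
```

Items are appended in the order the feeds are given; sorting by `pubDate` would be a natural next step if the sources provide it.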

I want to do this partly to experiment, and partly because there is
currently no RSS feed for my chosen subject, so I'm gathering the
headlines from various places. I'm also planning to integrate the feed
into a custom deskbar I'm building with MioFactory, so the XML document
needs a particular format.

I tried something called the MyWebfeeds demo, and it pulled out some news
links from http://www.liverpoolfc.tv/news/ (try it to see what I mean).
I'd like a script that does the same thing. If any of you know how they
coded it, or where to get the source code, that would be great.
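How MyWebfeeds does it isn't known here, but one common approach is to pull every link and its text out of the page and keep the ones whose URL looks like a news article. The sketch below uses Python's standard `html.parser`, which tolerates invalid HTML. The `/news/` filter is a hypothetical heuristic you would adapt to the real site, and `LinkCollector`/`extract_links` are illustrative names.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a href="..."> on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            if text:  # skip image-only or empty links
                self.links.append((text, self._href))
            self._href = None


def extract_links(html, must_contain=""):
    """Return (headline, url) pairs whose URL contains must_contain."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [(text, href) for text, href in parser.links if must_contain in href]
```

For example, `extract_links(page_html, "/news/")` keeps only links under `/news/`. The URLs come back exactly as written in the page, so relative ones may need `urllib.parse.urljoin` before they go into a feed.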

Cheers to all who offer assistance.
 
syeates

> I'm trying to create my own RSS feed which will grab some headlines
> from external sites and parse them into one xml document.

The mistake you appear to be making is assuming that the tag soup
people serve up as RSS is actually XML. Often it is not well-formed XML,
and even when it is, the character encoding is frequently wrong.
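A quick way to see this for yourself is to hand a downloaded feed to a strict XML parser. The sketch below, using Python's standard `xml.etree.ElementTree`, returns `None` if the bytes are well-formed XML and the parser's error message otherwise. Passing raw bytes (not a decoded string) lets the parser apply the feed's own encoding declaration, so a feed that claims UTF-8 but contains Latin-1 bytes is caught too. `check_feed` is a hypothetical name, and the function only diagnoses problems; it does not repair them.

```python
import xml.etree.ElementTree as ET


def check_feed(data):
    """Return None if data (bytes) is well-formed XML, else the parse error text."""
    try:
        # Bytes, not str: the parser then honours the encoding the document declares.
        ET.fromstring(data)
    except ET.ParseError as err:
        return str(err)
    return None
```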
However, software is available that does what you seem to want; check
out the list at Wikipedia:

http://en.wikipedia.org/wiki/List_of_news_aggregators

cheers
stuart
 
danieldryhurst

Thank you for the reply.

While I read through that, I'll explain more fully what I want to be
able to do. There is a site with the latest news on it, but it has no
<span class="rss:item"> tags. So what I need is to write or find a free
script that will run through the HTML, retrieve all the headlines, and
export them to an XML file that is RSS-compliant.
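Once the headlines and their URLs have been scraped, producing an RSS-compliant file is mostly a matter of emitting the elements RSS 2.0 requires: a `<channel>` with `<title>`, `<link>` and `<description>`, plus one `<item>` per headline. The sketch below builds that with Python's standard `xml.etree.ElementTree`, which also escapes characters such as `&` for you. The function name and the channel values are hypothetical, and since the exact format MioFactory expects isn't stated here, this emits plain RSS 2.0 only.

```python
import xml.etree.ElementTree as ET
from email.utils import formatdate


def build_rss(headlines, title, link, description):
    """Build an RSS 2.0 document from (headline, url) pairs; returns UTF-8 bytes."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the three required channel elements.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    # RSS uses RFC 822-style dates, which formatdate(usegmt=True) produces.
    ET.SubElement(channel, "lastBuildDate").text = formatdate(usegmt=True)

    for headline, url in headlines:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = headline
        ET.SubElement(item, "link").text = url
        # The article URL doubles as a permanent, unique item identifier.
        ET.SubElement(item, "guid").text = url

    # ElementTree escapes &, < and > in text; xml_declaration needs Python 3.8+.
    return ET.tostring(rss, encoding="utf-8", xml_declaration=True)
```

To save the result, write the returned bytes to a file opened in binary mode (`"wb"`); the (headline, url) pairs can come from whatever scraping step you use.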

Hope this is a little clearer :).
 
Peter Flynn

> Thank you for the reply.
>
> While I read through that, I'll explain more fully what I want to be
> able to do. Basically there is a site which has latest news on it (but
> they have no <span class="rss:item"> tags). So what I need basically
> is to write/find a free script that will run through the html and
> retrieve all the headlines and export the data to an xml file that is
> RSS compliant.

If their HTML is consistent over time (i.e. it's generated automatically,
so its structure stays the same even if the markup is invalid), you may be
able to use HTML Tidy to turn it into XHTML, which an XSLT stylesheet can
then process to extract the bits you want.

Example: if the junk HTML produced by the site is consistent enough that
you know the headlines you want are always in the 15th, 17th, and 19th
<P> elements in the 3rd <div>, then a scripted conversion to XHTML and a
short XSLT file will let you extract the headlines and output them in the
form you want.
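Python's standard library has no XSLT processor, so the sketch below substitutes `xml.etree.ElementTree` for the XSLT step while doing the same positional selection. It assumes the page has already been run through HTML Tidy to get XHTML (for example with `tidy -asxhtml -numeric`, the latter so named entities such as `&nbsp;` don't trip up the XML parser). It also assumes "the 3rd `<div>`" means the third in document order across the whole page, and counts `<p>` elements among that div's descendants. `headlines_at` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Namespace that Tidy's XHTML output declares on <html>.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def headlines_at(xhtml, div_number, p_numbers, ns=XHTML_NS):
    """Return the text of selected <p> elements inside the Nth <div>.

    Positions are 1-based, as in XPath. Pass ns="" for XHTML without a
    namespace declaration. Raises IndexError if the page layout has changed
    and a requested position no longer exists.
    """
    root = ET.fromstring(xhtml)
    divs = list(root.iter(ns + "div"))  # every <div>, nested ones included
    paras = list(divs[div_number - 1].iter(ns + "p"))
    results = []
    for n in p_numbers:
        # itertext() also gathers text inside inline children such as <b> or <a>;
        # split/join collapses runs of whitespace and line breaks.
        results.append(" ".join("".join(paras[n - 1].itertext()).split()))
    return results
```

Peter's example then becomes `headlines_at(xhtml, 3, [15, 17, 19])`. As with the XSLT route, it breaks as soon as the site changes its layout.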

Tedious, clumsy, but it works.

///Peter
 
