Webpage to RSS

Bart Braem

Does anyone have a Ruby-based script lying around that would transform
updates to a webpage into RSS or some other feed format?
We don't use a CMS for our website, but there are items that are updated
often and RSS feeds might be appreciated. Someone must have done this
before, I guess.
So is there a script that might do that? The categories are separated by h2
tags and the items are in li tags.

Bart
 
hemant

Hi,
Since this is such a specialized task, it really depends on the website you
are transforming. I would suggest you take a look at Hpricot
(http://code.whytheluckystiff.net/hpricot) and at the RSS class in the
standard library. It shouldn't be too hard to roll one up, and we can always
help. I am usually in #ruby-lang on freenode after 7 every night.

The Rails Recipes book by Chad Fowler has similar stuff, almost ready for
use. If you don't have the book, you can still download the code
sample, I guess.
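
For the feed-generation half, here is a minimal sketch using the `rss` library's `RSS::Maker` (standard library in older Rubies, a bundled gem in newer ones). The `ITEMS` data, the `build_feed` helper, and the example.com URLs are hypothetical placeholders; in practice the entries would come from whatever the page scrape produces.

```ruby
require "rss"

# Hypothetical entries, standing in for the results of the page scrape.
ITEMS = [
  { title: "Structure 1 / Substructure 1: some file",
    link:  "http://example.com/somefile",
    date:  Time.utc(2006, 7, 1) }
]

# Build an RSS 2.0 feed object from a list of entry hashes.
def build_feed(items)
  RSS::Maker.make("2.0") do |maker|
    # RSS 2.0 requires a title, link and description on the channel.
    maker.channel.title       = "Website updates"
    maker.channel.link        = "http://example.com/"
    maker.channel.description = "Recently added files"

    items.each do |entry|
      maker.items.new_item do |item|
        item.title = entry[:title]
        item.link  = entry[:link]
        item.date  = entry[:date]
      end
    end
  end
end

# The feed object serializes to XML via to_s.
puts build_feed(ITEMS)
```

Combining the category and subcategory headings into the item title is one way to keep entries distinguishable; that choice is an assumption here, not something the thread prescribes.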
 
why the lucky stiff

Bart Braem said:
One question though: do you see a way of parsing a structure like this with
hpricot:

<h3>Structure 1</h3>
<h4>Substructure 1</h4>

<p>Substructure info</p>

<ul>

<li><a href="somefile">Somefiles description</a>. Addition date.</li>

I can cope with setting a date in the RSS; the problem is parsing this
structure. There is no surrounding element for the ul, and I need both the
structure and the substructure information because the combination of those
two defines the effective identity of the ul and its items.
There seems to be no method to "give everything between two specific tags and
then go on to the next one"...

I'm not sure I understand exactly, but here's my impression of what you're
trying to do.

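require 'hpricot'

doc = Hpricot(html_string)
(doc/:h3).each do |h3|
  rss_title = h3 # okay, so you have the 3rd-level header
  rss_contents = Hpricot::Elements[]

  # start at the header and step through the siblings that follow it
  ele = h3
  while ele = ele.next_sibling
    rss_contents << ele
    # stop once the list belonging to this header has been collected
    break if ele.respond_to?(:name) and ele.name == "ul"
  end
end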

So, basically, you can use `next_sibling` (or `previous_sibling`) to walk back
and forth between HTML brothers and sisters. I store it in an Hpricot::Elements
array, since you can then just call `rss_contents.to_html` or do other searches
on it.

This is available since changeset [49], so you'll need to either install from SVN
or monkeypatch.

_why

[49] http://code.whytheluckystiff.net/hpricot/changeset/49
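
The same sibling walk can be tried without an Hpricot checkout. What follows is a minimal sketch that swaps in REXML as a stand-in (it is not Hpricot, and it assumes the markup is well-formed XML, which real pages often are not). The `SAMPLE` markup and the `sections` helper are hypothetical; `next_element` is REXML's way of stepping to the next sibling element while skipping whitespace text nodes.

```ruby
require "rexml/document"

# Hypothetical sample mirroring the structure from the question,
# wrapped in a div and closed up so that it parses as XML.
SAMPLE = <<~XHTML
  <div>
    <h3>Structure 1</h3>
    <h4>Substructure 1</h4>
    <p>Substructure info</p>
    <ul>
      <li><a href="somefile">Some file</a>. Addition date.</li>
    </ul>
    <h3>Structure 2</h3>
    <h4>Substructure 2</h4>
    <ul>
      <li><a href="otherfile">Other file</a>. Addition date.</li>
    </ul>
  </div>
XHTML

# For each h3, collect the sibling elements that follow it,
# up to and including the first ul.
def sections(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//h3").map do |h3|
    collected = []
    node = h3
    while (node = node.next_element)
      collected << node
      break if node.name == "ul"
    end
    { title: h3.text, contents: collected }
  end
end

sections(SAMPLE).each do |section|
  puts "#{section[:title]}: #{section[:contents].map(&:name).join(', ')}"
end
```

Like the snippet above, this stops at the first ul after each h3, so a header with several h4/ul groups would need the loop to continue until the next h3 instead.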
 
Bart Braem

Lutz said:
You could use hpricot (http://code.whytheluckystiff.net/hpricot/) to
parse the HTML and then use feedtools
(http://sporkmonger.com/articles/2005/08/11/tutorial/) to generate the
RSS.

Wow, hpricot seems pretty nice. I noticed the hype, but now I understand...
One question though: do you see a way of parsing a structure like this with
hpricot:

<h3>Structure 1</h3>
<h4>Substructure 1</h4>

<p>Substructure info</p>

<ul>

<li><a href="somefile">Somefiles description</a>. Addition date.</li>

I can cope with setting a date in the RSS; the problem is parsing this
structure. There is no surrounding element for the ul, and I need both the
structure and the substructure information because the combination of those
two defines the effective identity of the ul and its items.
There seems to be no method to "give everything between two specific tags and
then go on to the next one"...

Thanks for the pointers
Bart
 
Bart Braem

why said:
I'm not sure I understand exactly, but here's my impression of what you're
trying to do.

doc = Hpricot(html_string)
(doc/:h3).each do |h3|
  rss_title = h3  # okay, so you have the 3rd-level header
  rss_contents = Hpricot::Elements[]

  # start at the header and step through the siblings that follow it
  ele = h3
  while ele = ele.next_sibling
    rss_contents << ele
    break if ele.respond_to?(:name) and ele.name == "ul"
  end
end

So, basically, you can use `next_sibling` (or `previous_sibling`) to walk
back and forth between HTML brothers and sisters.  I store it in an
Hpricot::Elements array, since you can then just call
`rss_contents.to_html` or do other searches on it.

This is available since changeset [49], so you'll need to either install
from SVN or monkeypatch.

The next_sibling and previous_sibling methods are just what I needed.
Now for an svn checkout...

Thanks a lot!
Bart
 
