[ANN] ClothRed (HTML to Textile)

Phillip Gawlowski

I'm pleased to announce that I've begun working on a small library to
convert HTML into Textile.

Please forgive me that this announcement doesn't yet follow the
community's standards, but I'm slowly getting there.

For the curious, the website and project on RubyForge have gone online
*and* have some content[0].

For the impatient:
ClothRed will be exactly the reverse of RedCloth: It will grab any HTML
string, and convert it into Textile.

As a bonus, ClothRed will strip all HTML that is not being converted
into Textile's markup from the text, making it, hopefully, usable for
sanitizing HTML.
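
As a rough illustration of the idea, here is a minimal sketch in plain Ruby. It is not ClothRed's code or API: the method name `html_to_textile` and the small tag table are made up for this example, and regex-based stripping like this is far too naive to rely on as a real sanitizer.

```ruby
# Hypothetical sketch, NOT ClothRed's actual API.
# Maps a few inline HTML tags to Textile markers, then drops every
# tag that was not converted.
TEXTILE_INLINE = {
  "strong" => "*",
  "b"      => "*",
  "em"     => "_",
  "i"      => "_"
}.freeze

def html_to_textile(html)
  text = html.dup
  TEXTILE_INLINE.each do |tag, marker|
    # Opening and closing tags (with or without attributes) both
    # become the same Textile marker.
    text.gsub!(%r{</?#{tag}(?:\s[^>]*)?>}i, marker)
  end
  # A closing paragraph tag becomes a blank line.
  text.gsub!(%r{</p\s*>}i, "\n\n")
  # Whatever is still a tag has no Textile equivalent in this sketch.
  text.gsub!(/<[^>]*>/, "")
  text.strip
end

puts html_to_textile('<p>Hello <b>world</b>, <span onclick="x()">click</span></p>')
# prints: Hello *world*, click
```

The `<span>` and its `onclick` attribute have no Textile counterpart in the table, so they are removed while the text inside survives.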

I hope to have an Alpha release out by the end of next month.

Links:
[0] http://clothred.rubyforge.org/

--
Phillip "CynicalRyan" Gawlowski
http://cynicalryan.110mb.com/

Rule of Open-Source Programming #5:

A project is never finished.
 
Jacob Fugal

Phillip said:
I'm pleased to announce that I've begun working on a small library to
convert HTML into Textile. ...
ClothRed will be exactly the reverse of RedCloth: It will grab any HTML
string, and convert it into Textile.

As a bonus, ClothRed will strip all HTML that is not being converted
into Textile's markup from the text, making it, hopefully, usable for
sanitizing HTML.

I hope to have an Alpha release out by the end of next month.

Awesome, Phillip. I really look forward to using this!

Jacob Fugal
 
Daniel DeLorme

Phillip said:
ClothRed will be exactly the reverse of RedCloth: It will grab any HTML
string, and convert it into Textile.

As a bonus, ClothRed will strip all HTML that is not being converted
into Textile's markup from the text, making it, hopefully, usable for
sanitizing HTML.

Looks interesting, but I hope there would be a mode to preserve unknown
HTML in addition to the "lossy" mode. Sanitizing HTML is good but if you
convert the resulting Textile to HTML and it doesn't look like the
original, that's not too good IMHO.

Daniel
 
Phillip Gawlowski

Daniel said:
Looks interesting, but I hope there would be a mode to preserve unknown
HTML in addition to the "lossy" mode. Sanitizing HTML is good but if you
convert the resulting Textile to HTML and it doesn't look like the
original, that's not too good IMHO.

To do that, there'll probably be two different modes of HTML stripping:
* One "strict": Everything that cannot be parsed by ClothRed will be
thrown out.
* One "loose": All HTML that ClothRed cannot convert will be kept, and
warnings will be emitted (either to stdout, or stderr, or both).

The latter will not be usable for sanitizing HTML, as "unknown" HTML
*should* be treated as malicious (specifically, as there is no "unknown"
HTML in the W3C specs).
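
The two modes could look roughly like the sketch below. Everything in it is an assumption made for illustration: the method name `handle_unknown_html`, the `mode:` and `warn_io:` parameters, and the list of "known" tags are hypothetical, not part of ClothRed. Note that this strict mode only drops the tags themselves, not the text between them.

```ruby
# Hypothetical sketch of the "strict" and "loose" modes.
# KNOWN_TAGS stands in for whatever the converter can turn into Textile.
KNOWN_TAGS = %w[p b strong i em].freeze

def handle_unknown_html(html, mode: :strict, warn_io: $stderr)
  html.gsub(%r{</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>}) do |tag|
    name = Regexp.last_match(1).downcase
    if KNOWN_TAGS.include?(name)
      tag  # left untouched for the Textile conversion step
    elsif mode == :strict
      ""   # strict: unknown markup is thrown out
    else
      # loose: unknown markup is kept, and a warning is emitted
      warn_io.puts "warning: keeping unconverted tag <#{name}>"
      tag
    end
  end
end

html = "<b>bold</b> <table><tr><td>cell</td></tr></table>"
puts handle_unknown_html(html, mode: :strict)
puts handle_unknown_html(html, mode: :loose)
```

Only the strict result is a candidate for sanitizing, since the loose result still carries markup the converter never looked at.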

--
Phillip "CynicalRyan" Gawlowski
http://cynicalryan.110mb.com/

Rule of Open-Source Programming #33:

Don't waste time on writing test cases and test scripts - your users are
your best testers.
 
