[ANN] XHTMLDiff 1.0.0

Aredridel

Since today seems to be the day for document diffing tools, here's mine.

I'd like to announce XHTMLDiff 1.0.0, available at
http://theinternetco.net/projects/ruby/xhtmldiff for your consumption.

XHTMLDiff takes valid XHTML as input, and generates valid XHTML with
redlining tags (<ins> and <del>) as output. Valid input documents
should generate valid output.

It diffs down to the paragraph level at the moment. A future version
will search down to the word.
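
To make "paragraph level" concrete, here is a minimal sketch of the idea. It is not XHTMLDiff's code or API: the names `paragraphs`, `diff_ops` and `redline` are made up for illustration, the input is assumed to be a `<body>` fragment whose direct children are `<p>` elements, and a small dynamic-programming LCS stands in for Diff::LCS so that the only dependency is REXML.

```ruby
require 'rexml/document'

# Hypothetical sample input: two versions of a <body> fragment.
OLD_BODY = '<body><p>Intro</p><p>Old detail</p><p>Outro</p></body>'
NEW_BODY = '<body><p>Intro</p><p>New detail</p><p>Outro</p></body>'

# Serialize each direct <p> child so paragraphs compare as plain strings.
def paragraphs(xhtml)
  root = REXML::Document.new(xhtml).root
  root.elements.to_a.select { |el| el.name == 'p' }.map(&:to_s)
end

# Longest-common-subsequence length table (plain dynamic programming).
def lcs_table(a, b)
  table = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  a.each_index do |i|
    b.each_index do |j|
      table[i + 1][j + 1] =
        if a[i] == b[j]
          table[i][j] + 1
        else
          [table[i][j + 1], table[i + 1][j]].max
        end
    end
  end
  table
end

# Walk the table backwards and return [[:same | :del | :ins, element], ...].
def diff_ops(a, b)
  table = lcs_table(a, b)
  ops = []
  i = a.size
  j = b.size
  while i > 0 || j > 0
    if i > 0 && j > 0 && a[i - 1] == b[j - 1]
      ops.unshift([:same, a[i - 1]])
      i -= 1
      j -= 1
    elsif j > 0 && (i.zero? || table[i][j - 1] >= table[i - 1][j])
      ops.unshift([:ins, b[j - 1]])
      j -= 1
    else
      ops.unshift([:del, a[i - 1]])
      i -= 1
    end
  end
  ops
end

# Wrap removed paragraphs in <del> and added ones in <ins>.
def redline(old_xhtml, new_xhtml)
  marked = diff_ops(paragraphs(old_xhtml), paragraphs(new_xhtml)).map do |op, para|
    case op
    when :same then para
    when :del  then "<del>#{para}</del>"
    when :ins  then "<ins>#{para}</ins>"
    end
  end
  marked.join("\n")
end

puts redline(OLD_BODY, NEW_BODY)
```

Because whole paragraphs are compared as strings, a one-word edit shows up as a deleted paragraph followed by an inserted one, which is the limitation a word-level diff would remove.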

Prerequisites are REXML, Diff::LCS, and delegate.rb.

Bug reports are welcome.

Aredridel.
 
Austin Ziegler

Cool. Always nice to see people using something I wrote :)

Are there any requests for improvements with Diff::LCS, Aredridel?

-austin
 
Aredridel

None at the moment -- the functional interface is very pleasant, and
the library makes no assumptions about type, so I could duck-type to my
heart's content. Nicely and solidly written, and it's good to see the
McIlroy-Hunt algorithm spelled out in Ruby where I totally grok it,
rather than locked up in Perl or Smalltalk (which I've read, but was
never sure I really got).

Ari
 
Francis Hwang

This looks quite cool. Are there any plans to generalize this to XML in
general? I can think of lots of good ways to use this if it's more
broadly applicable to XML.
 
Aredridel

Not at the moment, since it satisfies my need. Differencing general XML
is a slightly different task, and either much easier or much harder,
depending on the approach. XHTML has to satisfy the XHTML DTD, so there
are specific places and specific tags to use to mark changes.

With XML, the change markers would either have to be arbitrarily defined
(easy), or chosen according to each flavor's DTD (hard).

I'm up for it when I get some free time, if someone wants to specify
what they need.

Ari
 
Francis Hwang

Well, in my case I wanted to compare two different RSS 2.0 feeds. Which
doesn't seem to have a DTD, harrumph. I'll be quite happy when we all
move to Atom ...
 
Aredridel

Ah, yes -- Atom or RSS 1.0 (for comparison, RSS 1.0 would be even nicer).

What sort of interface would you want to compare RSS data? A list of
things that have changed since a previous run? A highlighted list? An
XML sort of patch? Would you want it at the API level, or a textually
annotated set of changes?
 
Francis Hwang

What I was doing was refactoring some RSS code that didn't have enough
tests, so I wanted to compare pretty much every element of the
resulting RSS. I ended up just eyeballing it in an aggregator, which
seems to have worked out okay but still wasn't ideal.

I'd want a pretty granular comparison, and having it at the API level
would be ideal. I don't mind having to do a little work to format the
changes into readable output. Also, having API-level information might
make it easier for me to filter out certain differences.

F.
 
Aredridel

Hm. Sounds like <ins> and <del> equivalents might be perfect, though
really, you want exact, full-tree diffs. That's a simpler task,
really. Honestly, it sounds like raw Diff::LCS might be the tool you
want -- parse both with REXML, and then hit the trees with Diff::LCS
-- the gotcha being the way REXML deals with containers. Steal the
proxy class from XHTMLDiff and that should be all you need.

Ari
 
Francis Hwang

Sounds good! I'll give that a try sometime and maybe write a tiny
how-to on my blog.

F.
 
