Hpricot test for equivalence of two xml segments?

  • Thread starter Xeno Campanoli / Eskimo North and Gmail

Xeno Campanoli / Eskimo North and Gmail

I'm looking through what documentation I can find for Hpricot (Nokogiri wouldn't
install for me, and I just want a quick and simple solution), and I cannot find a
simple method to take two XML strings and find out if they are equivalent. I'm
getting a bunch of XHTML back from our rendering agent with random permutations
of attributes inside the tags, and I want a quick and easy Ruby way to find
out if segments are equivalent without writing my own regex-based parser...???

It seems like there should be a simple method for this. If I had written
Hpricot, equivalence of segments would have been the first method I would have
written...???

xc
 

Ammar Ali

[Note: parts of this message were removed to make it a legal post.]

On Sat, Jul 17, 2010 at 12:28 AM, Xeno Campanoli / Eskimo North and Gmail <
I'm looking through what documentation I can find for Hpricot (Nokogiri
wouldn't install for me, and I just want a quick and simple solution), and I
cannot find a simple method to take two XML strings and find out if they are
equivalent. I'm getting a bunch of XHTML back from our rendering agent with
random permutations of attributes inside the tags, and I want a quick and
easy Ruby way to find out if segments are equivalent without writing my own
regex-based parser...???


I can think of a few definitions for equivalence. One definition would
simply require unifying the case of both strings and checking if they are
the same. A second definition would require building a tree of the structure
in each string, including attributes, sorting it, and looping over them to
check if they contain the same elements (Nokogiri's XML::NodeSet does
something like this with ==). A third definition would build on the second
one, while treating certain tags as equivalent to other tags (for example q
is equivalent to blockquote).

What's *your* definition of equivalence for two xml documents or fragments?

Ammar
 

Xeno Campanoli / Eskimo North and Gmail

On Sat, Jul 17, 2010 at 12:28 AM, Xeno Campanoli / Eskimo North and Gmail<



I can think of a few definitions for equivalence. One definition would
simply require unifying the case of both strings and checking if they are
the same. A second definition would require building a tree of the structure
in each string, including attributes, sorting it, and looping over them to
check if they contain the same elements (Nokogiri's XML::NodeSet does
something like this with ==). A third definition would build on the second
one, while treating certain tags as equivalent to other tags (for example q
is equivalent to blockquote).

What's *your* definition of equivalence for two xml documents or fragments?

Ammar

The only thing I am concerned about is permutations of attributes inside the
tags. Everything else I'm seeing is regular. Is there something where I can
parse all the tags in a segment and tell if they are equivalent and just have
the attributes in different orders? I'm not even concerned about different tag
forms. We don't see that. A typical example is:

<li><img alt="alt text" src="/my/image/path/thingy.jpg" />My Text</li>

I need to have something that can help me judge such things as equivalent.
Again, I NEVER see tag permutations, but just attribute permutations.
Thank you for your response.

Sincerely, Xeno
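
For the attribute-permutation case described above, here is a minimal sketch that uses only REXML from Ruby's standard library. That choice is an assumption: the thread itself discusses Hpricot, Nokogiri and Lorax, not REXML. The helper names `normalize` and `equivalent_xml?` are hypothetical. Each element becomes a nested array with its attributes sorted, so attribute order no longer affects the comparison. Comments and processing instructions are ignored, and whitespace-only text nodes are not stripped.

```ruby
require 'rexml/document'

# Turn an element into [name, sorted attribute pairs, children].
# Sorting the pairs makes the original attribute order irrelevant.
def normalize(element)
  attrs = []
  element.attributes.each_attribute { |a| attrs << [a.expanded_name, a.value] }
  children = element.children.map do |child|
    case child
    when REXML::Element then normalize(child)
    when REXML::Text    then child.value
    end
  end
  [element.name, attrs.sort, children.compact]
end

# True when both strings parse to the same tree, ignoring attribute order.
# Each string is assumed to have a single root element.
def equivalent_xml?(a, b)
  normalize(REXML::Document.new(a).root) == normalize(REXML::Document.new(b).root)
end
```

With the `<li>` example above, swapping `alt` and `src` inside the `img` tag should still compare as equivalent, while changing either attribute value should not.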
 

Mike Dalessio

[Note: parts of this message were removed to make it a legal post.]

On Fri, Jul 16, 2010 at 6:52 PM, Xeno Campanoli / Eskimo North and Gmail <
The only thing I am concerned about is permutations of attributes inside
the tags. Everything else I'm seeing is regular. Is there something where
I can parse all the tags in a segment and tell if they are equivalent and
just have the attributes in different orders? I'm not even concerned about
different tag forms. We don't see that. A typical example is:



I need to have something that can help me judge such things as equivalent.
Again, I NEVER see tag permutations, but just attribute permutations.

You should take a look at Lorax:

http://github.com/flavorjones/lorax

which is Nokogiri-based.

Your definition of equivalence (the semantically correct one, imho) can be
tested with:

require 'nokogiri'
require 'lorax'
Lorax::Signature.new(Nokogiri::XML(string1).root).signature ==
Lorax::Signature.new(Nokogiri::XML(string2).root).signature

And note that Nokogiri will also allow you to parse XML fragments.

HTH,
-m
 

Xeno Campanoli / Eskimo North and Gmail

I believe you. Nokogiri wouldn't install though, and neither did Lorax...

Looks like there's an install site, but I hesitate to use something this outside
the mainstream on a project like this. I don't want to impose needless
maintenance problems on my environment.
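
If installing extra gems is not an option, the signature idea can be roughly imitated with the standard library alone. The sketch below is not Lorax's actual algorithm. It assumes REXML and Digest are available, and the names `canonical_string` and `xml_signature` are hypothetical. It serializes each element with its attributes in sorted order and hashes the result, so two fragments that differ only in attribute order get the same digest.

```ruby
require 'rexml/document'
require 'digest/sha1'

# Serialize an element with attributes in sorted order. The output is only
# meant for comparison, not for use as well-formed XML.
def canonical_string(element)
  attrs = []
  element.attributes.each_attribute { |a| attrs << [a.expanded_name, a.value] }
  attr_text = attrs.sort.map { |name, value| " #{name}=#{value.inspect}" }.join
  body = element.children.map do |child|
    case child
    when REXML::Element then canonical_string(child)
    when REXML::Text    then child.to_s
    else ''
    end
  end.join
  "<#{element.name}#{attr_text}>#{body}</#{element.name}>"
end

# SHA1 hex digest of the canonical form of a single-rooted XML string.
def xml_signature(xml)
  Digest::SHA1.hexdigest(canonical_string(REXML::Document.new(xml).root))
end
```

Comparing `xml_signature(string1) == xml_signature(string2)` then mirrors the shape of the Lorax call quoted earlier in the thread, and the digests can be stored if the same reference output is checked repeatedly.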
 
