(RE)XML question

rubyhacker

Question for you all. I want to treat HTML like XML
(which is no big deal).

But I want to find certain "special" tags (not real
HTML) and replace them with my own text.

It's macro-type stuff. Basically I want to output
the *same* HTML except for the text that replaced
the special tags.

I can't find any examples of generating XML with
REXML. It should be easy, I don't want it to be
too hard.

Contrived example below in case it helps.

How would you do this?

Thanks,
Hal


Input:

<html>
<body>
<p>Hi, there.</p>
<foo bar="this" bam="that">some more text</foo>
<p>That's all.</p>
</body>
</html>

Output:
<html>
<body>
<p>Hi, there.</p>
<p>I found a foo tag enclosing 'some more text' with
bar and bam values of 'this' and 'that'...</p>
<p>That's all.</p>
</body>
</html>
 
why the lucky stiff

So, in Hpricot:

doc = Hpricot("<html>...</html>")
doc.search("foo").each do |ele|
  new_ele = Hpricot \
    '<p>I found a ' + ele.name + " tag enclosing '" +
    ele.inner_html + "' with " + ele.attributes.keys.join(' and ') +
    " values of " + ele.attributes.values.map { |x| "'#{x}'" }.join(' and ') +
    "...</p>"
  ele.parent.replace_child(ele, new_ele.children.first)
end
puts doc

REXML has a replace_child as well. But now you've motivated me to add Element#replace.

_why
 
Hal Fulton

why said:
REXML has a replace_child as well. But now you've motivated me to add Element#replace.

Hmm, the right thing to do and a tasty way to do it.

This motivates me to download Hpricot for the first time
and try it. Probably tomorrow as my brane is fride.


Thanks,
Hal
 
Ilan Berci

Hal said:
It's macro-type stuff. Basically I want to output
the *same* HTML except for the text that replaced
the special tags.

This is what XSLT was designed for, and it may provide another option for
you.

ilan
 
Phrogz

Contrived example below in case it helps.

input = <<ENDHTML
<html>
<body>
<p>Hi, there.</p>
<foo bar="this" bam="that">some more text</foo>
<p>That's all.</p>
</body>
</html>
ENDHTML

require 'rexml/document'
doc = REXML::Document.new( input )
doc.root.each_element( '//foo' ){ |e|
  new_para = REXML::Element.new( 'p' )
  new_para.text = "I found a foo tag enclosing '#{e.text}' with bar and " \
                  "bam values of '#{e.attributes['bar']}' and '#{e.attributes['bam']}'..."
  e.parent.replace_child( e, new_para )
}
puts doc

#=> <html>
#=> <body>
#=> <p>Hi, there.</p>
#=> <p>I found a foo tag enclosing &apos;some more text&apos; with bar and bam values of &apos;this&apos; and &apos;that&apos;...</p>
#=> <p>That's all.</p>
#=> </body>
#=> </html>
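
If more than one special tag is involved, the same REXML approach can be driven from a lookup table. The following is only a sketch: the `MACROS` hash and the `expand_macros` method are names made up for illustration, not part of REXML. It collects the matching elements with `REXML::XPath.match` before replacing any of them, so the tree is not modified while it is being searched.

```ruby
require 'rexml/document'

# Hypothetical macro table (not part of REXML):
# tag name => lambda that builds the replacement element.
MACROS = {
  'foo' => lambda do |e|
    para = REXML::Element.new('p')
    para.text = "I found a #{e.name} tag enclosing '#{e.text}' with bar and " \
                "bam values of '#{e.attributes['bar']}' and '#{e.attributes['bam']}'..."
    para
  end
}

# Parse the input, swap every known macro tag for its replacement,
# and return the modified document.
def expand_macros(xml)
  doc = REXML::Document.new(xml)
  MACROS.each do |tag, builder|
    # XPath.match returns an array, so replacing nodes here is safe.
    REXML::XPath.match(doc, "//#{tag}").each do |e|
      e.parent.replace_child(e, builder.call(e))
    end
  end
  doc
end
```

Calling `puts expand_macros(input)` with the `input` string above should then print the document with the foo element replaced.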
 
rubyhacker

Ilan said:
This is what XSLT was designed for, and it may provide another option for
you.

That makes sense. I've never used XSLT, but I'm sure that's
a viable solution.

_Why's Hpricot example worked perfectly for me, BTW.

So, a related question.

Suppose I wanted to "nest" macros of this kind. Something like:

<mac1 foo="1" bar="2">My name is
<mac2 baz="3" bam="4">seed-value</mac2>
today.</mac1>

Forgive the nonsense example.

Could XSLT handle this easily? Could Hpricot (_why)?


Thanks,
Hal
 
William James

require 'xml-split.rb'

tag = 'foo'
DATA.read.xml_split(tag).each {|stuff|
  if stuff.class == String
    print stuff
  else
    attr = stuff[0].xml_parse
    puts "<p>I found a #{tag} tag enclosing '#{stuff[1]}' with"
    print "#{attr.keys.join(' and ')} values of "
    print "'#{attr.values.join("' and '")}'...</p>"
  end
}

__END__
<html>
<body>
<p>Hi, there.</p>
<foo bar="this" bam="that">some more text</foo>
<p>That's all.</p>
</body>
</html>

---- output ----

<html>
<body>
<p>Hi, there.</p>
<p>I found a foo tag enclosing 'some more text' with
bam and bar values of 'that' and 'this'...</p>
<p>That's all.</p>
</body>
</html>
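
For readers who do not have xml-split.rb at hand, a rough approximation of the same split-and-substitute idea can be written with nothing but `String#gsub`. This is not xml-split.rb and makes no claim about how that file works; `replace_tag` is a hypothetical helper, and the regular expression assumes the special tag is never nested inside itself and that attribute values are double-quoted.

```ruby
# Hypothetical helper: replace every <tag ...>...</tag> in html with a
# paragraph describing it. A regex sketch, not a real XML parser.
def replace_tag(html, tag)
  name = Regexp.escape(tag)
  pattern = %r{<#{name}\b([^>]*)>(.*?)</#{name}>}m
  html.gsub(pattern) do
    # Save the captures before scan overwrites the match variables.
    attr_text, body = $1, $2
    attrs = attr_text.scan(/([\w-]+)="([^"]*)"/).to_h
    "<p>I found a #{tag} tag enclosing '#{body}' with " \
      "#{attrs.keys.join(' and ')} values of " \
      "'#{attrs.values.join("' and '")}'...</p>"
  end
end
```

Because Ruby hashes have kept insertion order since 1.9, the attributes come out in document order (bar, then bam) rather than in the order shown in the output above.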
 
Ilan Berci

Hal said:
Could XSLT handle this easily? Could Hpricot (_why)?

Yes, both techniques can handle nested elements. I don't know what XML
tools you are using, but many come with XSLT support built in. XSLT
allows any XML (or XHTML) document to be transformed into any other. At
one time it was slated to replace CSS, but that never seemed to
materialize. Nowadays it's mostly used in report generation and XML-RPC
filtering, but of course it has many other uses. The disadvantages of
XSLT are that it can be rather challenging to debug and that it can grow
very verbose in nontrivial transformations. The advantage is that it is
a W3C standard and practically every platform and language supports it
in one form or another.

I have no experience with Hpricot, but if you are already using Ruby as
your main processor then I would probably stick with Hpricot, as the
solutions above look much cleaner than an XSLT solution :) Oh, and
lastly, if you don't use XSLT/XPath on a regular basis, you can easily
forget its semantics and have to keep referring back to the docs, or at
least I have to.


ilan
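
For the nested case, one way to sketch it in Ruby is with REXML (used here instead of Hpricot or XSLT): expand the innermost macros first, so that an outer macro sees the already-expanded text of anything nested inside it. Everything specific below is an assumption made for illustration: the `EXPANDERS` table, the `expand!` method, and what mac1 and mac2 actually produce are all invented, since the thread never says what the macros should do.

```ruby
require 'rexml/document'

# Hypothetical expanders for the mac1/mac2 example. Each receives the
# attributes as a Hash and the (already expanded) body text.
EXPANDERS = {
  'mac2' => ->(attrs, body) { "#{body.strip}-#{attrs['baz']}-#{attrs['bam']}" },
  'mac1' => ->(attrs, body) { "[#{attrs['foo']},#{attrs['bar']}] #{body.strip}" }
}

# Depth-first, children before parent: nested macros are replaced by
# plain text nodes before the enclosing macro is expanded.
def expand!(element)
  element.elements.to_a.each { |child| expand!(child) }
  expander = EXPANDERS[element.name]
  return unless expander && element.parent

  attrs = {}
  element.attributes.each { |name, value| attrs[name] = value }
  body = element.texts.map(&:value).join
  replacement = REXML::Text.new(expander.call(attrs, body))
  element.parent.replace_child(element, replacement)
end

source = '<p><mac1 foo="1" bar="2">My name is ' \
         '<mac2 baz="3" bam="4">seed-value</mac2> today.</mac1></p>'
doc = REXML::Document.new(source)
expand!(doc.root)
puts doc.root.text
```

Expanding the parent first would also be possible, but then the outer macro would have to deal with unexpanded child elements itself, which is why this sketch goes bottom-up.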
 
