Using XPath: find the last text node of each paragraph under the root node

Diego

I want to trim trailing whitespace at the end of all XHTML paragraphs.
I am using the REXML library.

Say I have the following in a valid XHTML file:

<p>hello <span>world</span> a </p>
<p>Hi there </p>
<p>The End </p>

I want to end up with this:

<p>hello <span>world</span> a</p>
<p>Hi there</p>
<p>The End</p>

So I was thinking that I could use XPath to get just the text nodes
I want and then trim them, which would give me the result above.

I started with the following XPath: //root/p/child::text()

Of course, the problem here is that it returns all text nodes that are
children of all p-tags, namely:

'hello '
' a '
'Hi there '
'The End '
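For reference, this unfiltered query can be reproduced with REXML from the standard library. A minimal sketch; the `<root>` wrapper element and the inline `xml` string are assumptions standing in for the real XHTML file.

```ruby
require 'rexml/document'

# Sample input; the <root> wrapper is assumed so that //root/p matches.
xml = <<~XML
  <root>
    <p>hello <span>world</span> a </p>
    <p>Hi there </p>
    <p>The End </p>
  </root>
XML

doc = REXML::Document.new(xml)

# Every text child of every <p> directly under <root>, as one flat list.
all_texts = REXML::XPath.match(doc, '//root/p/child::text()')
all_texts.each { |t| p t.value }
```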

Trying the following XPath gives me only the last text node of the last
paragraph, not the last text node of each paragraph that is a child of
the root node:

//root/p/child::text()[last()]

This only returns: 'The End '

What I would like to get from the XPath is therefore:

' a '
'Hi there '
'The End '
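One way to get exactly this list, regardless of how a given REXML version binds `last()`, is to let XPath select the paragraphs and pick the last text child in Ruby. A sketch only, again assuming a `<root>` wrapper around the sample; `last_texts` is just an illustrative name.

```ruby
require 'rexml/document'

xml = <<~XML
  <root>
    <p>hello <span>world</span> a </p>
    <p>Hi there </p>
    <p>The End </p>
  </root>
XML

doc = REXML::Document.new(xml)

# One entry per <p>: its last text child (nil if the paragraph has none).
last_texts = REXML::XPath.match(doc, '//root/p').map do |para|
  REXML::XPath.match(para, 'text()').last
end

last_texts.compact.each { |t| p t.value }
```

Because `last()` is never used, the result does not depend on the binding question discussed in this thread.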

I have tried //root//p/child::text()[last()] on other XPath parsers,
and it works. Could this just be a bug in REXML (or a different
interpretation of the rules)?

Cheers, Diego

Robert Klemme

2008/11/3 Diego said:
I have tried //root//p/child::text()[last()] on other XPath parsers,
and it works. Could this just be a bug in REXML (or a different
interpretation of the rules)?

Could well be both. When I try '//p/text()[last()]' I get only the
last text node of the whole document. The issue seems to be the binding
of last(), i.e. which collection it refers to or when it is applied. I
lean towards it being a bug.
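To make the binding question concrete: in XPath 1.0 a predicate attached to a location step is evaluated against each context node's own candidates, so '//p/text()[last()]' should yield one node per <p>, whereas '(//p/text())[last()]' filters the combined node-set and yields a single node. The sketch below emulates both readings in plain Ruby on top of REXML's `Element#texts` (the text children of an element); it illustrates the two semantics and does not exercise REXML's own `last()` implementation.

```ruby
require 'rexml/document'

xml = <<~XML
  <root>
    <p>hello <span>world</span> a </p>
    <p>Hi there </p>
    <p>The End </p>
  </root>
XML

doc = REXML::Document.new(xml)
paras = REXML::XPath.match(doc, '//p')

# Per-step binding (XPath 1.0 meaning of //p/text()[last()]):
# last() is evaluated separately over the text children of each <p>.
per_step = paras.map { |para| para.texts.last }.compact

# Whole-set binding (meaning of (//p/text())[last()]):
# all text children are collected first, then last() picks one node.
whole_set = [paras.flat_map(&:texts).last].compact

puts 'per step:  ' + per_step.map(&:value).inspect
puts 'whole set: ' + whole_set.map(&:value).inspect
```

The behaviour reported above for REXML matches the second reading, which is what a spec-compliant engine should only return for the parenthesised form.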

One workaround would be a two-step approach, i.e. first select all <p>
elements and then the last text node of each:

irb(main):062:0> doc.elements.each('//p'){|x| REXML::XPath.each(x,'text()[last()]'){|t| p t}}
" a "
"Hi there "
"The End "
=> [<p> ... </>, <p> ... </>, <p> ... </>]
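Turned into a standalone script, the same two-step idea can also perform the trim from the original question. A sketch assuming the sample paragraphs are wrapped in `<root>`; it uses `REXML::Text#value=` to replace each selected node's content with its right-stripped value.

```ruby
require 'rexml/document'

xml = <<~XML
  <root>
    <p>hello <span>world</span> a </p>
    <p>Hi there </p>
    <p>The End </p>
  </root>
XML

doc = REXML::Document.new(xml)

# Step 1: select each <p>. Step 2: evaluate text()[last()] with that <p>
# as the only context node, so last() is bound per paragraph.
doc.elements.each('//p') do |para|
  REXML::XPath.each(para, 'text()[last()]') do |text|
    text.value = text.value.rstrip # drop trailing whitespace
  end
end

puts doc.to_s
```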

Kind regards

robert

Mark Thomas

I have tried //root//p/child::text()[last()] on other XPath parsers,
and it works. Could this just be a bug in REXML (or a different
interpretation of the rules)?

Sounds like a bug to me too. Is there a reason you don't want to use a
parser like libxml-ruby, which is fully XPath 1.0 compliant (and will
give you a speed boost as well)?

-- Mark.

Robert Klemme

Is there a reason you don't want to use a
parser like libxml-ruby, which is fully XPath 1.0 compliant (and will
give you a speed boost as well)?

I can't speak for Diego but I use REXML because it's there and I do not
have to satisfy serious performance requirements when doing XML processing.

Cheers

robert

Diego

@Robert: As you suggest, a workaround is in order. I had a look at
other XPath implementations and they returned what I originally
expected; only REXML did not. At least now I am sure it's not just me. :)
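One caveat with "last text node" as the target: in a paragraph such as `<p>see <em>this</em></p>` the last text node is `see `, which is not at the end of the paragraph, so trimming it would remove a meaningful space. A hedged variation (not something proposed in the thread) checks the paragraph's final child instead; `trim_trailing_whitespace` is a hypothetical helper name.

```ruby
require 'rexml/document'

# Hypothetical helper: right-strips the final child of every <p>, but only
# when that final child is a text node, so trailing inline markup is left alone.
def trim_trailing_whitespace(doc)
  REXML::XPath.each(doc, '//p') do |para|
    last_child = para.children.last
    next unless last_child.is_a?(REXML::Text)
    last_child.value = last_child.value.rstrip
  end
  doc
end

xml = <<~XML
  <root>
    <p>hello <span>world</span> a </p>
    <p>see <em>this</em></p>
    <p>The End </p>
  </root>
XML

doc = trim_trailing_whitespace(REXML::Document.new(xml))
puts doc.to_s
```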

@Mark: I would consider something other than REXML if, for example, I
had performance issues with a workaround, or if there were no easy
workaround. As Robert said, I use it because it's there, and I had not
run into any real problems before this, so I was happy with it. If it
were a show-stopper I would certainly switch to something else. But
thanks for the recommendation; it's something to keep in mind.

Cheers.