xpath to get the first sibling

yawnmoth

<div>
<strong>Name: </strong> John Doe<br />
<strong>Phone Number: </strong> 111-111-1111<br />
<strong>Address: </strong> 111 Anywhere St.
</div>

Say that's my HTML. I can get the node whose value is "Name: " with
"//strong[.='Name: ']", but how do I get " John Doe"?
<http://www.w3schools.com/XPath/xpath_axes.asp> mentions
"following-sibling", but neither "//strong[.='Name: ']//following-sibling"
nor "//strong[.='Name: ']//following-sibling::*" yields any results.

Any ideas?
 
Joe Kesselman

yawnmoth said:
<div>
<strong>Name: </strong> John Doe<br />
<strong>Phone Number: </strong> 111-111-1111<br />
<strong>Address: </strong> 111 Anywhere St.
</div>

Say that's my HTML. I can get the node whose value is "Name: " with
"//strong[.='Name: ']", but how do I get " John Doe"?

* matches only the principal node type of the axis, which is element for
every axis except attribute and namespace. If you want a text node, you
need to ask for one explicitly with the text() node test.

What you're looking for is:
//strong[.='Name: ']/following-sibling::text()[1]
... the first following-sibling text node of the <strong> whose value is
'Name: '. (Note that if you don't specify [1], XPath will return all the
following sibling text nodes: " John Doe", " 111-111-1111",
" 111 Anywhere St." and the whitespace-only text nodes between the
elements, everything up to the </div>.)
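Python's standard library has no full XPath engine, so here is a minimal sketch that emulates the following-sibling axis by hand with xml.dom.minidom, assuming the snippet is well-formed XML. It is not a real XPath evaluator, and the helper name following_siblings is hypothetical. It shows why * never returns " John Doe" (it only matches elements) and why [1] is needed once you ask for text().

```python
from xml.dom import minidom

SNIPPET = """<div>
<strong>Name: </strong> John Doe<br />
<strong>Phone Number: </strong> 111-111-1111<br />
<strong>Address: </strong> 111 Anywhere St.
</div>"""


def following_siblings(node):
    """Yield the nodes on the following-sibling axis, in document order."""
    sibling = node.nextSibling
    while sibling is not None:
        yield sibling
        sibling = sibling.nextSibling


doc = minidom.parseString(SNIPPET)

# Emulates //strong[.='Name: '] (each <strong> here holds a single text node).
name_label = next(
    s for s in doc.getElementsByTagName("strong")
    if "".join(t.data for t in s.childNodes if t.nodeType == t.TEXT_NODE) == "Name: "
)

# following-sibling::*  -> element nodes only, so no text node can appear.
elements = [n.tagName for n in following_siblings(name_label)
            if n.nodeType == n.ELEMENT_NODE]

# following-sibling::text()  -> every text node after the label.
texts = [n.data for n in following_siblings(name_label)
         if n.nodeType == n.TEXT_NODE]

print(elements)        # ['br', 'strong', 'br', 'strong']
print(repr(texts[0]))  # ' John Doe'  (what the [1] predicate selects)
```

Without the [1], texts also contains the newline-only nodes and the other two values.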

Mixed content tends to be a pain to work with in XML and XSLT. You may
want to consider structuring this more semantically, e.g. as a two-column
table.
 
Peter Flynn

Joe said:
Mixed content tends to be a pain to work with in XML and XSLT. You may
want to consider structuring this more semantically, eg as a two-column
table.

Alternative: use
<span class="label">Name:</span> <span class="name">John Doe</span>
etc.
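With the label and the value in separate, class-tagged elements, the value can be addressed directly by attribute (in XPath, //span[@class='name']) instead of through a sibling axis. A minimal sketch using the limited XPath subset in Python's xml.etree.ElementTree, assuming the markup is well-formed; the wrapping <div> is an assumption added to make it a single document.

```python
import xml.etree.ElementTree as ET

# Markup following the suggested label/value pattern; the <div> wrapper is assumed.
SNIPPET = """<div>
<span class="label">Name:</span> <span class="name">John Doe</span>
</div>"""

root = ET.fromstring(SNIPPET)

# ElementTree supports [@attr='value'] predicates, so no sibling axis is needed.
name = root.find(".//span[@class='name']")
print(name.text)  # John Doe
```

Note that the value no longer carries a leading space, because the separator lives outside both spans.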

The abuse of <strong> emphasis is one of the unfortunate results of the
politically correct sanitisation of HTML undertaken by the W3C.

Adding leading and trailing spaces to text nodes in mixed content just
to make it look pretty is usually A Bad Idea if the document is going to
be reprocessed.
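The padding spaces have a practical cost when reprocessing: the label only matches with its trailing space, and the value comes back with a leading space. A minimal sketch, assuming Python 3.7+ (needed for ElementTree's [.='text'] predicate). In ElementTree the text that follows an element is stored as that element's tail, which corresponds to following-sibling::text()[1] here; the whitespace cleanup only approximates XPath's normalize-space().

```python
import xml.etree.ElementTree as ET

SNIPPET = """<div>
<strong>Name: </strong> John Doe<br />
<strong>Phone Number: </strong> 111-111-1111<br />
<strong>Address: </strong> 111 Anywhere St.
</div>"""

root = ET.fromstring(SNIPPET)

# The label must be matched exactly, trailing space included.
label = root.find(".//strong[.='Name: ']")

# Without the trailing space the label is not found at all.
missing = root.find(".//strong[.='Name:']")
print(missing)  # None

raw = label.tail  # the text between </strong> and <br />
# Trim the ends and collapse internal runs, roughly like normalize-space().
clean = " ".join(raw.split())
print(repr(raw), repr(clean))  # ' John Doe' 'John Doe'
```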

However, if this is someone else's document, you just have to deal with it.

///Peter
 
yawnmoth

<snip>
However, if this is someone else's document, you just have to deal with it.
That's what it is. If I were parsing my own HTML, I'd just stick with
what I know and wouldn't have put myself in a position to work with
something I don't.

Of course, by doing that, I'd also be missing an opportunity to learn
something new, so I can't be too annoyed with it, heh.

Thanks, Joe and Peter!
 
Joe Kesselman's syntax above worked for me once I removed the parentheses, as in:
//strong[.='Name: ']/following-sibling::text[1]
So to select the very next following sibling of any tag type, .../following-sibling::*[1]/... could be used. Or, if you wanted to select not the very next div sibling but the one after that, .../following-sibling::div[2]/... should work.
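In standard XPath 1.0, text() is a node-type test, while a bare text is a name test for an element called <text>, so following-sibling::text[1] should select nothing in this snippet; a different result may reflect the particular tool in use. The sketch below emulates following-sibling::TEST[N] by hand with xml.dom.minidom to compare the cases. It is not a real XPath engine, the function name following_sibling is hypothetical, and strong[2] stands in for the div[2] case because the example has no div siblings.

```python
from xml.dom import minidom

SNIPPET = """<div>
<strong>Name: </strong> John Doe<br />
<strong>Phone Number: </strong> 111-111-1111<br />
<strong>Address: </strong> 111 Anywhere St.
</div>"""


def following_sibling(node, test, position):
    """Emulate following-sibling::TEST[POSITION] for a single context node.

    test is "text()", "*", or an element name; position is 1-based.
    Returns the selected node, or None when there is no such node.
    """
    matches = []
    sibling = node.nextSibling
    while sibling is not None:
        if test == "text()":      # node-type test: text nodes only
            ok = sibling.nodeType == sibling.TEXT_NODE
        elif test == "*":         # principal node type: elements only
            ok = sibling.nodeType == sibling.ELEMENT_NODE
        else:                     # a bare name is a name test on elements
            ok = (sibling.nodeType == sibling.ELEMENT_NODE
                  and sibling.tagName == test)
        if ok:
            matches.append(sibling)
        sibling = sibling.nextSibling
    return matches[position - 1] if len(matches) >= position else None


doc = minidom.parseString(SNIPPET)
label = doc.getElementsByTagName("strong")[0]  # the "Name: " label

print(repr(following_sibling(label, "text()", 1).data))            # ' John Doe'
print(following_sibling(label, "text", 1))                         # None
print(following_sibling(label, "*", 1).tagName)                    # br
print(repr(following_sibling(label, "strong", 2).firstChild.data)) # 'Address: '
```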
 
