python/xpath question...

bruce · Jul 6, 2006

for guys with python/xpath expertise..

i'm playing with xpath.. and i'm trying to solve an issue...

i have the following kind of situation where i'm trying to get certain data.

i have a bunch of tr/td...

i can create an xpath that gets me all of the tr.. but i only want to get the
sibling tr up until i hit a 'tr' that has a 'th'. anybody have an idea as to
how this query might be created?..

the idea would be to start at the "Summer B", to skip the 1st "tr", to get
the next "tr"s until you get to the next "Summer" section...

sample data.....

<tr> <Th colspan=14 class="soc_comment"> Summer B </th> </tr>

<tr>
<td nowrap valign="bottom" class="colhelp">
<a href="#">Course
Course
 Course number and suffix, if applicable.
 C = combined lecture and lab course
 L = laboratory course
</a></td>
</tr>

<tr>
<td valign="top" nowrap><a href="javascript:crsdescunderpop('AST1002');">AST
1002</a></td>
</tr>
<tr>
<td valign="top" nowrap><a
href="javascript:crsdescunderpop('AST1022L');">AST 1022L</a></td>
</tr>
<tr>
<td valign="top" nowrap><a
href="javascript:crsdescunderpop('AST1022L');">AST 1022L</a></td>
</tr>
<tr>
<td valign="top" nowrap><a
href="javascript:crsdescunderpop('AST1022L');">AST 1022L</a></td>
</tr>
<tr> <Th colspan=14 class="soc_comment"> Summer C </th> </tr>

<tr>
<td nowrap valign="bottom" class="colhelp">
<a href="#">Course
..
..
..

thanks...

-bruce

Stefan Behnel · Jul 6, 2006

bruce said:
for guys with python/xpath expertise..

i'm playing with xpath.. and i'm trying to solve an issue...

i have the following kind of situation where i'm trying to get certain data.

i have a bunch of tr/td...

i can create an xpath that gets me all of the tr.. but i only want to get the
sibling tr up until i hit a 'tr' that has a 'th'. anybody have an idea as to
how this query might be created?..

I'm not quite sure how this is supposed to be related to Python, but if you're
trying to find a sibling, what about using the "sibling" axis in XPath?

Stefan

John J. Lee · Jul 9, 2006

(Damn gmane's authorizer, I think I lost four postings because the
auth messages went to my work email address (and I thought the
authorization was supposed to be one-time only per group anyway??). I
deleted them as spam since I hadn't posted from there for days :-(
Grrr. At least I could reconstruct this one...)

bruce said:
for guys with python/xpath expertise..

i'm playing with xpath.. and i'm trying to solve an issue...

i have the following kind of situation where i'm trying to get certain data.

i have a bunch of tr/td...

i can create an xpath that gets me all of the tr.. but i only want to get the
sibling tr up until i hit a 'tr' that has a 'th'. anybody have an idea as to
how this query might be created?..

[...]

((//tr/th)[2]/../following-sibling::tr/td/..)[count(.|((//tr/th)[3]/../preceding-sibling::*))=count((//tr/th)[3]/../preceding-sibling::*)]

which makes use of the following idiom for writing an intersection:

$set1[count(.|$set2)=count($set2)]

and gets the second group in the sequence you describe. IMHO, this
illustrates what happens when XPath is pushed too far ;-) I don't see
an easier way, but perhaps I missed one.
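
To see the intersection idiom on its own, here is a minimal sketch written for current Python 3 and lxml. The tiny document and the two node-sets (`set1`, `set2`) are made up for illustration and are not from the original data. A node from the first set is kept only when adding it to the second set does not change that set's size, which means it was already a member.

```python
from lxml import etree

# Hypothetical five-element document, for illustration only.
doc = etree.fromstring("<r><a/><b/><c/><d/><e/></r>")

set1 = "/r/a/following-sibling::*"   # selects b, c, d, e
set2 = "/r/d/preceding-sibling::*"   # selects a, b, c

# $set1[count(.|$set2)=count($set2)] spelled out with string substitution,
# because XPath 1.0 has no intersect operator.
intersection = "(%s)[count(.|%s)=count(%s)]" % (set1, set2, set2)

names = [el.tag for el in doc.xpath(intersection)]
print(names)  # ['b', 'c']
```

In the long expression above, the first set is "rows after the Nth header" and the second is "everything before the (N+1)th header".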

Example code:

(Note that the expression used here doesn't get any trailing group of
tr elements if there's no terminating tr/th -- that fits your
specification, but may not be what you really wanted. To fix that,
meditate on the above expression for an hour or two <0.8 wink>.)

#---------------------------------------------------------
def xpath(path, source):
    import StringIO
    import pprint
    from lxml import etree
    f = StringIO.StringIO(source)
    tree = etree.parse(f)
    r = tree.xpath(path)
    #return "\n".join(etree.tostring(el) for el in r)
    return pprint.pformat([etree.tostring(el) for el in r])

simple = """\
<html>
<tr><th>A</th></tr>
<tr><td>B</td></tr>
<tr><td>C</td></tr>
<tr><th>D</th></tr>
<tr><td>E</td></tr>
<tr><td>F</td></tr>
<tr><th>G</th></tr>
<tr><td>H</td></tr>
<tr><td>I</td></tr>
</html>
"""

for i in range(3):
    expr = '((//tr/th)[%s]/../following-sibling::tr/td/..)[count(.|((//tr/th)[%s]/../preceding-sibling::*))=count((//tr/th)[%s]/../preceding-sibling::*)]' % (i+1, i+2, i+2)
    print "---------------------"
    print xpath(expr, simple)
#---------------------------------------------------------

john[0]$ tst.py
---------------------
['<tr><td>B</td></tr>\n', '<tr><td>C</td></tr>\n']
---------------------
['<tr><td>E</td></tr>\n', '<tr><td>F</td></tr>\n']
---------------------
[]
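
The empty third result is the missing trailing group mentioned in the note above. One possible alternative, swapped in here and not the expression used in this post, is to select data rows by how many header rows precede them. This is a sketch for Python 3 and lxml; the `group` helper is a hypothetical name, and it assumes all the `tr` elements are siblings under one parent, as in the `simple` sample.

```python
from lxml import etree

simple = """\
<html>
<tr><th>A</th></tr>
<tr><td>B</td></tr>
<tr><td>C</td></tr>
<tr><th>D</th></tr>
<tr><td>E</td></tr>
<tr><td>F</td></tr>
<tr><th>G</th></tr>
<tr><td>H</td></tr>
<tr><td>I</td></tr>
</html>
"""
tree = etree.fromstring(simple)

def group(n):
    """Return the td text of rows that follow the n-th header row.

    A data row belongs to group n when exactly n header rows
    (tr elements containing a th) come before it among its siblings.
    """
    expr = "//tr[td][count(preceding-sibling::tr[th]) = %d]" % n
    return [row.findtext("td") for row in tree.xpath(expr)]

groups = [group(n) for n in (1, 2, 3)]
print(groups)  # [['B', 'C'], ['E', 'F'], ['H', 'I']]
```

Because this counts headers instead of intersecting with "everything before the next header", the last group comes back even when no terminating tr/th follows it.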

Knowing what you're doing, though, you'd probably be better off with
BeautifulSoup than XPath. Also note that mechanize (which I know
you're using) only supports BeautifulSoup 2 at present. You can't use
BeautifulSoup 3 yet (I hope to fix that 'RSN').
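
For comparison, here is roughly what the non-XPath route could look like: walk the rows in order and start a new bucket at each header row. This sketch assumes the current `bs4` package and Python 3, not the BeautifulSoup 2 that mechanize supports, so the calls would need adapting for that version. The sample table and the `sections` name are made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up sample with the same header/data row pattern as the question.
html = """
<table>
<tr><th>A</th></tr>
<tr><td>B</td></tr>
<tr><td>C</td></tr>
<tr><th>D</th></tr>
<tr><td>E</td></tr>
<tr><td>F</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

sections = {}    # header text -> list of row texts under that header
current = None   # header we are currently collecting rows for

for row in soup.find_all("tr"):
    header = row.find("th")
    if header is not None:
        # A row containing a th starts a new section.
        current = header.get_text(strip=True)
        sections[current] = []
    elif current is not None:
        # Rows before the first header are ignored.
        sections[current].append(row.get_text(strip=True))

print(sections)  # {'A': ['B', 'C'], 'D': ['E', 'F']}
```

Skipping the column-help row that follows each header in the real page would need one extra check inside the loop, for example on the td's class attribute.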

John

John J. Lee · Jul 9, 2006

Stefan Behnel said:
I'm not quite sure how this is supposed to be related to Python, but if you're
trying to find a sibling, what about using the "sibling" axis in XPath?

<nit>
There's no "sibling" axis in XPath. I'm sure you meant
"following-sibling" and/or "preceding-sibling".
</nit>
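
A minimal sketch of the two real axes, for Python 3 and lxml, on a made-up four-row table (not the original data). Starting from one header row, `following-sibling::tr` reaches the rows after it and `preceding-sibling::tr` the rows before it.

```python
from lxml import etree

# Made-up table for illustration only.
doc = etree.fromstring(
    "<table>"
    "<tr><th>A</th></tr>"
    "<tr><td>B</td></tr>"
    "<tr><th>D</th></tr>"
    "<tr><td>E</td></tr>"
    "</table>"
)

# Pick the row whose th cell has the text 'D' as the context node.
header = doc.xpath("//tr[th='D']")[0]

# Text of the first cell of each sibling row on either side.
after = [row[0].text for row in header.xpath("following-sibling::tr")]
before = [row[0].text for row in header.xpath("preceding-sibling::tr")]

print("after:", after)
print("before:", before)
```

Neither axis stops at the next header by itself, which is why the longer expression in this thread needs the intersection trick.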

John
