python/xpath question...

 
 
bruce
      07-06-2006
for guys with python/xpath expertise..

i'm playing with xpath.. and i'm trying to solve an issue...

i have the following kind of situation where i'm trying to get certain data.

i have a bunch of tr/td...

i can create an xpath that gets me all of the tr.. but i only want to get the
sibling tr up until i hit a 'tr' that has a 'th'. anybody have an idea as to
how this query might be created?..


the idea would be to start at the "Summer B", to skip the 1st "tr", to get
the next "tr"s until you get to the next "Summer" section...

sample data.....

<tr> <Th colspan=14 class="soc_comment"> Summer B </th> </tr>
<!-- START RA.CTLIB(SOCPHDR1) -->
<tr>
<td nowrap valign="bottom" class="colhelp">
<a href="#">Course<span>
<b>Course</b>
<br>Course number and suffix, if applicable.
<br>C = combined lecture and lab course
<br>L = laboratory course
</span></a></td>
</tr>
<!-- END RA.CTLIB(SOCPHDR1) -->
<tr>
<td valign="top" nowrap><a href="javascript:crsdescunderpop('AST1002');">AST
1002</a></td>
</tr>
<tr>
<td valign="top" nowrap><a
href="javascript:crsdescunderpop('AST1022L');">AST 1022L</a></td>
</tr>
<tr>
<td valign="top" nowrap><a
href="javascript:crsdescunderpop('AST1022L');">AST 1022L</a></td>
</tr>
<tr>
<td valign="top" nowrap><a
href="javascript:crsdescunderpop('AST1022L');">AST 1022L</a></td>
</tr>
<tr> <Th colspan=14 class="soc_comment"> Summer C </th> </tr>
<!-- START RA.CTLIB(SOCPHDR1) -->
<tr>
<td nowrap valign="bottom" class="colhelp">
<a href="#">Course<span>
..
..
..

thanks...

-bruce


 
Stefan Behnel
      07-06-2006
bruce wrote:
> for guys with python/xpath expertise..
>
> i'm playing with xpath.. and i'm trying to solve an issue...
>
> i have the following kind of situation where i'm trying to get certain data.
>
> i have a bunch of tr/td...
>
> i can create an xpath, that gets me all of the tr.. i only want to get the
> sibling tr up until i hit a 'tr' that has a 'th' anybody have an idea as to
> how this query might be created?..


I'm not quite sure how this is supposed to be related to Python, but if you're
trying to find a sibling, what about using the "sibling" axis in XPath?

Stefan
 
John J. Lee
      07-09-2006
(Damn gmane's authorizer, I think I lost four postings because the
auth messages went to my work email address (and I thought the
authorization was supposed to be one-time only per group anyway??). I
deleted them as spam since I hadn't posted from there for days.
Grrr. At least I could reconstruct this one...)

"bruce" <> writes:

> for guys with python/xpath expertise..
>
> i'm playing with xpath.. and i'm trying to solve an issue...
>
> i have the following kind of situation where i'm trying to get certain data.
>
> i have a bunch of tr/td...
>
> i can create an xpath, that gets me all of the tr.. i only want to get the
> sibling tr up until i hit a 'tr' that has a 'th' anybody have an idea as to
> how this query might be created?..

[...]

((//tr/th)[2]/../following-sibling::tr/td/..)[count(.|((//tr/th)[3]/../preceding-sibling::*))=count((//tr/th)[3]/../preceding-sibling::*)]


which makes use of the following idiom for writing an intersection:

$set1[count(.|$set2)=count($set2)]


and gets the second group in the sequence you describe. IMHO, this
illustrates what happens when XPath is pushed too far. I don't see
an easier way, but perhaps I missed one.
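
To see the intersection idiom in isolation, here is a minimal sketch (Python 3 with lxml, not part of the original post). The toy document, the attribute names `a` and `b`, and the variables `set1`, `set2` and `both` are made up for illustration. A node from the first set is kept only if adding it to the second set leaves that set's size unchanged, which means it was already a member.

```python
from lxml import etree

# Hypothetical toy document: item 2 is the only one with both attributes.
doc = etree.fromstring(
    "<root>"
    "<item id='1' a='x'/>"
    "<item id='2' a='x' b='y'/>"
    "<item id='3' b='y'/>"
    "</root>"
)

set1 = "//item[@a]"  # selects items 1 and 2
set2 = "//item[@b]"  # selects items 2 and 3

# ($set1)[count(. | $set2) = count($set2)]
# The union "." | set2 only grows when "." is not already in set2.
expr = "(%s)[count(. | %s) = count(%s)]" % (set1, set2, set2)

both = [el.get("id") for el in doc.xpath(expr)]
print(both)  # ['2']
```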

Example code:

(Note that the expression used here doesn't get any trailing group of
tr elements if there's no terminating tr/th -- that fits your
specification, but may not be what you really wanted. To fix that,
meditate on the above expression for an hour or two <0.8 wink>.)

#---------------------------------------------------------
def xpath(path, source):
    # Evaluate an XPath expression against an XML string and return
    # the matched elements, serialized, as a pretty-printed list.
    import StringIO
    import pprint
    from lxml import etree
    f = StringIO.StringIO(source)
    tree = etree.parse(f)
    r = tree.xpath(path)
    #return "\n".join(etree.tostring(el) for el in r)
    return pprint.pformat([etree.tostring(el) for el in r])

simple = """\
<html>
<tr><th>A</th></tr>
<tr><td>B</td></tr>
<tr><td>C</td></tr>
<tr><th>D</th></tr>
<tr><td>E</td></tr>
<tr><td>F</td></tr>
<tr><th>G</th></tr>
<tr><td>H</td></tr>
<tr><td>I</td></tr>
</html>
"""

# Group i+1: the td rows after header i+1 that also precede header i+2.
for i in range(3):
    expr = '((//tr/th)[%s]/../following-sibling::tr/td/..)[count(.|((//tr/th)[%s]/../preceding-sibling::*))=count((//tr/th)[%s]/../preceding-sibling::*)]' % (i+1, i+2, i+2)
    print "---------------------"
    print xpath(expr, simple)
#---------------------------------------------------------


john[0]$ tst.py
---------------------
['<tr><td>B</td></tr>\n', '<tr><td>C</td></tr>\n']
---------------------
['<tr><td>E</td></tr>\n', '<tr><td>F</td></tr>\n']
---------------------
[]
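
The empty third result is the trailing group that the note above warns about. One possible way to pick it up as well, offered as a sketch under assumptions and not taken from the thread, is to select td rows by counting how many header rows precede them. This is Python 3 with lxml; the helper name `group` and the XPath variable `$n` are made up for illustration, and it assumes all rows are siblings under one parent, as in the `simple` sample.

```python
from lxml import etree

simple = """\
<html>
<tr><th>A</th></tr>
<tr><td>B</td></tr>
<tr><td>C</td></tr>
<tr><th>D</th></tr>
<tr><td>E</td></tr>
<tr><td>F</td></tr>
<tr><th>G</th></tr>
<tr><td>H</td></tr>
<tr><td>I</td></tr>
</html>
"""

root = etree.fromstring(simple)


def group(n):
    """Return the td text of the rows that follow the n-th header row."""
    # A td row belongs to group n when exactly n header rows precede it.
    rows = root.xpath(
        "//tr[td][count(preceding-sibling::tr[th]) = $n]", n=n
    )
    return [row.findtext("td") for row in rows]


for n in (1, 2, 3):
    print(n, group(n))
```

With the sample data this should give B and C, then E and F, then H and I, so the last group no longer needs a terminating tr/th.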


Knowing what you're doing, though, you'd probably be better off with
BeautifulSoup than XPath. Also note that mechanize (which I know
you're using) only supports BeautifulSoup 2 at present. You can't use
BeautifulSoup 3 yet (I hope to fix that 'RSN').
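
To give a feel for the non-XPath route, here is a rough sketch that walks the rows in document order and starts a new group at each header row. Note that it does not use BeautifulSoup: it swaps in the standard library's `xml.etree.ElementTree` (Python 3) so that it runs without extra packages, which means it only copes with well-formed markup like the sample below, not real-world HTML. The function name `group_rows` is made up for illustration.

```python
import xml.etree.ElementTree as ET

simple = """\
<html>
<tr><th>A</th></tr>
<tr><td>B</td></tr>
<tr><td>C</td></tr>
<tr><th>D</th></tr>
<tr><td>E</td></tr>
<tr><td>F</td></tr>
<tr><th>G</th></tr>
<tr><td>H</td></tr>
<tr><td>I</td></tr>
</html>
"""


def group_rows(root):
    """Map each header text to the td texts of the rows that follow it."""
    groups = {}
    current = None
    for tr in root.iter("tr"):
        th = tr.find("th")
        if th is not None:
            # A header row opens a new group.
            current = (th.text or "").strip()
            groups[current] = []
        elif current is not None:
            # A data row goes into the most recently opened group.
            groups[current].append((tr.findtext("td") or "").strip())
    return groups


groups = group_rows(ET.fromstring(simple))
print(groups)
```

The loop keeps the "which section am I in" state in an ordinary variable, which is the part that is awkward to express in a single XPath expression.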


John
 
John J. Lee
      07-09-2006
Stefan Behnel <stefan.behnel-> writes:
[...]
> I'm not quite sure how this is supposed to be related to Python, but if you're
> trying to find a sibling, what about using the "sibling" axis in XPath?


<nit>
There's no "sibling" axis in XPath. I'm sure you meant
"following-sibling" and/or "preceding-sibling".
</nit>
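
For reference, a minimal sketch of the two axes (Python 3 with lxml, not part of the original post; the tiny table and the names `after` and `before` are made up for illustration):

```python
from lxml import etree

# Hypothetical table: one header row with one row before it and two after.
root = etree.fromstring(
    "<table>"
    "<tr><td>B</td></tr>"
    "<tr><th>D</th></tr>"
    "<tr><td>E</td></tr>"
    "<tr><td>F</td></tr>"
    "</table>"
)

# Rows that come after the header row, under the same parent.
after = [str(t) for t in
         root.xpath("//tr[th]/following-sibling::tr/td/text()")]

# Rows that come before the header row, under the same parent.
before = [str(t) for t in
          root.xpath("//tr[th]/preceding-sibling::tr/td/text()")]

print(after)   # ['E', 'F']
print(before)  # ['B']
```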


John
 