Velocity Reviews - Computer Hardware Reviews

RE: xpath question

 
 
bruce
07-02-2006
hi

is there anyone with XPath expertise here? i'm trying to figure out if
there's a way to use regex expressions with an xpath query? i've seen
references to the ability to use regex and xpath/xml, but i'm not sure how
to do it...

i have a situation where i have something like:
/html/table/..../[@class='foo']

is it possible to do something like [@class~=/fo/] so i'd match a class
attribute containing fo?

i'm trying to parse HTML/Web docs...

thanks

-bruce
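If the pages are real-world HTML rather than well-formed XML, one stdlib-only way to get the effect of matching a class attribute against a pattern is to skip XPath entirely and filter tags while parsing. This is a minimal sketch assuming Python 3's html.parser. The ClassRegexFinder name is hypothetical, and unlike the XPath in the question, it does not check where in the document (e.g. under /html/table) a tag appears.

```python
import re
from html.parser import HTMLParser


class ClassRegexFinder(HTMLParser):
    """Hypothetical helper: records (tag, class) for every start tag whose
    class attribute matches a regular expression."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = re.compile(pattern)
        self.hits = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for
        # attributes written without a value, so guard against that.
        cls = dict(attrs).get("class")
        if cls is not None and self.pattern.search(cls):
            self.hits.append((tag, cls))


# Tag soup is accepted: unclosed <td> elements and unquoted attribute values.
finder = ClassRegexFinder(r"^fo")
finder.feed('<table><tr><td class=foo>a<td class="bar">b<td class=fob>c</table>')
finder.close()
print(finder.hits)
```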


 
Simon Forman
07-02-2006
bruce wrote:
> hi
>
> is there anyone with XPath expertise here? i'm trying to figure out if
> there's a way to use regex expressions with an xpath query? i've seen
> references to the ability to use regex and xpath/xml, but i'm not sure how
> to do it...
>
> i have a situation where i have something like:
> /html/table/..../[@class='foo']
>
> is it possible to do something like [@class~=/fo/] so i'd match a class
> attribute containing fo?
>
> i'm trying to parse HTML/Web docs...
>
> thanks
>
> -bruce


I'll take this one...

Dude, this is a *python* mailing list, not an xml/xpath/regex one. In
addition, the regex syntax you're using above (~=/fo/) looks like
*perl* code, but I wouldn't know, 'cause I don't use perl myself.

Now it's entirely possible that there are *many* people here that are
xml/xpath/regex Kung Fu Masters, *and* it's entirely possible that one
or more of them are about to answer your question informatively and in
exhaustive detail. It's also entirely possible that this is the most
friendly and informative reply that you're going to get, here.


Try a more appropriate newsgroup, and good luck.

 
Simon Forman
07-03-2006
bruce wrote:
> simon..
>
> you may not.. but lots of people use python and xpath for html/xml
> functionality.. check google "python xpath"...
>
> later..
>

....
> > i have a situation where i have something like:
> > /html/table/..../[@class='foo']
> >
> > is it possible to do something like [@class~=/fo/] so i'd match a class
> > attribute containing fo?
> >



So I did some checking, starting with the google search you suggested,
and I found out that lxml, 4Suite, and Amara (which is apparently based
on 4Suite somehow) all seem to be capable of doing what you're talking
about. I don't know how to do it with lxml, but I bet the people on
the lxml mailing list would be happy to explain it to you. As for
Amara and 4Suite, I think it might be as simple as saying "Match(your
regex here in python re module form)" in your XPath statement.


In the meantime, you could just use XPath to extract a superset of the
elements you're interested in and then filter them with a compiled
regular expression from the re module.
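A minimal sketch of that superset-then-filter approach, assuming Python 3's xml.etree.ElementTree and a well-formed XHTML document (ElementTree will not parse HTML tag soup). The sample document and the ^fo pattern are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Well-formed (XHTML-style) input; ElementTree needs real XML.
XHTML = (
    "<html><body><table><tr>"
    '<td class="foo">a</td><td class="bar">b</td><td class="fob">c</td>'
    "</tr></table></body></html>"
)

root = ET.fromstring(XHTML)
pattern = re.compile(r"^fo")

# Step 1: ElementTree's XPath subset has no regex functions, but it can
# select the superset: every element below a <table> that has a class.
candidates = root.findall(".//table//*[@class]")

# Step 2: keep only the candidates whose class value matches the regex.
matches = [el for el in candidates if pattern.search(el.get("class"))]
print([el.get("class") for el in matches])
```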


I avoid xml if I can help it... My new favorite HTML editor, however,
is python and ElementTree...
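As a small illustration of using ElementTree as an HTML editor, here is a sketch assuming Python 3 and well-formed XHTML input; the sample document and the new class value are hypothetical. ElementTree's limited XPath subset does accept the exact-match predicate [@class='foo'] from the original question, but it has no regex support.

```python
import xml.etree.ElementTree as ET

XHTML = (
    '<html><body><table><tr><td class="foo">old</td>'
    '<td class="bar">keep</td></tr></table></body></html>'
)
root = ET.fromstring(XHTML)

# Exact attribute equality works in ElementTree's path syntax.
for cell in root.findall(".//td[@class='foo']"):
    cell.text = "new"               # rewrite the cell's text
    cell.set("class", "foo edited")  # and its class attribute

# Serialize back to a str (no XML declaration with encoding="unicode").
html_out = ET.tostring(root, encoding="unicode")
print(html_out)
```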

 
uche.ogbuji@gmail.com
07-03-2006
bruce wrote:
> is there anyone with XPath expertise here? i'm trying to figure out if
> there's a way to use regex expressions with an xpath query? i've seen
> references to the ability to use regex and xpath/xml, but i'm not sure how
> to do it...
>
> i have a situation where i have something like:
> /html/table/..../[@class='foo']
>
> is it possible to do something like [@class~=/fo/] so i'd match a class
> attribute containing fo?
>
> i'm trying to parse HTML/Web docs...


4Suite [1] supports regex in XPath using the EXSLT community standard's
regex module [2]. It would be something like:

[re:match(@class, 'fo.*')]

With the re prefix set as required by the EXSLT module.

[1] http://4Suite.org
[2] http://www.exslt.org/regexp/

--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://fourthought.com
http://copia.ogbuji.net http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/
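4Suite itself is not shown here. As a sketch of the same EXSLT idea, this assumes lxml is installed and is a release that, as the current lxml documentation describes, enables the EXSLT regular-expression functions in plain XPath; the re prefix only has to be bound to the http://exslt.org/regular-expressions namespace. It uses re:test(), which returns a boolean. The re:match() form in Uche's example returns a node-set of matches instead, which a predicate treats as true when it is non-empty.

```python
from lxml import etree

# Namespace URI of the EXSLT regular-expressions module; the "re" prefix
# used in the expression is bound to it through the namespaces mapping.
RE_NS = "http://exslt.org/regular-expressions"

# Compile once: every element whose class attribute starts with "fo".
find_fo = etree.XPath("//*[re:test(@class, '^fo')]", namespaces={"re": RE_NS})

# etree.HTML() tolerates tag soup such as unquoted attribute values.
doc = etree.HTML(
    "<table><tr><td class=foo>a</td><td class=bar>b</td>"
    "<td class=fob>c</td></tr></table>"
)

classes = [el.get("class") for el in find_fo(doc)]
print(classes)
```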

 
Simon Forman
07-03-2006
uche.ogbuji@gmail.com wrote:
> bruce wrote:
> > is there anyone with XPath expertise here? i'm trying to figure out if
> > there's a way to use regex expressions with an xpath query? i've seen
> > references to the ability to use regex and xpath/xml, but i'm not sure how
> > to do it...
> >
> > i have a situation where i have something like:
> > /html/table/..../[@class='foo']
> >
> > is it possible to do something like [@class~=/fo/] so i'd match a class
> > attribute containing fo?
> >
> > i'm trying to parse HTML/Web docs...

>
> 4Suite [1] supports regex in XPath using the EXSLT community standard's
> regex module [2]. It would be something like:
>
> [re:match(@class, 'fo.*')]
>
> With the re prefix set as required by the EXSLT module.
>
> [1] http://4Suite.org
> [2] http://www.exslt.org/regexp/


Well shut my mouth! There *is* an xml/xpath python Guru here.

*sigh* Sorry, Bruce (and everybody else on this newsgroup). I apologize
for mouthing off and not contributing to a better signal-to-noise
ratio.

I guess that should teach me not to post so quickly when I'm in a bad
mood. I'll do better in the future.


Peace,
~Simon

 
Stefan Behnel
07-04-2006
uche.ogbuji@gmail.com wrote:
> bruce wrote:
>> is there anyone with XPath expertise here? i'm trying to figure out if
>> there's a way to use regex expressions with an xpath query? i've seen
>> references to the ability to use regex and xpath/xml, but i'm not sure how
>> to do it...
>>
>> i have a situation where i have something like:
>> /html/table/..../[@class='foo']
>>
>> is it possible to do something like [@class~=/fo/] so i'd match a class
>> attribute containing fo?
>>
>> i'm trying to parse HTML/Web docs...

>
> 4Suite [1] supports regex in XPath using the EXSLT community standard's
> regex module [2]. It would be something like:
>
> [re:match(@class, 'fo.*')]
>
> With the re prefix set as required by the EXSLT module.


Same for lxml, although it's currently only enabled in XSLT:
http://codespeak.net/lxml/api.html#xslt

Guess I should change that for 1.1...

Stefan
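For the XSLT route Stefan mentions, a minimal sketch assuming lxml is installed: the stylesheet binds the re prefix to the EXSLT regular-expressions namespace and lists the class of every element whose class starts with "fo". The stylesheet and input document are made up for illustration.

```python
from lxml import etree

# The stylesheet declares the EXSLT regex namespace under the "re" prefix
# and keeps that prefix out of the result document.
STYLESHEET = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:re="http://exslt.org/regular-expressions"
    exclude-result-prefixes="re">
  <xsl:template match="/">
    <hits>
      <xsl:for-each select="//*[re:test(@class, '^fo')]">
        <hit><xsl:value-of select="@class"/></hit>
      </xsl:for-each>
    </hits>
  </xsl:template>
</xsl:stylesheet>"""

transform = etree.XSLT(etree.XML(STYLESHEET))

doc = etree.XML(
    '<html><body><table><tr><td class="foo">a</td>'
    '<td class="bar">b</td><td class="fob">c</td></tr></table></body></html>'
)
result = transform(doc)

# The result tree's root is <hits>; collect the text of each <hit>.
hits = [hit.text for hit in result.getroot()]
print(hits)
```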
 


Similar Threads
Thread Thread Starter Forum Replies Last Post
"Memory leak" in javax.xml.xpath.XPath Marvin_123456 Java 4 07-29-2005 03:49 PM
XPath: efficiency in xpath expressions Tjerk Wolterink XML 1 11-13-2004 06:03 PM
Are there any XPath parsers that generate XPath trees? goog XML 0 01-14-2004 01:47 PM
XPath that does not include other XPath Anna XML 0 07-31-2003 07:55 AM
Problem selecting a node with XPATH if attribute value contains backslashes - how to force XPATH string to be treated as literal? Alastair Cameron XML 1 07-08-2003 07:24 PM


