Velocity Reviews - Computer Hardware Reviews

XPath searching

 
 
CxT
 
      04-13-2009
Hello,

I am very new to XPath. I *have* read through several online
tutorials though.

I have what I think is a very basic question:

How do I find something specific in an HTML document using XPath?
What I mean is... I am looking for a specific <div class="foo"...
which might be nested 100 levels deep - I am trying to pull a stock
quote from http://moneycentral.msn.com/detail/s...ote?Symbol=IBM.

I'd like to use something like "*div[@class='foo']" but that doesn't
seem to be valid.

Any guidance would be much appreciated.

Thanks,
CxT
 
 
 
 
 
Martin Honnen
 
      04-13-2009
CxT wrote:

> How do I find something specific in an HTML document using XPath?


XPath is first of all defined on XML documents, not on HTML documents.
Depending on the implementation there are however ways to parse HTML
documents into a suitable data structure for XPath. Which XPath
implementation do you use?
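To make the XML-versus-HTML point concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree (chosen for illustration only; nobody in the thread names an implementation). The fragment is hypothetical but modeled on the markup quoted later in the thread: an XML parser rejects it because &nbsp; is not one of XML's predefined entities and <br> is never closed. HTML-aware parsers, such as the one in the third-party lxml package (lxml.html), build a tree from markup like this that can then be queried with XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical HTML-style fragment: &nbsp; is not predefined in XML,
# and <br> is never closed, so it is not well-formed XML.
html_fragment = '<div class="bd"><span class="s1">119.57</span>&nbsp;unch<br></div>'

# The same content rewritten as well-formed XML: a numeric character
# reference instead of &nbsp;, and an empty-element tag for <br/>.
xml_fragment = '<div class="bd"><span class="s1">119.57</span>&#160;unch<br/></div>'

def parses_as_xml(text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(parses_as_xml(html_fragment))  # False
print(parses_as_xml(xml_fragment))   # True
```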

> What I mean is... I am looking for a specific <div class="foo"...
> which might be nested 100 levels deep - I am trying to pull a stock
> quote from http://moneycentral.msn.com/detail/s...ote?Symbol=IBM.
>
> I'd like to use something like "*div[@class='foo']" but that doesn't
> seem to be valid.


//div

would select 'div' elements at all levels and then you can add your
predicate

//div[@class = 'foo']

to select only those 'div' elements where the class attribute has
the value 'foo'.
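A minimal sketch of the same selection, assuming Python's standard xml.etree.ElementTree (not mentioned in the thread) and a small hypothetical, well-formed document. ElementTree implements only a subset of XPath and does not accept an absolute //div on an element, so its equivalent of //div[@class = 'foo'] is the relative path .//div[@class='foo'], which searches every descendant of the element it is called on:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the div we want is nested inside two plain divs,
# next to a sibling div with a different class.
doc = ET.fromstring(
    "<html><body><div><div><div class='foo'>target</div></div>"
    "<div class='bar'>other</div></div></body></html>"
)

# .//div searches all descendants at any depth; the predicate keeps
# only the divs whose class attribute equals 'foo'.
matches = doc.findall(".//div[@class='foo']")
print([d.text for d in matches])  # ['target']
```

Only the innermost div is returned: the unclassed wrapper divs and the div with class 'bar' are skipped, whatever their depth.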

--

Martin Honnen
http://JavaScript.FAQTs.com/
 
 
 
 
 
CxT
 
      04-13-2009
On Apr 13, 7:55 am, Martin Honnen <(E-Mail Removed)> wrote:
> http://moneycentral.msn.com/detail/s...ote?Symbol=IBM.
>
> > I'd like to use something like "*div[@class='foo']" but that doesn't
> > seem to be valid.

>
> //div
>
> would select 'div' elements at all levels and then you can add your
> predicate
>
> //div[@class = 'foo']
>
> to select only those 'div' elements where the class attribute has
> the value 'foo'.


That definitely seems to work, Martin - thank you!

Here is the block that I receive:

<div class="bd">
<table>
<tr>
<td id="detail">
<table>
<tr class="rs0">
<th colspan="4"><span class="s1">119.57</span>
&nbsp;unch <a href="http://moneycentral.msn.com/investor/invsub/advisor/advisor.asp?symbol=AAPL" class="fyistyle">fyi</a>&nbsp;&nbsp;</th>
</tr>

I want to access the value of the span (class="s1"), 119.57. Do I
have to work my way down from each level (from the div)? For example,
something like "//div[@class = 'bd']/table/tr/td[@class = 'detail'",
which again doesn't seem to be valid.

Thank you for any guidance... once I understand how to navigate
paths, I think I should be good to go.

CxT
 
 
Martin Honnen
 
      04-13-2009
CxT wrote:

> Here is the block that I receive:
>
> <div class="bd">
> <table>
> <tr>
> <td id="detail">
> <table>
> <tr class="rs0">
> <th colspan="4"><span class="s1">119.57</span>
> &nbsp;unch <a href="http://moneycentral.msn.com/investor/invsub/advisor/advisor.asp?symbol=AAPL" class="fyistyle">fyi</a>&nbsp;&nbsp;</th>
> </tr>
>
> I want to access that value of the span (class=s1) - 119.57. Do I
> have to work my way down from each level (from the div)? For example
> something like: "//div[@class = 'bd']/table/tr/td[@class = 'detail'" -
> which again doesn't seem to be valid.


A closing square bracket is missing:
//div[@class = 'bd']/table/tr/td[@class = 'detail']
is certainly a syntactically correct XPath expression.

On the other hand, SGML/HTML parsing rules might insert an implied tbody element, so
//div[@class = 'bd']/table/tbody/tr/td[@class = 'detail']
could also be possible, depending on the parser used for parsing the HTML.
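The tbody issue can be seen in a small sketch, again assuming Python's xml.etree.ElementTree and two hypothetical trees: one matching the markup as written, and one as an HTML parser might build it with the implied tbody. The td predicate is left out to keep the focus on the table structure. A path spelled out with single / steps matches only one of the two trees, while a // step below the table matches both:

```python
import xml.etree.ElementTree as ET

# Hypothetical trees for the same markup: as written, and with the
# <tbody> that an HTML parser may insert between <table> and <tr>.
without_tbody = ET.fromstring(
    "<body><div class='bd'><table><tr><td id='detail'>x</td></tr></table></div></body>"
)
with_tbody = ET.fromstring(
    "<body><div class='bd'><table><tbody><tr><td id='detail'>x</td></tr></tbody></table></div></body>"
)

strict = ".//div[@class='bd']/table/tr/td"          # child steps, no tbody
explicit = ".//div[@class='bd']/table/tbody/tr/td"  # child steps, tbody included
loose = ".//div[@class='bd']/table//td"             # any depth below the table

def count(root, path):
    """Number of elements the ElementTree path selects below root."""
    return len(root.findall(path))

for label, root in [("as written", without_tbody), ("with tbody", with_tbody)]:
    print(label, count(root, strict), count(root, explicit), count(root, loose))
# as written 1 0 1
# with tbody 0 1 1
```

Using // where the parser's structure is uncertain trades a little precision for robustness across parsers.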

--

Martin Honnen
http://JavaScript.FAQTs.com/
 
 
Johannes Koch
 
      04-14-2009
Martin Honnen wrote:
> CxT wrote:
>
>> Here is the block that I receive:
>>
>> <div class="bd">
>> <table>
>> <tr>
>> <td id="detail">
>> <table>
>> <tr class="rs0">
>> <th colspan="4"><span class="s1">119.57</span>
>> &nbsp;unch <a href="http://moneycentral.msn.com/investor/invsub/advisor/advisor.asp?symbol=AAPL" class="fyistyle">fyi</a>&nbsp;&nbsp;</th>
>> </tr>
>>
>> I want to access that value of the span (class=s1) - 119.57. Do I
>> have to work my way down from each level (from the div)? For example
>> something like: "//div[@class = 'bd']/table/tr/td[@class = 'detail'" -
>> which again doesn't seem to be valid.

>
> A closing square bracket is missing:
> //div[@class = 'bd']/table/tr/td[@class = 'detail']
> is certainly a syntactically correct XPath expression.
>
> On the other hand SGML/HTML parsing rules might insert an implied tbody so
> //div[@class = 'bd']/table/tbody/tr/td[@class = 'detail']
> could also be possible, depending on the parser used for parsing the HTML.


Additionally, in the code fragment the td element has an _id_ attribute
with value "detail", not a _class_ attribute with that value.
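Putting the three replies together, here is a minimal sketch assuming Python's xml.etree.ElementTree. The fragment quoted above is closed off and made well-formed by hand (&nbsp; replaced with the numeric reference &#160;, the link shortened to href="#"); both changes are assumptions for the sake of the example. The td is matched on @id, and // steps are used so the path works with or without a tbody. In full XPath the corresponding expression would be //div[@class = 'bd']//td[@id = 'detail']//span[@class = 's1'].

```python
import xml.etree.ElementTree as ET

# The fragment from the thread, closed off and made well-formed by hand:
# &nbsp; became &#160; and the href was shortened (assumptions).
fragment = """
<div class="bd">
<table>
<tr>
<td id="detail">
<table>
<tr class="rs0">
<th colspan="4"><span class="s1">119.57</span>&#160;unch <a href="#" class="fyistyle">fyi</a></th>
</tr>
</table>
</td>
</tr>
</table>
</div>
"""
root = ET.fromstring(fragment.strip())  # root is the <div class="bd"> itself

# Matching the td on a class attribute finds nothing: it only has an id.
print(root.find(".//td[@class='detail']"))  # None

# Match on @id, then descend with // so a tbody in between would not matter.
span = root.find(".//td[@id='detail']//span[@class='s1']")
print(span.text)  # 119.57
```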

--
Johannes Koch
O Lord, in thee have I trusted; let me never be confounded.
(Te Deum, 4th cent.)
 
 
 
 

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
"Memory leak" in javax.xml.xpath.XPath | Marvin_123456 | Java | 4 | 07-29-2005 03:49 PM
XPath: efficiency in xpath expressions | Tjerk Wolterink | XML | 1 | 11-13-2004 06:03 PM
Are there any XPath parsers that generate XPath trees? | goog | XML | 0 | 01-14-2004 01:47 PM
XPath that does not include other XPath | Anna | XML | 0 | 07-31-2003 07:55 AM
Problem selecting a node with XPATH if attribute value contains backslashes - how to force XPATH string to be treated as literal? | Alastair Cameron | XML | 1 | 07-08-2003 07:24 PM


