Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Ruby > Hpricot - best way to parse based on comments


Hpricot - best way to parse based on comments

 
 
Jerome ---
 
      11-20-2006
I am trying to parse some files that contain comments like this:

<html>
<body>

<!-- BEGIN ad_content -->

images, text, etc...

<!-- END ad_content -->

Interesting text of site here.

</body>
</html>


I am wondering how to go about extracting the content between these
comments using Hpricot. I am not aware of a way to refer to comments
through CSS or XPath selectors.

Thanks for any ideas!

- Jerome

--
Posted via http://www.ruby-forum.com/.

 
Keith Fahlgren
 
      11-20-2006
On 11/20/06, Jerome --- <(E-Mail Removed)> wrote:
> I am trying to parse some files that contain comments like this:
> ...
> I am not aware of a way to refer to commented HTML
> through CSS or XPath selectors.


The XPath comment() node test selects all comment nodes.

For example (the XPath expression follows the -m flag):
keith@devel ~ $ xml sel -t -m '//comment()' -v '.' -n simple.xml
one comment
two comment

keith@devel ~ $ cat simple.xml
<simple>
<!-- one comment -->
<foo/>
<!-- two comment -->
<bar/>
</simple>
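A minimal sketch of the same idea applied to Jerome's sample page, in Ruby. It swaps in Ruby's bundled REXML library instead of Hpricot and assumes the page is well-formed XHTML; whether Hpricot's own search accepts comment() the same way is not shown in this thread. It finds the comments with //comment(), then walks forward from the BEGIN comment, collecting sibling nodes until it reaches the END comment. The marker text comes from the original post; the variable names are just for illustration.

```ruby
require 'rexml/document'

html = <<~HTML
  <html>
  <body>
  <!-- BEGIN ad_content -->
  images, text, etc...
  <!-- END ad_content -->
  Interesting text of site here.
  </body>
  </html>
HTML

doc = REXML::Document.new(html)

# //comment() matches every comment node anywhere in the document
comments = REXML::XPath.match(doc, '//comment()')
comment_texts = comments.map { |c| c.string.strip }
puts comment_texts

# Start just after the BEGIN marker and collect siblings until the END marker
start = comments.find { |c| c.string.strip == 'BEGIN ad_content' }
between = []
node = start && start.next_sibling_node
until node.nil? || (node.is_a?(REXML::Comment) && node.string.strip == 'END ad_content')
  between << node
  node = node.next_sibling_node
end

between_text = between.map(&:to_s).join.strip
puts between_text
```

Because this walks siblings, it assumes both markers sit under the same parent element, as they do in the sample.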


HTH,
Keith

 
Ken Bloom
 
      11-21-2006
On Tue, 21 Nov 2006 07:52:12 +0900, Jerome --- wrote:

> I am trying to parse some files that contain comments like this:
>
> <html>
> <body>
>
> <!-- BEGIN ad_content -->
>
> images, text, etc...
>
> <!-- END ad_content -->
>
> Interesting text of site here.
>
> </body>
> </html>
>
>
> I am wondering how to go about extracting the data within the comments
> block using Hpricot. I am not aware of a way to refer to commented HTML
> through CSS or XPath selectors.
>
> Thanks for any ideas!
>
> - Jerome
>


Why not gsub out the unwanted sections before parsing with Hpricot? Or,
if the data you want is nested between the comments, use a regexp to
narrow the document down to only the text between the comments before
parsing it with Hpricot.
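A rough sketch of both variants of Ken's suggestion in plain Ruby, before any parser is involved. The pattern assumes the markers look like the ones in Jerome's sample (BEGIN/END ad_content, with optional whitespace inside the comment); the constant name AD_BLOCK is hypothetical. Either resulting string could then be handed to Hpricot(...) as usual.

```ruby
html = <<~HTML
  <html>
  <body>

  <!-- BEGIN ad_content -->

  images, text, etc...

  <!-- END ad_content -->

  Interesting text of site here.

  </body>
  </html>
HTML

# Non-greedy (.*?) stops at the first END marker; /m lets . match newlines
AD_BLOCK = /<!--\s*BEGIN ad_content\s*-->(.*?)<!--\s*END ad_content\s*-->/m

# Variant 1: narrow the document down to just the text between the comments
inner = html[AD_BLOCK, 1].to_s.strip

# Variant 2: gsub the whole commented section out, keeping the rest of the page
rest = html.gsub(AD_BLOCK, '')

puts inner
puts rest
```

The regexp approach does not care whether the page is well-formed, which is its main advantage over walking a parsed tree; the trade-off is that it depends on the exact marker text.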

--Ken Bloom

--
Ken Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/
 



Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Hpricot wont parse <a> elements via XPath | No Uu | Ruby | 1 | 05-25-2009 10:23 PM
Can I use Hpricot to parse data into different array elem? | Christiaan Venter | Ruby | 1 | 05-22-2009 05:11 AM
using HPricot to parse a fiddly table | Adam Dullenty | Ruby | 2 | 01-07-2008 12:49 AM
hpricot - parse html | K. R. | Ruby | 3 | 01-03-2008 05:51 PM
Hpricot & mechanize fail to parse page after redirect | Ehud Rosenberg | Ruby | 2 | 11-14-2007 09:18 PM


