Extracting the URL from an Hpricot element

Nikita Ratlos
I want to get a list of URLs from a webpage as follows:

First I create the Hpricot document and select the links:

require 'open-uri'
require 'hpricot'

doc = Hpricot(open(searchurl))
links = doc/"//html//body//div[6]//div[2]//a[@id='p-1']"

Next I want to append the URLs to an array, like this:

results << links.each { |link| puts link.attributes['href'] }

That line prints the URLs exactly as I need them, but it then puts the
whole HTML link elements into the results array.

Any ideas on how to get just the URLs (without the HTML) into my results array?
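
One possible explanation, assuming the block in the append line is being run through each: in Ruby, each returns the collection it was called on, not the values produced by the block, so the elements themselves get appended while puts only prints the URLs as a side effect. A minimal sketch of the difference follows. It does not load Hpricot; Link is a hypothetical stand-in that only models the attributes call, and the example.com URLs are made up for illustration.

```ruby
# Hypothetical stand-in for an Hpricot element: only #attributes is modelled.
Link = Struct.new(:attributes)

links = [
  Link.new({ 'href' => 'http://example.com/a' }),
  Link.new({ 'href' => 'http://example.com/b' })
]

# each returns its receiver, so this appends the whole collection of
# elements, whatever the block itself evaluates to.
wrong = []
wrong << links.each { |link| link.attributes['href'] }

# map returns the block's values, so concat appends only the URL strings.
results = []
results.concat(links.map { |link| link.attributes['href'] })

puts results
```

With the real links collection from the post, the same idea would probably be results.concat(links.map { |link| link.attributes['href'] }), on the assumption that the object returned by doc/"..." supports map the way an Array does.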