Velocity Reviews


Vikash Kumar 11-27-2006 11:20 AM

How to get data from html table
 
I want to store the values of a table in different variables. I have the
following table structure:

<table width="579">
<tr class="even">
<td class width="65">&nbsp;Case5-04</td>
<td class width="130">10/11/2006 23:24:33</td>
<td class width="61">Case5-04</td>
<td class width="32">1005</td>
<td class width="59">Sell</td>
<td class width="36">1,000</td>
<td class width="34">ARP</td>
<td class width="52">$36.90</td>
</tr>
<tr class="odd">
<td class width="65">&nbsp;Case5-03</td>
<td class width="130">10/11/2006 23:20:07</td>
<td class width="61">Case5-03</a></td>
<td class width="32">1005</td>
<td class width="59">Buy</td>
<td class width="36">1,500</td>
<td class width="34">ARP</td>
<td class width="52">$36.70</td>
</tr>
<tr class="even">
<td class width="65">&nbsp;Case4-04</td>
<td class width="130">10/11/2006 05:28:54</td>
<td class width="61">Case4-04</a></td>
<td class width="32">1004</td>
<td class width="59">Sell</td>
<td class width="36">300</td>
<td class width="34">RIL</td>
<td class width="52">$490.00</td>
</tr>
<tr class="odd">
<td class width="65">&nbsp;Case4-03</td>
<td class width="130">10/11/2006 05:21:32</td>
<td class width="61">Case4-03</a></td>
<td class width="32">1004</td>
<td class width="59">Buy</td>
<td class width="36">200</td>
<td class width="34">RIL</td>
<td class width="52">$489.90</td>
</tr>
</table>

I want to store the values in variables so that I can compare records.
Please help me work out how to do this in Ruby.

--
Posted via http://www.ruby-forum.com/.


Peter Szinek 11-27-2006 11:51 AM

Re: How to get data from html table
 
> I want to store the values in variables so that I can compare records.
> Please help me out how to do this in ruby.


One possible way:

require 'hpricot'

Record = Struct.new("Record", :name, :date, :name_again, :some_num,
                    :buy_link, :some_num2, :letters, :price)
records = []

# html is assumed to hold the table markup as a String
doc = Hpricot(html)
stuff = doc/"/table/tr/td"

# each row has 8 cells, so every slice of 8 becomes one Record
stuff.map { |elem| elem.inner_html }.each_slice(8) do |slice|
  records << Record.new(*slice)
end

# sort on the price string with the leading "$" dropped
p records.sort_by { |record| record.price[1..-1] }

Note that since I did not know what the table cells mean, some of the
fields in the Record Struct have made-up names, but you get the idea.


Also, I am not 100% sure whether the sort_by should be done on prices
converted with to_f instead (probably not, because of rounding problems,
but I wonder whether comparing them as strings can cause some weird
issues, too).
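
One way to sidestep both concerns is to compare the prices as exact numbers instead of as strings or floats. A minimal sketch, assuming every price looks like "$1,234.56"; the helper name price_to_cents and the sample list are made up for illustration:

```ruby
# Hypothetical helper: turn a price string such as "$1,000.00" into an
# exact Integer number of cents, so no Float rounding is involved.
def price_to_cents(price)
  (price.delete('$,').to_r * 100).to_i
end

# Sample values: the four prices from the table plus one with a thousands
# separator, which is where plain string comparison goes wrong.
prices = ["$36.90", "$36.70", "$490.00", "$489.90", "$1,000.00"]

by_string = prices.sort_by { |price| price[1..-1] }
by_number = prices.sort_by { |price| price_to_cents(price) }

p by_string   # "$1,000.00" sorts first, because "1" < "3" as characters
p by_number   # "$1,000.00" sorts last, as expected
```

With the records from the Struct above, the same helper would go inside the block: records.sort_by { |record| price_to_cents(record.price) }.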

HTH,
Peter

__
http://www.rubyrailways.com



Park Heesob 11-27-2006 04:32 PM

Re: How to get data from html table
 

Hi,

Here is another way:

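After saving the HTML table text to the file 'w.xml',
you can get at the values like this:

require 'rexml/document'
include REXML

# REXML expects well-formed XML, so stray tags such as the </a>
# in the table above have to be removed from w.xml first.
doc = Document.new(File.read("w.xml"))
doc.elements.each("*/tr/td") { |e|
  puts e.texts
}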
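
To compare records and not just print cells, the same REXML calls can collect the cells row by row. This is only a sketch: the sample markup is a trimmed, hand-cleaned version of the table (no valueless attributes, no &nbsp;, no stray </a>), and rows_from_xml is a hypothetical helper name.

```ruby
require 'rexml/document'

# Hypothetical, hand-cleaned sample with only four of the eight columns.
XML_TABLE = <<~XML
  <table width="579">
    <tr class="even">
      <td>Case5-04</td><td>10/11/2006 23:24:33</td><td>Sell</td><td>$36.90</td>
    </tr>
    <tr class="odd">
      <td>Case5-03</td><td>10/11/2006 23:20:07</td><td>Buy</td><td>$36.70</td>
    </tr>
  </table>
XML

# Returns one Array of cell strings per <tr>, in document order.
def rows_from_xml(xml)
  doc = REXML::Document.new(xml)
  rows = []
  doc.elements.each("table/tr") do |tr|
    rows << tr.elements.to_a("td").map { |td| td.text.to_s.strip }
  end
  rows
end

rows = rows_from_xml(XML_TABLE)
rows.each { |row| puts row.join(" | ") }
```

Each row is then an ordinary Array, so two records can be compared with == or field by field.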


Regards,

Park Heesob




Peter Szinek 11-27-2006 07:33 PM

Re: How to get data from html table
 
Hello,

> Digression: when solving a problem like this, it is often much easier to
> write a few lines of HTML than to try to use a high-powered library to
> accomplish it.


I don't see why it is an advantage here. The first solution in this thread:

-------------------------------------------------------------------
Record = Struct.new("Record", :name, :date, :name_again, :some_num,
                    :buy_link, :some_num2, :letters, :price)
records = []

# doc is assumed to hold the table markup as a String
cells = Hpricot(doc)/"/table/tr/td"

cells.map { |elem| elem.inner_html }.each_slice(8) do |slice|
  records << Record.new(*slice)
end

p records.sort_by { |record| record.price[1..-1] }
------------------------------------------------------------------

is shorter, does not care about malformed HTML and even does the sorting
which I believe was the main intention of the OP. So why not use a
high-powered library?

Disclaimer: that solution was actually mine, but that is not why I am
referring to it. I think that parsing all the cells with a one-liner
using a robust HTML parser is much better in practice than using a basic
set of regexps and then patching the results they yield with ad-hoc
rules (missing close tags etc.) looked up from 3 examples. I believe the
above Hpricot-powered solution will work with 100 records, too (if the
other 97 do not get *really* messed up, but in that case the regexps
will fail miserably too), whereas the
we-do-not-need-any-high-powered-library approach may need another 25
patches due to the other errors in the 100-record HTML...

I do not dispute that parsing the page with regexps and seeing what's
going on under the hood can provide a lot of experience, but I am really
sure that feeding a real-life page to an HTML parser is safer than using
the regexp approach.
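
A small illustration of the kind of breakage meant here, using the cell pattern from the regexp-based script quoted later in this thread. The two input rows and the helper name scan_cells are made up for the example; the second row simply lacks its first closing </td>.

```ruby
# Cell pattern taken from the regexp-based script in this thread.
CELL = %r{<td.*?>(.*?)</td>}im

# Hypothetical helper: return the captured cell contents of one row.
def scan_cells(row)
  row.scan(CELL).flatten
end

good   = '<td class width="59">Buy</td><td class width="36">200</td>'
broken = '<td class width="59">Buy<td class width="36">200</td>' # first </td> missing

p scan_cells(good)    # two cells: "Buy" and "200"
p scan_cells(broken)  # one cell that swallows the second <td ...> tag
```

The regexp does not fail loudly on the broken row; it quietly returns one merged cell, which is exactly the sort of result that then needs an ad-hoc patch.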

Of course if this question is just a theoretical one, and there won't be
100 (or more than 3) records, just these 3, then forget about this mail.

Cheers,
Peter

__
http://www.rubyrailways.com





Vikash Kumar 11-28-2006 10:03 AM

Re: How to get data from html table
 
> #!/usr/bin/ruby -w
>
> # sourcefilename: path to the saved HTML file
> data = File.read(sourcefilename)
>
> output = []
>
> html_rows = data.scan(%r{<tr.*?>(.*?)</tr>}im).flatten
>
> html_rows.each do |row|
>   # filter these undesired elements
>   row.gsub!("&nbsp;","")
>   row.gsub!("</a>","")
>   cells = row.scan(%r{<td.*?>(.*?)</td>}im).flatten
>   output << cells
> end
>
> # done collecting, now display
>
> output.each do |row|
>   line = row.join(",")
>   puts line
> end
>


What would be the right solution if someone wants to get the data from
the Yahoo site http://finance.yahoo.com/q?s=IBM and then display only
some values, such as Prev Close and Last Trade? Let's suppose we go to
the URL through:

require 'watir'
include Watir
require 'hpricot'
include Hpricot
ie = Watir::IE.new
ie.goto("http://finance.yahoo.com/q?s=IBM")

Now, what's next? Also, suppose we want to get all the values of a
table but we don't know the table structure; what would be the correct
solution then?



