How to get data from html table
I want to store the values of a table in different variables, I have the
following table structure:

<table width="579">
  <tr class="even">
    <td class width="65"> Case5-04</td>
    <td class width="130">10/11/2006 23:24:33</td>
    <td class width="61">Case5-04</td>
    <td class width="32">1005</td>
    <td class width="59">Sell</td>
    <td class width="36">1,000</td>
    <td class width="34">ARP</td>
    <td class width="52">$36.90</td>
  </tr>
  <tr class="odd">
    <td class width="65"> Case5-03</td>
    <td class width="130">10/11/2006 23:20:07</td>
    <td class width="61">Case5-03</a></td>
    <td class width="32">1005</td>
    <td class width="59">Buy</td>
    <td class width="36">1,500</td>
    <td class width="34">ARP</td>
    <td class width="52">$36.70</td>
  </tr>
  <tr class="even">
    <td class width="65"> Case4-04</td>
    <td class width="130">10/11/2006 05:28:54</td>
    <td class width="61">Case4-04</a></td>
    <td class width="32">1004</td>
    <td class width="59">Sell</td>
    <td class width="36">300</td>
    <td class width="34">RIL</td>
    <td class width="52">$490.00</td>
  </tr>
  <tr class="odd">
    <td class width="65"> Case4-03</td>
    <td class width="130">10/11/2006 05:21:32</td>
    <td class width="61">Case4-03</a></td>
    <td class width="32">1004</td>
    <td class width="59">Buy</td>
    <td class width="36">200</td>
    <td class width="34">RIL</td>
    <td class width="52">$489.90</td>
  </tr>
</table>

I want to store the values in variables so that I can compare records.
Please help me out how to do this in ruby.

--
Posted via http://www.ruby-forum.com/.
Re: How to get data from html table
> I want to store the values in variables so that I can compare records.
> Please help me out how to do this in ruby.

One possible way:

    require 'hpricot'

    # One Struct member per table column; the names are guesses.
    Record = Struct.new("Record", :name, :date, :name_again, :some_num,
                        :buy_link, :some_num2, :letters, :price)

    records = []
    doc = Hpricot(html)          # html: a String holding the table markup
    cells = doc/"/table/tr/td"

    # Every 8 consecutive cells form one table row, i.e. one Record.
    cells.map { |elem| elem.inner_html }.each_slice(8) do |slice|
      records << Record.new(*slice)
    end

    # Sort on the price text with the leading "$" dropped.
    p records.sort_by { |record| record.price.slice(1..-1) }

Note that since I did not know the semantics of the table cells, some
fields of the Record Struct have odd names, but you get the idea. Also, I
am not 100% sure whether the sort_by should be done on prices converted
with to_f instead (probably not, because of rounding problems, but I wonder
whether comparing the prices as strings can cause some weird issues, too).

HTH,
Peter

__
http://www.rubyrailways.com
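The open question in the reply above, whether to compare prices as strings or as numbers, can be illustrated with a small sketch. It is not part of the original answer: the helper name price_to_cents is hypothetical, and the "$5.00" entry is an extra made-up price added to show where the two orderings differ. Converting to integer cents sidesteps both the float rounding concern and the string-ordering concern.

```ruby
# Hypothetical helper (not from the thread): turn a price string such as
# "$1,036.90" into an Integer number of cents, avoiding Float rounding.
def price_to_cents(price)
  digits = price.delete("$, ")             # "$36.90" -> "36.90"
  dollars, cents = digits.split(".")
  # Pad or cut the fractional part to exactly two digits ("9" -> "90").
  dollars.to_i * 100 + cents.to_s.ljust(2, "0")[0, 2].to_i
end

# The four prices from the table plus one made-up value ("$5.00").
prices = ["$490.00", "$36.90", "$489.90", "$36.70", "$5.00"]

# Comparing the text after "$" orders character by character ...
string_sorted  = prices.sort_by { |pr| pr[1..-1] }
# ... while comparing cents orders by actual value.
numeric_sorted = prices.sort_by { |pr| price_to_cents(pr) }

p string_sorted
p numeric_sorted
```

With only the four prices from the table the two orderings happen to agree; the string ordering places "$5.00" last because "5" sorts after "3" and "4", while the numeric ordering places it first.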
Re: How to get data from html table
Hi,

>From: Vikash Kumar <vikashkumar051@gmail.com>
>Subject: How to get data from html table
>Date: Mon, 27 Nov 2006 20:20:54 +0900
>
>I want to store the values of a table in different variables, I have the
>following table structure:
>[...]
>I want to store the values in variables so that I can compare records.
>Please help me out how to do this in ruby.

Here is another way. After saving the HTML table text to the file 'w.xml',
you can read the values like this:

    require 'rexml/document'
    include REXML

    # REXML only parses well-formed XML, so w.xml must not contain the stray
    # </a> tags or the value-less "class" attributes from the original table.
    doc = Document.new(File.new("w.xml"))
    doc.elements.each("*/tr/td") { |e| puts e.texts }

Regards,
Park Heesob
Re: How to get data from html table
Hello,
> Digression: when solving a problem like this, it is often much easier to
> write a few lines of HTML than to try to use a high-powered library to
> accomplish it.

I don't see why that is an advantage here. The first solution in this
thread:

-------------------------------------------------------------------
Record = Struct.new("Record", :name, :date, :name_again, :some_num,
                    :buy_link, :some_num2, :letters, :price)
records = []
cells = Hpricot(doc)/"/table/tr/td"
cells.map { |elem| elem.inner_html }.each_slice(8) do |slice|
  records << Record.new(*slice)
end
p records.sort_by { |record| record.price.slice(1..-1) }
------------------------------------------------------------------

is shorter, does not care about malformed HTML, and even does the sorting,
which I believe was the main intention of the OP. So why not use a
high-powered library?

Disclaimer: that solution was actually mine, but that is not why I am
referring to it. I think that parsing all the cells with a one-liner using
a robust HTML parser is much better in practice than using a basic set of
regexps and then patching the results they yield with ad-hoc rules (missing
close tags etc.) derived from 3 examples.

I believe the above Hpricot-powered solution will work with 100 records,
too (if the other 97 do not get *really* messed up, but in that case the
regexps will fail miserably too), whereas the
we-do-not-need-any-high-powered-library approach may need another 25
patches because of the other errors in the 100-record HTML...

I do not dispute that parsing the page with regexps and seeing what is
going on under the hood can provide a lot of experience, but I am really
sure that feeding a real-life page to an HTML parser is safer than the
regexp approach.

Of course, if this question is just a theoretical one, and there won't be
100 (or more than 3) records, just these 3, then forget about this mail.

Cheers,
Peter

__
http://www.rubyrailways.com
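The "missing close tags" concern can be made concrete with a minimal sketch. It applies the cell pattern used by the regexp approach in this thread to two rows: one well-formed and one where a single </td> is missing. The row strings and the constant name CELL are made up for illustration, and the sketch only shows how the regexp behaves; how any particular HTML parser repairs such a row is not shown here.

```ruby
# Cell pattern as used by the regexp-based approach in this thread.
CELL = %r{<td.*?>(.*?)</td>}im

# Made-up rows: the second one lacks the </td> after "1005".
good_row   = '<tr><td>Case5-04</td><td>1005</td><td>Sell</td></tr>'
broken_row = '<tr><td>Case5-04</td><td>1005<td>Sell</td></tr>'

good_cells   = good_row.scan(CELL).flatten
broken_cells = broken_row.scan(CELL).flatten

p good_cells    # three separate cells
p broken_cells  # two cells: the second swallows the following <td>
```

For the broken row the scan silently returns two cells instead of three, so every later column shifts by one unless another ad-hoc rule is added.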
Re: How to get data from html table
> #!/usr/bin/ruby -w
>
> data = File.read(sourcefilename)
>
> output = []
>
> html_rows = data.scan(%r{<tr.*?>(.*?)</tr>}im).flatten
>
> html_rows.each do |row|
>   # filter these undesired elements
>   row.gsub!("&nbsp;","")
>   row.gsub!("</a>","")
>   cells = row.scan(%r{<td.*?>(.*?)</td>}im).flatten
>   output << cells
> end
>
> # done collecting, now display
>
> output.each do |row|
>   line = row.join(",")
>   puts line
> end

What would be the right solution if someone wants to get the data from the
Yahoo site http://finance.yahoo.com/q?s=IBM and then display only some
values, such as Prev Close and Last Trade?

Let's suppose we go to the URL through:

    require 'watir'
    include Watir
    require 'hpricot'
    include Hpricot
    ie = Watir::IE.new
    ie.goto("http://finance.yahoo.com/q?s=IBM")

Now, what's next? Also, suppose we want to get all the values of a table
but we don't know the table structure: what would be the correct solution?

--
Posted via http://www.ruby-forum.com/.
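One way to approach the "unknown table structure" part of the question, sketched with plain regexps in the style of the quoted script: collect every table as an array of rows without assuming a column count, then look values up by their label. The markup, the numbers in it, the method name extract_tables and the label/value layout are all made-up assumptions standing in for the real page, whose structure is not known here; how the HTML string is obtained from the browser is also left out. The fragility of regexps on malformed or nested markup, discussed earlier in the thread, applies to this sketch too.

```ruby
# Made-up sample markup and values; the real page layout is unknown.
html = <<~HTML
  <table>
    <tr><th>Last Trade:</th><td><b>93.35</b></td></tr>
    <tr><th>Prev Close:</th><td>93.80</td></tr>
    <tr><th>Volume:</th><td>4,567,800</td></tr>
  </table>
HTML

# Hypothetical helper: return every table as an array of rows, each row an
# array of cell strings, without assuming how many columns a row has.
def extract_tables(html)
  html.scan(%r{<table\b.*?>(.*?)</table>}im).flatten.map do |table|
    table.scan(%r{<tr\b.*?>(.*?)</tr>}im).flatten.map do |row|
      row.scan(%r{<t[dh]\b.*?>(.*?)</t[dh]>}im).flatten.map do |cell|
        # Drop nested tags and &nbsp; entities, then trim whitespace.
        cell.gsub(/<[^>]+>/, "").gsub("&nbsp;", " ").strip
      end
    end
  end
end

tables = extract_tables(html)

# Treat every two-cell row as a "label: value" pair.
quote = {}
tables.each do |rows|
  rows.each do |cells|
    quote[cells[0].chomp(":")] = cells[1] if cells.size == 2
  end
end

puts quote["Prev Close"]
puts quote["Last Trade"]
```

Keeping the rows as nested arrays first and building the label lookup in a second step means the same extraction works for tables of any width, and only the lookup step depends on the assumed label/value layout.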