Velocity Reviews


soldier.coder 11-07-2008 09:05 PM

Parsing HTML using regexes and arrays.
 
I have a nice little regex to pull the information-rich guts from a
table:

%r{</thead.*?>(.*?)</table>}m =~ html
# $1 now contains all the rows of the table as one long string.

I'd like to turn that into an array of rows, but I am not exactly sure
how.

Additionally, I'd like to process the rows so that I can get the data
between the nth <td></td> pair.

Any help?

Michael Libby 11-07-2008 10:33 PM

Re: Parsing HTML using regexes and arrays.
 
On Fri, Nov 7, 2008 at 3:08 PM, soldier.coder
<geekprogrammer.ed@googlemail.com> wrote:
> I have a nice little regex to pull the information rich guts from a
> table....
>
> %r{</thead.*?>(.*?)</table>}m =~html
> # $1 now contains all the rows of the table as one long string.
>
> I'd like to turn that into an array of rows, but I am not exactly sure
> how.
>
> Additionally, I'd like to process the rows so that i can get data from
> between the nth <td></td> pair.
>
> Any help?


If you have a string with a repeating pattern that you want an array
of, String#scan is your man.

irb(main):001:0> html = "<td>foo</td><td>bar</td>"
=> "<td>foo</td><td>bar</td>"
irb(main):002:0> a = html.scan(/<td>(.+?)<\/td>/)
=> [["foo"], ["bar"]]

Hmmm, that's sort of ugly: because the regex has a capture group, scan
wraps each match in its own array.

irb(main):003:0> a = html.scan(/<td>(.+?)<\/td>/).flatten
=> ["foo", "bar"]

Much better.
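
The same idea can be applied to the original question. What follows is only a rough sketch, not a definitive solution: the sample html string is made up for illustration, the variable names (body, rows, cells, nth_column) are hypothetical, and it assumes simple, well-formed markup with no nested tables.

```ruby
# Hypothetical sample input; the real page would come from a file or an HTTP fetch.
html = <<~HTML
  <table>
    <thead><tr><th>Name</th><th>Qty</th></tr></thead>
    <tr><td>apple</td><td>3</td></tr>
    <tr><td>pear</td><td>5</td></tr>
  </table>
HTML

# Same regex as in the original post: everything after </thead> up to </table>.
body = html[%r{</thead.*?>(.*?)</table>}m, 1]

# One string per row; the m flag lets "." match newlines inside a row.
rows = body.scan(%r{<tr[^>]*>(.*?)</tr>}m).flatten

# For each row, one string per cell.
cells = rows.map { |row| row.scan(%r{<td[^>]*>(.*?)</td>}m).flatten }

# Data from the nth <td> of every row (n counts from 0 here).
n = 1
nth_column = cells.map { |row_cells| row_cells[n] }

p cells       # => [["apple", "3"], ["pear", "5"]]
p nth_column  # => ["3", "5"]
```

Scanning twice (rows first, then cells within each row) keeps the row boundaries, so cells[i][n] is the nth cell of row i.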

Ad hoc regexes are fine for quick-n-dirty scripting. But if you're
serious about parsing HTML you might want to look into Hpricot or
Nokogiri.

-Michael Libby


