Perl HTML::TableExtract Question

 
 
Paul
      04-17-2005
Hi !

I hope someone can help.

I want to extract data from a table with 2 columns.

A sample of the table can be generated with:

"http://moneycentral.msn.com/investor/research/sreport.asp?Symbol=ba&QD=1&OP=1&IC=1&Y1=1&CR=1&AF=1&AIE=1&AIR=1&FRH=1&FRK=1&ISA=1&ISQ=1&BSA=1&BSQ=1&CFA=1&CFQ=1&TYS=1&ITT=1&ITP=1&Type=Equity"

(Sorry about the long URL.)

What I want is the field from the top table labelled "Tot. Shares Out."

My current code is:

#!/usr/bin/perl -w


use strict;
use HTML::TableExtract;


my $inFile = "/home/mas/development/URLTemp.tmp";
my $te = HTML::TableExtract->new( headers => [ 'Fundamental Data', '*' ]);
$te->parse_file( $inFile );
foreach my $ts ( $te->table_states ) {
    foreach my $row ( $ts->rows ) {
        print join( ",", @$row, "," ), "\n";
    }
}


But this seems to get the table lower down the page. This wouldn't be so
bad, as it repeats the value I need, but: "How do I get an
un-labelled column?"

Any help would be appreciated.

Paul
 
Paul
      04-17-2005
Just a bit more info on this: the ", '*'" doesn't work; in fact it
returns empty data. Without it, it assumes that the rows below are what
is wanted and it returns:

Market Capitalization,,
Earnings/Share,,

The real question is: "How do I specify a row with a NULL header?"


 
Tad McClellan
      04-17-2005
Paul <none@none> wrote:

> What I want is the field from the top table Labelled - "Tot. Shares Out."


> my $te = HTML::TableExtract->new( headers => [ 'Fundamental Data', '*' ]);



The headers approach will not work since there are no headers
on the table that contains the data that you are after.


> "How do I get an
> un-labelled column ????"



Positionally.

"Tot. Shares Out." is the 7th column in the 12th row of the table
at depth=2 and count=1.


> Any help would be appreciated.



my $te = HTML::TableExtract->new( depth => 2, count => 1 );
$te->parse_file( $inFile );
my ($ts) = $te->table_states;    # the one table at depth=2, count=1
my $total_outstanding = ($ts->rows)[11]->[6];
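
For anyone reading this later, here is a minimal, self-contained sketch of the positional approach described above. It does not use the real MSN page: the HTML, the depth/count pair, the row/column indices and the values are all made-up stand-ins so that the script runs on its own. It calls `tables`, which to my knowledge is the name newer releases of HTML::TableExtract use for what this thread calls `table_states`; if your installed version differs, treat that as an assumption to check.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical stand-in for the saved page (values are invented):
# an outer layout table at depth 0 that holds two unlabelled data
# tables at depth 1 (count 0 and count 1).
my $html = <<'HTML';
<table><tr><td>
  <table><tr><td>Last Price</td><td>12.34</td></tr></table>
  <table>
    <tr><td>Market Cap.</td><td>56.7 Bil</td></tr>
    <tr><td>Tot. Shares Out.</td><td>123.4 Mil</td></tr>
  </table>
</td></tr></table>
HTML

# Select the table by position instead of by header text.
my $te = HTML::TableExtract->new( depth => 1, count => 1 );
$te->parse($html);
$te->eof;

my ($table) = $te->tables;       # the single table matching depth/count
die "no table at depth 1, count 1\n" unless $table;

my @rows   = $table->rows;       # each row is an array reference
my $shares = $rows[1]->[1];      # 2nd row, 2nd column (indices start at 0)
print "Tot. Shares Out. = $shares\n";
```

On the real page you would swap in `parse_file` and the depth, count and indices that match the actual layout. Positional extraction like this will break silently if the site rearranges its tables, so it is worth checking that the neighbouring label cell still reads "Tot. Shares Out." before trusting the value.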


--
Tad McClellan SGML consulting
(E-Mail Removed) Perl programming
Fort Worth, Texas
 
Paul
      04-17-2005
Thanks for that, Tad! I got the same answer at about 02:30 in the
morning.

It seems the page isn't very well constructed.

I spent lots of time looking for the new version of HTML::TableExtract
which is supposed to address rows as well as columns but could only find
fleeting references to it.

Regards.
 