In article
<233f66ab-b5eb-449d-b3b0->,
shankar_perl_rookie <> wrote:
> Hello All,
>
> I have an HTML file from which I am trying to extract a table. The problem
> I am facing is that there are a lot of tables in the page, and the table I am
> looking to extract appears after a particular string, say $some_text. I
> know of a way that I can search for the string in the HTML page, but
> what I want to do is capture a table that immediately follows the
> $some_text.
>
> Any suggestions on how to do this?

The most reliable way would be to use the HTML::Parser module to parse
the HTML file: register handlers for the table elements (<table>, <tr>,
<td>) and one for text, look for your string in the text handler, and
process the next table encountered (handler subroutines are called as
callbacks by the parsing method).
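
Here is a minimal sketch of that HTML::Parser approach (untested against
real-world pages). The helper name table_after_text is hypothetical. It
assumes the marker text appears inside a single run of text (not split by
tags), that cells have explicit </td> or </th> closing tags, and it simply
folds the text of any nested table into the enclosing cell.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: returns an array ref of rows (each an array ref of
# cell text) for the first table that starts after $marker in $html.
sub table_after_text {
    my ( $html, $marker ) = @_;

    my $seen_marker = 0;    # set once $marker has appeared in the text
    my $depth       = 0;    # <table> nesting depth inside the target table
    my $done        = 0;    # set when the target table has been closed
    my ( @rows, $cell );

    my $p = HTML::Parser->new(
        api_version => 3,
        text_h => [ sub {
            my ($text) = @_;
            return if $done;
            if ( defined $cell ) {
                $cell .= $text;    # inside a cell of the target table
            }
            elsif ( !$depth && index( $text, $marker ) >= 0 ) {
                $seen_marker = 1;  # the next <table> is the one we want
            }
        }, 'dtext' ],
        start_h => [ sub {
            my ($tag) = @_;
            return if $done || !$seen_marker;
            if ( $tag eq 'table' ) { $depth++; return }
            return unless $depth == 1;    # skip rows/cells of nested tables
            if ( $tag eq 'tr' ) {
                push @rows, [];
            }
            elsif ( $tag eq 'td' || $tag eq 'th' ) {
                push @rows, [] unless @rows;    # tolerate a missing <tr>
                $cell = '';
            }
        }, 'tagname' ],
        end_h => [ sub {
            my ($tag) = @_;
            return if $done || !$depth;
            if ( $tag eq 'table' ) {
                $done = 1 if --$depth == 0;    # target table closed
            }
            elsif ( $depth == 1 && ( $tag eq 'td' || $tag eq 'th' ) && defined $cell ) {
                push @{ $rows[-1] }, $cell;    # finish the current cell
                undef $cell;
            }
        }, 'tagname' ],
    );
    $p->unbroken_text(1);    # deliver each run of text in one piece
    $p->parse($html);
    $p->eof;
    return \@rows;
}

my $page = '<p>Results:</p><table><tr><td>a</td><td>b</td></tr></table>';
print join( ',', @$_ ), "\n" for @{ table_after_text( $page, 'Results:' ) };
```

Tracking the nesting depth is what lets the parser stop at the right
</table>, which is exactly where a regular expression falls down.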

Another way would be to use a module that extracts tables from HTML. There
are at least two on CPAN: HTML::TableExtract and HTML::TableParser. The
problem with using these is finding the table that follows the specified
text. Is there some other way of identifying the table?
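
One possible workaround with HTML::TableExtract, sketched under some
assumptions: cut the HTML at the first occurrence of the string and let the
module pick the first top-level table in what remains (its depth => 0,
count => 0 options). This assumes the string occurs literally in the raw
HTML (not broken up by tags or entities) and does not sit inside a table
itself; rows_after_marker is a hypothetical name.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical helper: rows of the first top-level table after $marker.
# Assumes $marker appears literally in the HTML, outside any table.
sub rows_after_marker {
    my ( $html, $marker ) = @_;

    my $pos = index( $html, $marker );
    return if $pos < 0;    # marker not found

    # Parse only the HTML after the marker; depth 0 / count 0 selects
    # the first top-level table in that fragment.
    my $te = HTML::TableExtract->new( depth => 0, count => 0 );
    $te->parse( substr( $html, $pos + length $marker ) );
    $te->eof;

    my ($table) = $te->tables;
    return unless $table;
    return $table->rows;    # list of array refs, one per row
}

my $page = '<p>Results</p><table><tr><td>a</td><td>b</td></tr></table>';
for my $row ( rows_after_marker( $page, 'Results' ) ) {
    print join( "\t", map { defined $_ ? $_ : '' } @$row ), "\n";
}
```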
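
The quick and dirty way is to use a regular expression (untested):

    # \Q...\E matches $some_text literally; <table\b[^>]*> allows attributes
    if ( $html =~ m{ \Q$some_text\E .*? <table\b[^>]*> (.*?) </table> }isx ) {
        # table contents (inside the <table> tags) are in $1
    }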
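
A self-contained version of the regular-expression approach, wrapped in a
hypothetical helper table_html_after. It quotes the search string with
\Q...\E so that regex metacharacters in it are matched literally, and it
accepts a <table> tag that carries attributes.

```perl
use strict;
use warnings;

# Quick and dirty: return the inner HTML of the first <table> after
# $marker, or undef if there is none. Breaks on nested tables.
sub table_html_after {
    my ( $html, $marker ) = @_;
    return $1
        if $html =~ m{ \Q$marker\E .*? <table\b[^>]*> (.*?) </table> }isx;
    return;
}

my $page = <<'HTML';
<table><tr><td>ignore me</td></tr></table>
<p>Quarterly results</p>
<table border="1"><tr><td>Q1</td><td>42</td></tr></table>
HTML

my $inner = table_html_after( $page, 'Quarterly results' );
print defined $inner ? "$inner\n" : "no table found\n";
```

Without the quoting, a search string such as "a+b (c)" would be read as a
pattern, and its parentheses would even become capture group 1, pushing
the table contents into $2.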

This will not always work, however. It fails if you have nested tables,
for example, which are common in some HTML: the non-greedy (.*?) stops at
the first </table>, which belongs to the inner table. Still, if you are in
a hurry it might work for you. It is always better to use a real parser
for HTML.
--
Jim Gibson