PERL - [TABLE NOT SHOWN] problem with HTML::Parse

#1

When I run the well-quoted line:

my $ascii = HTML::FormatText->new->format(HTML:

to remove HTML tags from an HTML document, it replaces all tables with "[TABLE NOT SHOWN]". Is there a quick and easy way to get the table content parsed too?

Thanks a lot,
Mitchua

#2

Posts: n/a

"Mitchua" <> wrote in message news:EJiPa.115702$ ble.rogers.com...
> When I run the well quoted line:
> my $ascii =
> HTML::FormatText->new->format(HTML:
> to remove HTML tags from an html document, it replaces all tables with
> "[TABLE NOT SHOWN]". Is there a quick and easy way to get the table content
> parsed too?
>

The documentation for HTML::FormatText states: "Formatting of HTML tables and forms is not implemented." So not with that module.

The documentation makes a reference to HTML::Formatter (http://search.cpan.org/author/SBURKE...L/Formatter.pm), which in turn contains references to other modules that may be of some help.
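
Since HTML::FormatText skips tables, one possible workaround is to pull the cell text out separately. The following is only a rough sketch, assuming the HTML-Tree distribution (HTML::TreeBuilder) is installed; the sub name `table_rows_as_text` and the sample markup are made up for illustration, and this is not necessarily what the documentation referenced above recommends.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return one tab-separated line of text per table row.
sub table_rows_as_text {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @lines;
    for my $row ($tree->look_down(_tag => 'tr')) {
        # Keep only the td/th elements that are direct children of this row.
        my @cells = grep { ref $_ && $_->tag =~ /^t[dh]$/ } $row->content_list;
        push @lines, join("\t", map { $_->as_text } @cells);
    }
    $tree->delete;    # release the parse tree explicitly
    return @lines;
}

# Made-up sample input.
my $html = '<table><tr><th>Name</th><th>Qty</th></tr>'
         . '<tr><td>Apple</td><td>3</td></tr></table>';
print "$_\n" for table_rows_as_text($html);
```

Nested tables would need extra handling, because the text of an inner table also shows up in the outer cell that contains it.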

#3

Posts: n/a

"Mitchua" <> wrote in message news:YRHPa.6477$ .rogers.com...
>
> Are there any other (easy) ways to remove all html tags (including tricky
> tags like comments, etc.) from a web page without using those modules? I'm
> looking for a solution beyond a regular expression.
>

"Easy": no. That's why we have all those modules in the HTML section of CPAN -- the solution is always difficult, messy and "beyond a regular expression."

I note that in your OP you used HTML: this indicates that it is deprecated. Have you looked into HTML: People speak highly of that module.

#4

Posts: n/a

"James E Keenan" <> wrote in message news:beovoq$...
>
> "Mitchua" <> wrote in message
> news:YRHPa.6477$ .rogers.com...
> >
> > Are there any other (easy) ways to remove all html tags (including tricky
> > tags like comments, etc.) from a web page without using those modules? I'm
> > looking for a solution beyond a regular expression.
> >
> "Easy": no. That's why we have all those modules in the HTML section of
> CPAN -- the solution is always difficult, messy and "beyond a regular
> expression."
>
> I note that in your OP you used HTML: this
> indicates that it is deprecated. Have you looked into HTML: People
> speak highly of that module.
>

I found this code on the web that uses it:

    use HTML:
    $p = HTML:
    $p->parse($notes);     # parse the HTML in notes
    $p->eof;               # signal end of parse file
    print $p->as_string;   # print out the parsed text

but I get the error "Can't locate ../HTML/Parser/as_string.al". I'm looking for that file now.

Jonathan
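
As far as its documented interface goes, HTML::Parser has no `as_string` method, which would explain why Perl goes looking for an autoload file. A minimal sketch of one way to collect the text instead is shown below, assuming HTML::Parser 3.x and its event-handler API; the sub name `html_to_text` and the sample string are made up for illustration, and this is not the code the poster found.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: return the text of an HTML string without tags or comments.
sub html_to_text {
    my ($html) = @_;
    my $text = '';
    my $p = HTML::Parser->new(
        api_version => 3,
        # Only text events get a handler; 'dtext' passes entity-decoded text.
        text_h => [ sub { $text .= shift }, 'dtext' ],
    );
    $p->ignore_elements(qw(script style));    # skip these elements and their contents
    $p->parse($html);
    $p->eof;                                  # signal end of input
    return $text;
}

# Made-up sample input.
my $notes = '<p>Hello <!-- hidden --><b>world</b> &amp; all</p>';
print html_to_text($notes), "\n";
```

Comments are dropped here simply because no handler is registered for them. Whitespace and line breaks come through exactly as they appear in the source, so the result is not laid out the way HTML::FormatText output would be.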