
PERL - [TABLE NOT SHOWN] problem with HTML::Parse

07-10-2003, 08:00 PM  #1
[TABLE NOT SHOWN] problem with HTML::Parse


When I run the oft-quoted line:
my $ascii = HTML::FormatText->new->format(HTML::Parse::parse_html($html));
to remove HTML tags from an HTML document, it replaces all tables with
"[TABLE NOT SHOWN]". Is there a quick and easy way to get the table content
parsed too?

Thanks a lot,
Mitchua

07-11-2003, 02:05 AM  #2
James E Keenan
Re: [TABLE NOT SHOWN] problem with HTML::Parse


"Mitchua" <> wrote in message
news:EJiPa.115702$ ble.rogers.com...
> When I run the well quoted line:
> my $ascii =
> HTML::FormatText->new->format(HTML::Parse::parse_html($html));
> to remove HTML tags from an html document, it replaces all tables with
> "[TABLE NOT SHOWN]". Is there a quick and easy way to get the table content
> parsed too?
>

The documentation for HTML::FormatText states: "Formatting of HTML tables
and forms is not implemented." So not with that module. The documentation
makes a reference to HTML::Formatter
(http://search.cpan.org/author/SBURKE...L/Formatter.pm), which in turn
contains references to other modules that may be of some help.
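As a rough sketch of one such route (not from this thread), assuming the CPAN HTML-Tree distribution is installed: parse the page with HTML::TreeBuilder and walk the table rows yourself, joining each row's cells with tabs. `table_rows_as_text` is a hypothetical helper name; nested tables are not handled specially, so an outer cell's text will include any inner table's text.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;    # CPAN, HTML-Tree distribution (assumed installed)

# Hypothetical helper: return each <tr> as one tab-separated line of cell text.
sub table_rows_as_text {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);    # parses and calls eof
    my @lines;
    for my $row ($tree->look_down(_tag => 'tr')) {
        # Collect the text of every header (th) or data (td) cell in this row
        my @cells = map { $_->as_text } $row->look_down(_tag => qr/^t[dh]$/);
        push @lines, join("\t", @cells);
    }
    $tree->delete;    # break the tree's circular references so memory is freed
    return @lines;
}

my $html = '<table><tr><th>Fruit</th><th>Price</th></tr>'
         . '<tr><td>Apple</td><td>1.20</td></tr></table>';
print "$_\n" for table_rows_as_text($html);
```

Text outside tables is ignored here; a full converter would combine this with ordinary paragraph handling.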


07-12-2003, 01:43 PM  #3
James E Keenan
Re: [TABLE NOT SHOWN] problem with HTML::Parse


"Mitchua" <> wrote in message
news:YRHPa.6477$ .rogers.com...
>
> Are there any other (easy) ways to remove all html tags (including tricky
> tags like comments, etc.) from a web page without using those modules? I'm
> looking for a solution beyond a regular expression.
>

"Easy": no. That's why we have all those modules in the HTML section of
CPAN -- the solution is always difficult, messy and "beyond a regular
expression."
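To make the "beyond a regular expression" point concrete, here is the naive substitution people usually try first, shown only as a counterexample. `naive_strip` is a hypothetical name. A `>` inside a comment or inside a quoted attribute value (both legal HTML) ends the match early and leaves debris in the output.

```perl
use strict;
use warnings;

# Naive tag stripper: delete everything from '<' up to the next '>'.
sub naive_strip {
    my ($html) = @_;
    $html =~ s/<[^>]*>//g;
    return $html;
}

# The comment and the title attribute both contain '>', which fools the regex.
my $html = '<!-- if a > b then --><p title="x>y">Hello</p>';
print naive_strip($html), "\n";    # leaves comment and attribute fragments behind
```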

I note that in your OP you used HTML::Parse. The 1-line description of this
indicates that it is deprecated. Have you looked into HTML::Parser? People
speak highly of that module.


07-14-2003, 12:38 AM  #4
Mitchua
Re: [TABLE NOT SHOWN] problem with HTML::Parse

"James E Keenan" <> wrote in message
news:beovoq$...
>
> "Mitchua" <> wrote in message
> news:YRHPa.6477$ .rogers.com...
> >
> > Are there any other (easy) ways to remove all html tags (including tricky
> > tags like comments, etc.) from a web page without using those modules? I'm
> > looking for a solution beyond a regular expression.
> >
>
> "Easy": no. That's why we have all those modules in the HTML section of
> CPAN -- the solution is always difficult, messy and "beyond a regular
> expression."
>
> I note that in your OP you used HTML::Parse. The 1-line description of this
> indicates that it is deprecated. Have you looked into HTML::Parser? People
> speak highly of that module.
>


I found this code on the web that uses it:

use HTML::Parser;
$p = HTML::Parser->new;
$p->parse($notes); # parse the HTML in notes
$p->eof; # signal end of parse file
print $p->as_string; # print out the parsed text

but I get the error "Can't locate ../HTML/Parser/as_string.al". I'm looking
for that file now.
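As far as I know, HTML::Parser has no `as_string` method, so that error most likely comes from calling a method that does not exist rather than from a missing file. HTML::Parser is event-driven: it calls handler subs you register for text, start tags, comments, and so on. A minimal sketch, assuming the version 3 API: register only a text handler, so tags and comments (which get no handler) are simply dropped, and tell the parser to skip the contents of `script` and `style`. `strip_tags` is a hypothetical helper name, and the sample `$notes` content is made up for illustration.

```perl
use strict;
use warnings;
use HTML::Parser;    # CPAN; version 3 API assumed

# Hypothetical helper: return only the visible text of an HTML string.
sub strip_tags {
    my ($html) = @_;
    my $text = '';
    my $p = HTML::Parser->new(
        api_version => 3,
        # Called for each run of text; 'dtext' decodes entities such as &amp;
        text_h => [ sub { $text .= shift }, 'dtext' ],
    );
    $p->ignore_elements(qw(script style));    # drop these elements' contents too
    $p->parse($html);
    $p->eof;                                  # flush any buffered text
    return $text;
}

my $notes = '<p>Fish &amp; chips<!-- kitchen note --></p><script>var x = 1;</script>';
print strip_tags($notes), "\n";
```

Because comments have no handler registered, they never reach `$text`; adding a `comment_h` handler would be the way to keep them.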

Jonathan

