Capturing actual Browser output in perl
    #!/usr/bin/perl
    use LWP;
    my $browser = LWP::UserAgent->new;
    my $response = $browser->get( "http://lkml.org" );
    print( $response->content );

In this program I am trying to get the output as the browser displays it, not the actual HTML page with all the tags that $response->content returns.

For example, for this URL, what I want to save in a string is how the browser shows it:

    Last 100 messages Today's messages Yesterday's messages
    Hottest Messages
    LKML.ORG

NOT what the actual HTML content is:

    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <link href="/css/frontpage.css" rel="stylesheet" type="text/css" />
    <title>LKML.ORG - the Linux Kernel Mailing List Archive</title>
    <script type="text/javascript" src="/css/multiline-tooltip.js"></script>
    </head>
    ......

Is there any easy way to achieve this?

Thanks,
Digz
Re: Capturing actual Browser output in perl
digz <Digvijoy.C@gmail.com> wrote:
>#!/usr/bin/perl
>use LWP;
>my $browser = LWP::UserAgent->new;
>my $response = $browser->get( "http://lkml.org" );
>print( $response->content );
>
>In this program I am trying to get the output as the browser displays
>it , not the actual HTML page with all the tags .., that
>$response->content returns.

The way you stated your requirements, your best bet is a screen capture tool, because the output of a browser depends not only on the HTML but to a large part on user settings and configuration. A different rendering tool would therefore have to use the same configuration as the browser and interpret it the same way.

>For a example , this URL ,
>
>What I want to save in a string is how the browser shows it

But a browser shows a graphic with different fonts, styles, colors, layouts, tables, .... You cannot save that as a "text string" (unless you incorporate that formatting information in the string, of course, but then it is no longer plain text).

>Last 100 messages Today's messages Yesterday's messages
>Hottest Messages
>LKML.ORG
>
>NOT
>
>what the actual HTML content is:
>.....
>Is there any easy way to achieve this

The easiest way to get an approximation of the textual part of the display is to use a text-only browser such as Lynx and redirect its output to a file (Lynx has an option for that).

Another way, probably more customizable (what do you intend to do with tool tips? Alternate text and captions for graphics? DHTML? How much JavaScript do you want to run? ...?), is to run the HTML code through an HTML parser and extract those text pieces you are interested in. There are several parsers on CPAN.
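As a rough illustration of the parser route, here is a minimal sketch using HTML::Parser from CPAN. It collects the text between tags and skips anything inside script, style and title elements. The helper name visible_text and the sample markup (including its href values) are made up for the example; in the original script the input would be $response->content instead. It makes no attempt to reproduce layout, tool tips or anything JavaScript generates.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Return the text between tags, one chunk per line.
# visible_text is a hypothetical helper name, not part of HTML::Parser.
sub visible_text {
    my ($html) = @_;
    my %skip = map { $_ => 1 } qw(script style title);
    my $skipping = 0;    # > 0 while inside an element we want to drop
    my @chunks;

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub { $skipping++ if $skip{ $_[0] } }, 'tagname' ],
        end_h   => [ sub { $skipping-- if $skip{ $_[0] } && $skipping > 0 },
                     'tagname' ],
        text_h  => [ sub {
            return if $skipping;
            my ($text) = @_;
            $text =~ s/\s+/ /g;         # collapse runs of whitespace
            $text =~ s/^\s+|\s+$//g;    # trim both ends
            push @chunks, $text if length $text;
        }, 'dtext' ],                   # dtext = text with entities decoded
    );
    $parser->unbroken_text(1);          # deliver each text run in one piece
    $parser->parse($html);
    $parser->eof;
    return join "\n", @chunks;
}

# Stand-in markup for the example; the href values are invented.
my $html = <<'HTML';
<html><head><title>LKML.ORG - the Linux Kernel Mailing List Archive</title>
<script type="text/javascript">var tip = 1;</script></head>
<body><a href="/last100">Last 100 messages</a>
<a href="/today">Today's messages</a></body></html>
HTML

print visible_text($html), "\n";
```

What counts as "visible" is the customizable part: the %skip list and the text handler are where you would decide about alternate text, tool tips and so on.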
Re: Capturing actual Browser output in perl
digz wrote:
> #!/usr/bin/perl
> use LWP;
> my $browser = LWP::UserAgent->new;
> my $response = $browser->get( "http://lkml.org" );
> print( $response->content );
>
> In this program I am trying to get the output as the browser displays
> it , not the actual HTML page with all the tags .., that
> $response->content returns.

You may want to check out:

http://search.cpan.org/dist/html2text/
http://search.cpan.org/perldoc?HTML:...ext::Html2text

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
Re: Capturing actual Browser output in perl
In Dread Ink, the Grave Hand of digz Did Inscribe:
> In this program I am trying to get the output as the browser displays
> it , not the actual HTML page with all the tags .., that
> $response->content returns.

I was endeavoring to do close to the same thing a while back, and I think this was the closest I came:

    #!/usr/bin/perl
    # perl wahab4.pl
    use strict;
    use warnings;
    use LWP::Simple;
    use HTML::Parser;
    use HTML::FormatText;
    my ($html, $ascii);
    $html = get("http://www.co-array.com/");
    defined $html
        or die "Can't fetch HTML from http://www.perl.com/";
    $ascii = HTML::FormatText->new->format(parse_html($html));
    print $ascii;

    C:\MinGW\source>perl wahab4.pl
    Undefined subroutine &main::parse_html called at wahab4.pl line 12.

I'm having trouble using the methods that are on CPAN. I sure wish every module included a bevy of examples.

-- 
Frank

No Child Left Behind is the most ironically named act, piece of legislation since the 1942 Japanese Family Leave Act.
~~ Al Franken, in response to the 2004 SOTU address
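The "Undefined subroutine" error in the script above is most likely because none of the modules it loads exports parse_html; that function comes from the older HTML::Parse module in the HTML-Tree distribution. One way around it, sketched here under the assumption that HTML::TreeBuilder and HTML::FormatText are both installed, is to build the tree explicitly and hand it to the formatter. The helper name html_to_text and the sample markup are invented for the example, which uses a literal string so it runs without network access; in the script above the string would come from get(...).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Convert an HTML string to plain text.
# html_to_text is a hypothetical helper name for this sketch.
sub html_to_text {
    my ($html) = @_;
    # Build the parse tree that parse_html() would have returned.
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my $formatter = HTML::FormatText->new(
        leftmargin  => 0,
        rightmargin => 72,
    );
    my $text = $formatter->format($tree);
    $tree->delete;    # release the tree once formatting is done
    return $text;
}

# Stand-in markup for the example.
my $html = '<html><body><p>Last 100 messages</p>'
         . '<p>Hottest Messages</p></body></html>';

print html_to_text($html);
```

The margins are only there to make the output easier to compare; HTML::FormatText has its own defaults if you leave them out.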