Capturing actual Browser output in perl

 
 
digz
      05-22-2009
#!/usr/bin/perl
use LWP;
my $browser = LWP::UserAgent->new;
my $response = $browser->get( "http://lkml.org" );
print( $response->content );

In this program I am trying to get the output as the browser displays
it, not the raw HTML page with all its tags that $response->content
returns.

For example, for the URL above, what I want to save in a string is how
the browser shows it:

Last 100 messages Today's messages Yesterday's messages
Hottest Messages
LKML.ORG

NOT

what the actual HTML content is:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<link href="/css/frontpage.css" rel="stylesheet" type="text/css" />

<title>LKML.ORG - the Linux Kernel Mailing List Archive</title>
<script type="text/javascript" src="/css/multiline-tooltip.js"></script>
</head>
......
Is there any easy way to achieve this?

Thanks

Digz
 
Jürgen Exner
      05-22-2009
digz wrote:
>#!/usr/bin/perl
>use LWP;
>my $browser = LWP::UserAgent->new;
>my $response = $browser->get( "http://lkml.org" );
>print( $response->content );
>
>In this program I am trying to get the output as the browser displays
>it , not the actual HTML page with all the tags .., that
>$response->content returns.


The way you stated your requirements, your best bet is a screen capture
tool, because the output of a browser depends not only on the HTML but
to a large part on user settings and configuration.
Therefore a different rendering tool would have to use the same
configuration as the browser and interpret it the same way.

>For a example , this URL ,
>
>What I want to save in a string is how the browser shows it


But a browser shows a graphic with different fonts, styles, colors,
layouts, tables, ....
You cannot save that as a "text string" (unless you incorporate that
formatting information in the string, of course, but then it is no
longer plain text).

>Last 100 messages Today's messages Yesterday's messages
>Hottest Messages
>LKML.ORG
>
>NOT
>
>what the actual HTML content is:
>.....
>Is there any easy way to achieve this


The easiest way to get an approximation of the textual part of the
display is to use a text-only browser such as Lynx and redirect its
output to a file (Lynx has an option for that).
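
As a rough sketch of the Lynx route from Perl: the script below runs
"lynx -dump" and captures what it prints. It assumes a lynx executable
is on the PATH of a Unix-like system; the sub name lynx_text is made up
for this illustration, and -nolist is used only to drop the numbered
link list that lynx would otherwise append.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run "lynx -dump" on a URL and return the rendered text.
# Assumption: lynx is installed and reachable via PATH.
sub lynx_text {
    my ($url) = @_;
    # List-form pipe open: no shell is involved, so the URL is
    # passed to lynx as a single argument.
    open my $lynx, '-|', 'lynx', '-dump', '-nolist', $url
        or die "Cannot run lynx: $!";
    local $/;                      # slurp mode: read everything at once
    my $text = <$lynx>;
    close $lynx or die "lynx exited with status $?";
    return $text;
}

# Usage: perl lynx_text.pl http://lkml.org
print lynx_text($ARGV[0]) if @ARGV;
```

The result is only what Lynx makes of the page, so it will not match a
graphical browser exactly.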

Another way, probably more customizable (what do you intend to do with
tool tips? Alternate text and captions for graphics? DHTML? How much
JavaScript do you want to run? ...?) is to run the HTML code through an
HTML parser and extract those text pieces you are interested in. There
are several parsers on CPAN.
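
A minimal sketch of the parser route, using HTML::Parser from CPAN as
one possible choice: it keeps text nodes, decodes entities, and skips
the contents of script and style elements. The sub name visible_text is
hypothetical, and joining the pieces with a space is a simplification
that can split a word around inline tags.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Collect the text of an HTML document, skipping markup as well as
# the contents of <script> and <style> elements.
sub visible_text {
    my ($html) = @_;
    my @chunks;
    my $parser = HTML::Parser->new(
        api_version => 3,
        # 'dtext' hands over text with entities such as &amp; decoded
        text_h => [ sub { push @chunks, shift }, 'dtext' ],
    );
    $parser->unbroken_text(1);     # do not split a text run into pieces
    $parser->ignore_elements(qw(script style));
    $parser->parse($html);
    $parser->eof;

    # Simplification: join with a space, then normalize whitespace.
    my $text = join ' ', @chunks;
    $text =~ s/\s+/ /g;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Usage: perl visible_text.pl http://lkml.org
if (@ARGV) {
    require LWP::UserAgent;
    my $response = LWP::UserAgent->new->get($ARGV[0]);
    die 'Fetch failed: ', $response->status_line
        unless $response->is_success;
    print visible_text($response->decoded_content), "\n";
}
```

Tool tips, alternate text and anything generated by JavaScript are
ignored here; handling them would mean adding start-tag handlers or a
different tool altogether.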


 
Gunnar Hjalmarsson
      05-22-2009
digz wrote:
> #!/usr/bin/perl
> use LWP;
> my $browser = LWP::UserAgent->new;
> my $response = $browser->get( "http://lkml.org" );
> print( $response->content );
>
> In this program I am trying to get the output as the browser displays
> it , not the actual HTML page with all the tags .., that
> $response->content returns.


You may want to check out:

http://search.cpan.org/dist/html2text/

http://search.cpan.org/perldoc?HTML:...ext::Html2text

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
Franken Sense
      05-23-2009
In Dread Ink, the Grave Hand of digz Did Inscribe:

> In this program I am trying to get the output as the browser displays
> it , not the actual HTML page with all the tags .., that
> $response->content returns.


I was attempting much the same thing a while back, and I think this
was the closest I came:

#!/usr/bin/perl
# perl wahab4.pl

use strict;
use warnings;
use LWP::Simple;
use HTML::Parser;
use HTML::FormatText;
my ($html, $ascii);
$html = get("http://www.co-array.com/");
defined $html
or die "Can't fetch HTML from http://www.co-array.com/";
$ascii = HTML::FormatText->new->format(parse_html($html));
print $ascii;


C:\MinGW\source>perl wahab4.pl
Undefined subroutine &main::parse_html called at wahab4.pl line 12.

I'm having trouble using the modules that are on CPAN. I sure wish every
module included a bevy of examples.
--
Frank

No Child Left Behind is the most ironically named act, piece of legislation
since the 1942 Japanese Family Leave Act.
~~ Al Franken, in response to the 2004 SOTU address
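
A note on the error above: parse_html is exported by the older
HTML::Parse module, not by HTML::Parser, which would explain the
"Undefined subroutine" message. A minimal sketch of one way around it,
assuming HTML::TreeBuilder and HTML::FormatText are installed, builds
the tree with HTML::TreeBuilder instead. The sub name html_to_ascii and
the margin values are illustrative choices, not part of the original
script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Render an HTML string as plain text, roughly as a text browser would.
sub html_to_ascii {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my $formatter = HTML::FormatText->new(
        leftmargin  => 0,
        rightmargin => 72,
    );
    my $ascii = $formatter->format($tree);
    $tree->delete;                 # free the parse tree explicitly
    return $ascii;
}

# Usage: perl html_to_ascii.pl http://www.co-array.com/
if (@ARGV) {
    require LWP::Simple;
    my $html = LWP::Simple::get($ARGV[0]);
    defined $html or die "Can't fetch HTML from $ARGV[0]";
    print html_to_ascii($html);
}
```

HTML::FormatText ignores tables, forms and images, so pages that lean
on those will come out sparse.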
 