Velocity Reviews


bhabs 02-12-2008 05:50 AM

LWP user agent grabs the intermediate wait page after POST instead of the actual result page
 
Hi,

I wrote a small LWP-based Perl program that searches for air fares on a
travel website using POST.

#!/usr/bin/perl
use strict;
use warnings;
use CGI;
use LWP::UserAgent;

my $web_browser = LWP::UserAgent->new();
# By default only GET and HEAD are redirected; follow redirects after POST too.
push @{ $web_browser->requests_redirectable }, 'POST';
$web_browser->timeout(300);

my $web_response = $web_browser->post(
    'http://blabla.com/travel/InitialSearch.do',
    [
        'fromCity' => 'SFO',
        'toCIty'   => 'CVG',
        # .... the rest of the fields occur here
    ],
);

die "Error: ", $web_response->status_line()
    unless $web_response->is_success;

# content() returns the body as a single string, so store it in a scalar.
my $content = $web_response->content;
print $content;

When I print the content, I see the "intermediate" wait page, the one
that displays a progress bar using javascript. (I compared the content
with "view source" in Internet Explorer and it matches.)
I am unable to capture the final air fare page. The website takes some
time to do the search before it displays the air fare result page.
How do I make my program wait for the actual result instead of grabbing
the intermediate response?

Could anyone please help me on this?

Regards,
bhabs

Ben Morrow 02-12-2008 04:08 PM

Re: LWP user agent grabs the intermediate wait page after POST instead of the actual result page
 

Quoth Christian Winter <thepoet_nospam@arcor.de>:
> bhabs wrote:
> > I wrote a small LWP based perl program to search the air fare from a
> > travel website using POST.
> >

> [...code snipped]
> >
> > When I print the content, I see the "intermediate" wait page (where it
> > displays the progress bar using javascript.... => I matched the
> > content with the "view source" from IExplorer)
> > I am unable to capture the final air fare page. It takes time for the
> > website to do the search and then display the air fare result page.
> > How do I make my program wait for the actual result and not grab the
> > intermediate response.

>
> You have to simulate what the browser does, and from your
> description, this is most likely a repeated ajax request
> to the server. Analyze the behaviour of the javascript
> and see how it fetches the progress state and what it
> does once the result is calculated, then craft those
> actions yourself. Your best chance to see exactly what is going
> on in the background is with a network sniffer like wireshark,
> or a browser plugin like Firefox's Live HTTP Headers.


Or use the Web Scraping Proxy (http://www.research.att.com/sw/tools/wsp/),
which will write a Perl script to make the appropriate requests for you.
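
Once the sniffer or proxy log shows which URL the wait page keeps requesting, the polling can be done by hand along the following lines. This is only a rough sketch, not what the site actually requires: the status URL, the "progress" marker used to recognise the wait page, the 30-try limit and the 5-second delay are all assumptions made for illustration, and poll_until_ready is a made-up helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Call $fetch->() until $is_ready->($page) returns true.
# Returns the ready page, or undef if all $max_tries attempts fail.
sub poll_until_ready {
    my ($fetch, $is_ready, $max_tries, $delay) = @_;
    for my $try (1 .. $max_tries) {
        my $page = $fetch->();
        return $page if defined $page && $is_ready->($page);
        sleep $delay if $delay && $try < $max_tries;   # pause between polls
    }
    return;
}

# Usage: perl poll.pl STATUS_URL
# STATUS_URL is hypothetical: use whatever URL the wait page's
# javascript is seen to request in the sniffer or proxy log.
if (@ARGV) {
    my ($status_url) = @ARGV;

    # A cookie jar keeps the session between the POST and the polls.
    my $ua = LWP::UserAgent->new(cookie_jar => {});
    $ua->timeout(300);

    # ... send the initial search POST with $ua here ...

    my $page = poll_until_ready(
        sub {
            my $res = $ua->get($status_url);
            return $res->is_success ? $res->decoded_content : undef;
        },
        # Assumption: the wait page mentions "progress", the result page does not.
        sub { $_[0] !~ /progress/i },
        30,    # give up after 30 polls
        5,     # seconds between polls
    );

    die "No result page after polling\n" unless defined $page;
    print $page;
}
```

The loop takes the fetch and the readiness check as code references, so the same helper works whether the site turns out to poll a status URL, reload the same page, or return some other "done" signal. Reuse the user agent that sent the POST so that any session cookies are sent with the polls as well.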

Ben


Tad J McClellan 02-13-2008 01:55 AM

Re: LWP user agent grabs the intermediate wait page after POST instead of the actual result page
 
Christian Winter <thepoet_nospam@arcor.de> wrote:
> bhabs wrote:
>> I wrote a small LWP based perl program to search the air fare from a
>> travel website using POST.
>>

> [...code snipped]
>>
>> When I print the content, I see the "intermediate" wait page (where it
>> displays the progress bar using javascript.... => I matched the
>> content with the "view source" from IExplorer)
>> I am unable to capture the final air fare page. It takes time for the
>> website to do the search and then display the air fare result page.
>> How do I make my program wait for the actual result and not grab the
>> intermediate response.

>
> You have to simulate what the browser does, and from your
> description, this is most likely a repeated ajax request
> to the server. Analyze the behaviour of the javascript
> and see how it fetches the progress state and what it
> does once the result is calculated, then craft those
> actions yourself. Your best chance to see exactly what is going
> on in the background is with a network sniffer like wireshark,



I like the Web Scraping Proxy for this. It logs the traffic in
the form of LWP Perl code:

http://www.research.att.com/sw/tools/wsp/


> or a browser plugin like Firefox's Live HTTP Headers.



--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"

