LWP user agent grabs the intermediate wait page after POST instead of the actual result page
Hi,
I wrote a small LWP-based Perl program that searches for air fares on a travel website using POST:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;

    my $web_browser = LWP::UserAgent->new();
    # Also follow redirects returned for a POST (by default LWP only follows GET and HEAD).
    push @{ $web_browser->requests_redirectable }, 'POST';
    $web_browser->timeout(300);

    my $web_response = $web_browser->post(
        'http://blabla.com/travel/InitialSearch.do',
        [
            'fromCity' => 'SFO',
            'toCity'   => 'CVG',
            # .... the rest of the fields occur here
        ],
    );
    die "Error: ", $web_response->status_line(), "\n" unless $web_response->is_success;

    print $web_response->decoded_content;

When I print the content, I see the "intermediate" wait page (the one that shows a progress bar using JavaScript; I matched the content against "View Source" in Internet Explorer), not the final air fare page. The website takes some time to run the search before it displays the air fare results page.

How do I make my program wait for the actual result instead of grabbing the intermediate response? Could anyone please help me with this?

Regards,
bhabs
Re: LWP user agent grabs the intermediate wait page after POST instead of the actual result page
Quoth Christian Winter <thepoet_nospam@arcor.de>:
> bhabs wrote:
> > I wrote a small LWP based perl program to search the air fare from a
> > travel website using POST.
> >
> [...code snipped]
> >
> > When I print the content, I see the "intermediate" wait page (where it
> > displays the progress bar using javascript.... => I matched the
> > content with the "view source" from IExplorer)
> > I am unable to capture the final air fare page. It takes time for the
> > website to do the search and then display the air fare result page.
> > How do I make my program wait for the actual result and not grab the
> > intermediate response.
>
> You have to simulate what the browser does, and from your
> description, this is most likely a repeated Ajax request
> to the server. Analyze the behaviour of the JavaScript
> and see how it fetches the progress state and what it
> does once the result is calculated, then craft those
> actions yourself. Your best chance of seeing exactly what is going
> on in the background is a network sniffer like Wireshark,
> or a browser plugin like Firefox's Live HTTP Headers.

Or http://www.research.att.com/sw/tools/wsp/ , which will write a Perl
script to make the appropriate requests for you.

Ben
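Christian's advice boils down to a polling loop: keep sending the request the wait page sends until the answer is no longer the wait page. Below is a minimal sketch in core Perl, not tied to any particular site. The fetch and is_done callbacks are hypothetical placeholders you would fill in after sniffing the traffic, and the demo at the end uses a fake fetcher so the script runs offline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# poll_until: call $fetch repeatedly until $is_done accepts what it returns.
#   fetch     - code ref returning the current page content (hypothetical;
#               against a live site it would wrap an LWP request)
#   is_done   - code ref returning true once the content is the real result
#   max_tries - give up after this many requests (default 30)
#   delay     - seconds to sleep between requests (default 2)
# Returns the final content, or undef if the search never finished.
sub poll_until {
    my (%args)    = @_;
    my $fetch     = $args{fetch};
    my $is_done   = $args{is_done};
    my $max_tries = $args{max_tries} // 30;
    my $delay     = $args{delay}     // 2;

    for my $try (1 .. $max_tries) {
        my $content = $fetch->();
        return $content if $is_done->($content);
        sleep $delay if $try < $max_tries;    # don't hammer the server
    }
    return;    # undef: still on the wait page after $max_tries requests
}

# Offline demo: a fake fetcher that serves the wait page twice,
# then the result page.
my @pages  = ('<p>Searching...</p>', '<p>Searching...</p>', '<p>Fares found</p>');
my $calls  = 0;
my $result = poll_until(
    fetch   => sub { $calls++; return shift @pages },
    is_done => sub { $_[0] !~ /Searching/ },
    delay   => 0,
);
print "$result after $calls requests\n";
```

Against the live site, fetch would wrap an LWP call, for example sub { $ua->get($status_url)->decoded_content }, where $status_url stands for whatever URL the page's JavaScript polls (hypothetical here), and is_done would look for text that only appears on the fare page. The max_tries cap keeps the script from looping forever if the search never completes.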
Re: LWP user agent grabs the intermediate wait page after POST instead of the actual result page
Christian Winter <thepoet_nospam@arcor.de> wrote:
> bhabs wrote:
>> I wrote a small LWP based perl program to search the air fare from a
>> travel website using POST.
>>
> [...code snipped]
>>
>> When I print the content, I see the "intermediate" wait page (where it
>> displays the progress bar using javascript.... => I matched the
>> content with the "view source" from IExplorer)
>> I am unable to capture the final air fare page. It takes time for the
>> website to do the search and then display the air fare result page.
>> How do I make my program wait for the actual result and not grab the
>> intermediate response.
>
> You have to simulate what the browser does, and from your
> description, this is most likely a repeated Ajax request
> to the server. Analyze the behaviour of the JavaScript
> and see how it fetches the progress state and what it
> does once the result is calculated, then craft those
> actions yourself. Your best chance of seeing exactly what is going
> on in the background is a network sniffer like Wireshark,

I like the Web Scraping Proxy for this; it logs the traffic in the form
of LWP Perl code:

    http://www.research.att.com/sw/tools/wsp/

> or a browser plugin like Firefox's Live HTTP Headers.

-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
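The Web Scraping Proxy writes LWP code from the recorded traffic. The hand-written sketch below is not its output, only an illustration of the kind of sequence such a log tends to show: the original POST, then a GET for the result page, both sent through one user agent. It assumes the site keeps the search in a session cookie (hence the cookie jar), and $results_url is a hypothetical placeholder for the URL you find in the log.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# make_agent: an LWP::UserAgent set up like the original script, plus an
# in-memory cookie jar so a session cookie set by one response is sent back
# on later requests (assumes the site tracks the search that way).
sub make_agent {
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new(timeout => 300);
    $ua->cookie_jar({});    # a plain hashref becomes an empty HTTP::Cookies jar
    push @{ $ua->requests_redirectable }, 'POST';
    return $ua;
}

# run_search: replay a captured request sequence, the search POST followed
# by a GET for the result page, on one user agent. $results_url is
# hypothetical: take the real one (and any extra requests) from the log.
sub run_search {
    my ($ua, $search_url, $form, $results_url) = @_;

    my $res = $ua->post($search_url, $form);
    die 'Search POST failed: ', $res->status_line, "\n" unless $res->is_success;

    $res = $ua->get($results_url);
    die 'Result GET failed: ', $res->status_line, "\n" unless $res->is_success;

    return $res->decoded_content;
}
```

Against the real site this would be called as print run_search(make_agent(), 'http://blabla.com/travel/InitialSearch.do', [ fromCity => 'SFO', toCity => 'CVG' ], $results_url), with the remaining form fields and $results_url filled in from what the proxy recorded. Passing the agent in as a parameter also lets you check the request sequence offline with a stub object.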