Velocity Reviews - Computer Hardware Reviews


A couple of vague LWP questions

 
 
Franklin H.
 
      04-25-2005
1) When using LWP::Simple to grab a webpage the GET request
occasionally and irreproducibly appears to hang and does not return.
Any clue as to why this could conceivably occur? There doesn't appear
to be a way to set the request timeout with this particular module but
perhaps someone may know of a workaround?

2) When using LWP::UserAgent to grab the same webpage as above, the
webserver somehow seems able to recognize the request as coming from
an "automated tool". Any idea why this might possibly occur with
LWP::UserAgent but not with LWP::Simple?

TYIA,
Fr.

 
 
 
 
 
 
 
 
 
Franklin H.
 
      04-25-2005

> 2) When using LWP::UserAgent to grab the same webpage as above, the
> webserver somehow seems able to recognize the request as coming from
> an "automated tool". Any idea why this might possibly occur with
> LWP::UserAgent but not with LWP::Simple?


It would appear that the trick here is to set the User-Agent string to
something other than the default "libwww-perl/#.##".

Arbitrarily I chose: $ua->agent('Mozilla/5.001');
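A minimal sketch of the change described above, assuming libwww-perl is installed. It only builds the agent object and never touches the network, so you can see the default identifier being replaced:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# A fresh agent identifies itself as "libwww-perl/#.##" by default.
my $ua = LWP::UserAgent->new;
my $default_agent = $ua->agent;

# Replace the identifier with the arbitrary string from the post.
$ua->agent('Mozilla/5.001');

print "default: $default_agent\n";
print "custom:  ", $ua->agent, "\n";
```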

 
 
Brian Wakem
 
      04-25-2005
Franklin H. wrote:

>> 2) When using LWP::UserAgent to grab the same webpage as above, the
>> webserver somehow seems able to recognize the request as coming from
>> an "automated tool". Any idea why this might possibly occur with
>> LWP::UserAgent but not with LWP::Simple?
>
> It would appear that the trick here is to set the User-Agent string to
> something other than the default "libwww-perl/#.##". Arbitrarily I chose:
>
> $ua->agent('Mozilla/5.001');



If you are trying to blend in with normal traffic then I suggest using -

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

- which is IE6 on Windows XP.
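A short sketch, assuming libwww-perl is installed: the agent string can be passed straight to the constructor. One documented LWP::UserAgent detail worth knowing if you want to blend in: an identifier that ends in a space gets LWP's own "libwww-perl/#.##" appended, so avoid a trailing space. "MyFetcher/0.1" below is a made-up name for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# IE6 on Windows XP, the string suggested above.
my $ie6 = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)';

# The agent string can also be given to the constructor.
my $ua = LWP::UserAgent->new(agent => $ie6);
print $ua->agent, "\n";

# A string ending in a space gets "libwww-perl/#.##" appended.
# "MyFetcher/0.1" is a hypothetical product token.
my $ua2 = LWP::UserAgent->new;
$ua2->agent('MyFetcher/0.1 ');
print $ua2->agent, "\n";
```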


The answer to your other question is either to use LWP::UserAgent and its
timeout method ( $ua->timeout($secs) ), or to use alarm:

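use LWP::Simple qw(get);

my $url  = 'http://www.example.com/';   # placeholder: page to fetch
my $secs = 10;                          # give up after this many seconds
my $response;

eval {
    local $SIG{ALRM} = sub { die "timeout\n" };   # "\n" keeps $@ exact
    alarm $secs;                                  # start the timer
    $response = get($url);
    alarm 0;                                      # cancel it if get() returned
};
alarm 0;                                          # also cancel if get() died
if ($@ eq "timeout\n") {
    # timed out
}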
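For the first option, here is a minimal sketch that avoids signals entirely, which may matter if you need it to behave the same on Windows. Note that, per the LWP::UserAgent docs, timeout is an inactivity timeout (no data on the connection for that many seconds), not a hard cap on the whole request. fetch_with_timeout is a hypothetical helper name, not part of LWP.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: fetch $url, giving up after $secs seconds of
# inactivity on the connection. Returns the body, or undef on failure.
sub fetch_with_timeout {
    my ($url, $secs) = @_;
    my $ua  = LWP::UserAgent->new(timeout => $secs);
    my $res = $ua->get($url);
    return $res->is_success ? $res->decoded_content : undef;
}
```

Usage: my $html = fetch_with_timeout('http://www.example.com/', 10); then check defined($html) before using it.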



--
Brian Wakem


 
 
Franklin H.
 
      04-25-2005
Well, I am trying to make this platform-independent and as such would
hate to run into problems with $SIG{ALRM} on XP.

Similarly, mightn't "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
be suspicious if the request came from a Linux OS?

 
 
Charles DeRykus
 
      04-25-2005
In article <(E-Mail Removed) .com>,
Franklin H. <(E-Mail Removed)> wrote:
>1) When using LWP::Simple to grab a webpage the GET request
>occasionally and irreproducibly appears to hang and does not return.
>Any clue as to why this could conceivably occur? There doesn't appear
>to be a way to set the request timeout with this particular module but
>perhaps someone may know of a workaround?
>


LWP::Simple is built on LWP::UserAgent, so you can import its
$ua and set a timeout, e.g.:

use LWP::Simple qw($ua get); $ua->timeout(10);

See LWP::Simple doc for discussion of above.

>2) When using LWP::UserAgent to grab the same webpage as above, the
>webserver somehow seems able to recognize the request as coming from
>an "automated tool". Any idea why this might possibly occur with
>LWP::UserAgent but not with LWP::Simple?
>


Some servers may be checking the User-Agent identifier. No idea why
LWP::Simple would slip by if that's the case. Again, see the
LWP::UserAgent and LWP::Simple docs for how to alter the setting.
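A minimal sketch of both adjustments on LWP::Simple's shared agent, assuming libwww-perl is installed. LWP::Simple exports $ua only on request, and listing it explicitly means get must be listed too. The actual fetch is left commented out so the script runs without network access.

```perl
#!/usr/bin/perl
use strict;
use warnings;
# $ua is LWP::Simple's internal LWP::UserAgent object, exported on request.
use LWP::Simple qw($ua get);

# Every later get() goes through this object, so these settings stick.
$ua->timeout(10);               # seconds of inactivity before giving up
$ua->agent('Mozilla/5.001');    # replace the "LWP::Simple/#.##" identifier

printf "timeout=%d agent=%s\n", $ua->timeout, $ua->agent;

# For example (needs network access):
# my $html = get('http://www.example.com/');
```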

hth,
--
Charles DeRykus
 
 
Mark Clements
 
      04-25-2005
Franklin H. wrote:
> Well, I am trying to make this platform-independent and as such would
> hate to run into problems with $SIG{ALRM} on XP.
>
> Similarly, mightn't "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
> be suspicious if the request came from a Linux OS?
>

Nah. The remote server only sees an HTTP request: it has no idea from
what type of system the request originated, other than what is in the
HTTP headers.
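To see the point for yourself, this sketch (assuming libwww-perl is installed) asks LWP::UserAgent to merge its default headers into a request via prepare_request, without sending anything, and prints it. The only client-identifying header is the User-Agent you chose; the protocol layer adds a few more, such as Host, when the request actually goes out. The URL is a placeholder.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $ie6 = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)';
my $ua  = LWP::UserAgent->new(agent => $ie6);

# prepare_request() copies the agent's default headers (including
# User-Agent) into the request but does not send it.
my $req = $ua->prepare_request(
    HTTP::Request->new(GET => 'http://www.example.com/')
);

# Request line plus headers, roughly as the server will see them.
print $req->as_string;
```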

Mark
 
 
Joe Smith
 
      04-25-2005
Charles DeRykus wrote:

> Some servers may be checking the user agent id. No idea why
> LWP::Simple would slip by if that's the case.


perldoc LWP::UserAgent
the default agent identifier is "libwww-perl/#.##"

Line 43 of LWP/Simple.pm
$ua->agent("LWP::Simple/$LWP::VERSION");
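A quick way to compare the two defaults on your own installation, assuming libwww-perl is installed. The exact LWP::Simple string may vary by version (some releases end it with a space, so "libwww-perl/#.##" gets appended), which is why the check below only looks at the prefix.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use LWP::Simple qw($ua);

my $plain_default  = LWP::UserAgent->new->agent;   # "libwww-perl/#.##"
my $simple_default = $ua->agent;                    # starts "LWP::Simple/"

print "LWP::UserAgent: $plain_default\n";
print "LWP::Simple:    $simple_default\n";
```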
 

Similar Threads
Thread Thread Starter Forum Replies Last Post
vague lvalue vs rvalue question Chad C Programming 0 04-16-2008 01:23 AM
Shared Library Exceptions & Vague Linkage akennis C++ 7 07-26-2006 08:30 PM
French New Wave / Nouvelle Vague matt r DVD Video 0 06-16-2005 02:47 PM
search(Object criteria) - to vague? VisionSet Java 4 12-06-2004 02:56 PM
WTB: Godard's New Wave (Nouvelle vague) robert gray DVD Video 3 10-29-2003 06:51 PM


