LWP user agent query

 
 
P.R.Brady
      08-26-2005
I tried my web crawler/link checker on a neighbour's site and found
problems with the button top right entitled 'cymraeg' in this page (and
the same button on others):
http://www.anglesey.gov.uk/english/c...smoke-free.htm

I think I need to extract the URL:
http://www.anglesey.gov.uk/cgi-bin/c...nguage=cymraeg
for the GET, as in the following code, but I am getting a 404 Not Found
response.

Internet Explorer seems very happy with the button and returns the Welsh
version, but Netscape 7 is not entirely happy with it either.

Where is the problem? Is it my hand extraction of the target URL, the
code below, or an issue with the host?

Regards
Phil



use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;
use HTML::TokeParser;

my $referer =
    'http://www.anglesey.gov.uk/english/community/health/smoke-free/smoke-free.htm';
my $url =
    'http://www.anglesey.gov.uk/cgi-bin/change_language.asp?language=cymraeg';

# open the browser
my $browser = LWP::UserAgent->new;
$browser->timeout(30);
# the redirect limit is a setting on the agent, not a request header
$browser->max_redirect(70);

# extra key/value arguments to get() are sent as request headers
my $response = $browser->get($url,
    'Referer'         => $referer,
    'User-Agent'      => 'Mozilla/7. [en] (Win98; U)',
    'Accept'          => 'text/html, image/gif, image/x-xbitmap, '
                       . 'image/jpeg, image/pjpeg, image/png, */*',
    'Accept-Charset'  => 'ISO-8859-1, *, utf-8',
    'Accept-Language' => 'cy, en, en-GB',
);

my $status = $response->status_line;
print "Status=$status\n";

my $base = $response->base;
print "Base=$base\n";

if ($response->is_success) {
    # only dump the page body if the user asks for it
    print "Show data? ";
    my $answer = <STDIN>;
    if (defined $answer && $answer =~ /y/i) {
        print $response->content, "\n";
    }
}

 
A. Sinan Unur
      08-26-2005
"P.R.Brady" wrote:

> I tried my web crawler/link checker on a neighbour's site and found
> problems with the button top right entitled 'cymraeg' in this page
> (and the same button on others):
> http://www.anglesey.gov.uk/english/c...ke-free/smoke-free.htm
>
> I think I need to extract the url:
> http://www.anglesey.gov.uk/cgi-bin/change_language.asp?language=cymraeg
> for the get as in the following code but I am getting 404 not found
> returned.
>
> Internet Explorer seems very happy with the button and returns the
> Welsh version, but Netscape 7 is not entirely happy with it either.


Clicking on the link in Firefox re-directs me to http://www.cos.com/

I am inclined to think this is a case of either bad HTML or bad ASP
programming, and thus off-topic here.

Sinan
--
A. Sinan Unur

comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/cl...uidelines.html
 
Alan J. Flavell
      08-26-2005
On Fri, 26 Aug 2005, P.R.Brady wrote:

> I tried my web crawler/link checker on a neighbour's site and found problems
> with the button top right entitled 'cymraeg' in this page (and the same button
> on others):
> http://www.anglesey.gov.uk/english/c...smoke-free.htm


As soon as I click it, my browser throws an alert telling me that
the site wants to set a cookie.
However, even if I respond by allowing session cookies, I get an
error alert, telling me that "community could not be found".

> Internet Explorer seems very happy with the button and returns the Welsh
> version, but Netscape 7 is not entirely happy with it either.


That sounds ominously like the all too prevalent situation of a web
page that's been designed to work only with the operating system
component that thinks it's a browser, but not with a www-compatible
client agent.

> I think I need to extract the url:
> http://www.anglesey.gov.uk/cgi-bin/c...nguage=cymraeg
> for the get as in the following code but I am getting 404 not found
> returned.


You've worked that out from the 'form method="GET" ...' which is used
to implement this switch, right?

Here's how their server seems to respond to that URL:


HTTP/1.1 302 Object moved
Connection: close
Date: Fri, 26 Aug 2005 14:48:58 GMT
Server: Microsoft-IIS/6.0
MicrosoftOfficeWebServer: 5.0_Pub
X-Powered-By: ASP.NET
Location: //
Content-Length: 123
Content-Type: text/html
Set-Cookie: ASPSESSIONIDSCTBSRDA=HDKPDDIDBPOGDPJLBCCGGGOL; path=/
Cache-control: private


That "Location:" looks meaningless to me. The HTTP specification
demands an absolute URL to be returned on a Location: header, and that
most certainly ain't one. Whatever a client agent would do in
response to it would seem to be in the nature of an error fixup, and
there's no reason to suppose clients would perform the same fix as
each other.
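
To make the point about that Location value concrete, here is a minimal sketch in plain Perl (no modules) of a check for whether a header value is an absolute URL, i.e. whether it starts with a scheme such as "http:". The helper name is_absolute_url is made up for this illustration, and it only looks at the scheme syntax; it makes no attempt to validate the rest of the URL.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the value begins with a URI scheme
# (a letter, then letters, digits, "+", "-" or "."), followed by ":".
sub is_absolute_url {
    my ($value) = @_;
    return 0 unless defined $value;
    return $value =~ m{\A[A-Za-z][A-Za-z0-9+.\-]*:} ? 1 : 0;
}

# Compare the value the server sent with an ordinary absolute URL.
for my $location ('//', 'http://www.anglesey.gov.uk/') {
    printf "%-32s %s\n", "'$location'",
        is_absolute_url($location) ? 'absolute' : 'NOT absolute';
}
```

A crawler could use a check like this to flag such responses for manual inspection rather than guessing what the server meant.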

You might consider running LWP without automatically resolving
redirections, so that you get control back as soon as this code 302
response is returned, and try to fix this up yourself, if MSIE has
given you some clue about where it's supposed to go. You'll need to
have cookie handling enabled, too, of course. Sorry, I haven't tried
this at all - it's just a suggestion.
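
A rough sketch of that suggestion, not tried against the site in question: the agent is created with max_redirect => 0, so the 302 is handed straight back, and with cookie_jar => {} so that session cookies are kept in memory. The helper names make_agent and describe_response are invented for this example, and the URL to probe is taken from the command line.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Agent that never follows redirects itself and keeps cookies in memory.
sub make_agent {
    return LWP::UserAgent->new(
        max_redirect => 0,     # hand 3xx responses back to the caller
        cookie_jar   => {},    # temporary in-memory cookie jar
        timeout      => 30,
    );
}

# Summarise a response: status line plus the raw Location header, if any.
sub describe_response {
    my ($response) = @_;
    my $text = 'Status=' . $response->status_line;
    if ($response->is_redirect) {
        my $location = $response->header('Location');
        $text .= ' Location=' . (defined $location ? "'$location'" : '(none)');
    }
    return $text;
}

# Only touch the network when a URL is given on the command line.
if (my $url = shift @ARGV) {
    my $browser  = make_agent();
    my $response = $browser->get($url);
    print describe_response($response), "\n";
    # A hand-corrected URL could now be requested with the same $browser,
    # so that any cookie set by the 302 response is sent back with it.
}
```

Reusing the same agent object for the follow-up request matters here, because the cookie jar lives on the agent.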


<rant>
It's bad enough that the source of the above web page has a DOCTYPE
that makes it look like HTML/2.0, which it clearly is not: but there's
a META that says it was extruded by Microsoft FrontPage 5.0, so the
likelihood of it working with anything that's WWW-compatible does not
seem too high...
</>

 
P.R.Brady
      08-26-2005
P.R.Brady wrote:
> I tried my web crawler/link checker on a neighbour's site ..


Many thanks, both. That sets my mind at rest!

Phil

 
P.R.Brady
      08-26-2005
Alan J. Flavell wrote:
> On Fri, 26 Aug 2005, P.R.Brady wrote:
>
>
>>I tried my web crawler/link checker on a neighbour's site and found problems
>>with the button top right entitled 'cymraeg' in this page (and the same button
>>on others):
>>http://www.anglesey.gov.uk/english/c...smoke-free.htm

>
>
> As soon as I click it, my browser throws an alert telling me that
> the site wants to set a cookie.
> However, even if I respond by allowing session cookies, I get an
> error alert, telling me that "community could not be found".
>
>
>>Internet Explorer seems very happy with the button and returns the Welsh
>>version, but Netscape 7 is not entirely happy with it either.

>
>
> That sounds ominously like the all too prevalent situation of a web
> page that's been designed to work only with the operating system
> component that thinks it's a browser, but not with a www-compatible
> client agent.
>
>
>>I think I need to extract the url:
>>http://www.anglesey.gov.uk/cgi-bin/c...nguage=cymraeg
>>for the get as in the following code but I am getting 404 not found
>>returned.

>
>
> You've worked that out from the 'form method="GET" ...' which is used
> to implement this switch, right?
>



That's right, but IE shows
http://www.anglesey.gov.uk/cgi-bin/c...raeg&x=26&y=11
in its URL bar after successfully fetching the Welsh page. Adding
the x and y doesn't help the Perl reader.

We're no fans of IE and MS web products here either.

Phil

 
Brian Wakem
      08-26-2005
P.R.Brady wrote:

> I tried my web crawler/link checker on a neighbour's site and found
> problems with the button top right entitled 'cymraeg' in this page (and
> the same button on others):
> http://www.anglesey.gov.uk/english/c...smoke-free.htm
>
> I think I need to extract the url:
> http://www.anglesey.gov.uk/cgi-bin/c...nguage=cymraeg
> for the get as in the following code but I am getting 404 not found
> returned.
>
> Internet Explorer seems very happy with the button and returns the Welsh
> version, but Netscape 7 is not entirely happy with it either.
>
> Where is the problem? My hand extraction of the target url, the code
> below or an issue in the host?
>
> Regards



All UK government websites are poorly written by ten-a-penny FrontPage
monkeys. I had the misfortune of automating some processes through one
particular government website. They told me before I started that the site
would only work in IE. Well, it didn't work very well in IE either, and
produced random errors all over the place. Eventually I gave up and told
them to fix their site before I would try again.



--
Brian Wakem
Email: http://homepage.ntlworld.com/b.wakem/myemail.png
 