Error downloading page: some pages work great, but I can't seem to get this one

 
 
Jack Schafer
04-23-2004
I am trying to download the source code for an array of different
websites. Usually I get something like this, here from Dilbert.com:

HTTP/1.1 200 OK
Date: Fri, 23 Apr 2004 00:04:54 GMT
Server: Apache/1.3.27 (Unix) Resin/2.1.s030505 mod_ssl/2.8.14 OpenSSL/0.9.7b
Last-Modified: Thu, 22 Apr 2004 07:05:10 GMT
ETag: "182ba6-9d7b-40876ea6"
Accept-Ranges: bytes
Content-Length: 40315
Connection: close
Content-Type: text/html


Then the whole HTML page prints:
.....


The problem occurs when I try the same thing on www.kingsofchaos.com. I
get the following header:

HTTP/1.1 200 OK
Date: Fri, 23 Apr 2004 00:16:49 GMT
Server: Apache/1.3.29 (Unix) (Gentoo/Linux)
Connection: close
Content-Type: text/html

without the page attached.
I was wondering if you had any ideas about why I can't access the page,
and any suggestions as to how I should do it. Right now I am using the
following code:


use IO::Socket::INET;
my $host = $_[0];
my $get = $_[1];
my $port = 80;
my $protocol = "tcp";
my $socket;
my @page;
$socket = IO::Socket::INET->new(PeerAddr => $host, PeerPort => $port,
    Proto => $protocol) or die "Could not connect\n";
# send the request
$socket->send("GET $get HTTP/1.0\nHOST: $host\n\n");
# receive the desired file
@page = <$socket>;
 
Joe Smith
04-23-2004
Jack Schafer wrote:
> the problem occurs when i try the same thing on www.kingsofchaos.com
> $socket->send("GET $get HTTP/1.0\nHOST: $host\n\n");
> @page=<$socket>;


1) You're doing it the hard way. Use the LWP modules instead.
2) Because of 1, you're not sending all of the HTTP headers the
web server wants to see.

According to the Web Scraping Proxy (http://www.research.att.com/~hpk/wsp/),
you'll need to store and send cookies, and execute JavaScript.

# Request: http://www.kingsofchaos.com/
$request = new HTTP::Request('GET' => "http://www.kingsofchaos.com/");
# Set-Cookie: koc_session=ea30aa58e36; path=/; domain=www.kingsofchaos.com
# Set-Cookie: security_hash=323466; expires=Sun, 23-May-2004 08:17:26 GMT; path=/; domain=.kingsofchaos.com
# Set-Cookie: cookie_hash=801f782dce8147; path=/

3) Post to comp.lang.perl.misc (instead of comp.lang.perl) next time.
-Joe
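
The LWP suggestion in point 1 could look roughly like the sketch below. It assumes LWP::UserAgent and HTTP::Cookies are installed; the helper names make_agent and fetch_page, the User-Agent string, and the 30-second timeout are illustrative choices, not anything from the thread. LWP formats the request itself (CRLF line endings, Host header) and the cookie jar stores whatever the server sets and sends it back on later requests. It does not execute JavaScript, so if the site really depends on that, this alone may not be enough.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Build a user agent with an in-memory cookie jar, so cookies set by the
# server are stored and re-sent on later requests, as a browser would do.
# (make_agent is a hypothetical helper name.)
sub make_agent {
    return LWP::UserAgent->new(
        agent      => 'Mozilla/5.0 (compatible; example-fetcher)', # assumed UA string
        cookie_jar => HTTP::Cookies->new,  # kept in memory, not saved to disk
        timeout    => 30,                  # assumed value, in seconds
    );
}

# Fetch a URL and return the decoded body, or die with the status line.
# (fetch_page is a hypothetical helper name.)
sub fetch_page {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    die "Request for $url failed: ", $response->status_line, "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Only fetch when a URL is given on the command line, e.g.
#   perl fetch.pl http://www.kingsofchaos.com/
if (my $url = shift @ARGV) {
    my $ua = make_agent();
    print fetch_page($ua, $url);
    # Show the cookies the server set; they go out with the next request.
    print STDERR $ua->cookie_jar->as_string;
}
```

Reusing the same $ua object for every request is what makes the cookies carry over; a fresh agent per request would start with an empty jar each time.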
 