Error downloading page, some pages work great but cant seem to get this one

Jack Schafer
I am trying to download the source code for an array of different
websites. Usually I will get something like this from

HTTP/1.1 200 OK
Date: Fri, 23 Apr 2004 00:04:54 GMT
Server: Apache/1.3.27 (Unix) Resin/2.1.s030505 mod_ssl/2.8.14
Last-Modified: Thu, 22 Apr 2004 07:05:10 GMT
ETag: "182ba6-9d7b-40876ea6"
Accept-Ranges: bytes
Content-Length: 40315
Connection: close
Content-Type: text/html

and then the whole HTML page prints.

The problem occurs when I try the same thing on I
get the following header:

HTTP/1.1 200 OK
Date: Fri, 23 Apr 2004 00:16:49 GMT
Server: Apache/1.3.29 (Unix) (Gentoo/Linux)
Connection: close
Content-Type: text/html

without the page attached.
I was wondering if you had any ideas on why I can't access the page,
and any suggestions as to how I should do it. Right now I am using the
following code:

use IO::Socket::INET;
my $host = $_[0];
my $get = $_[1];
my $port= 80;
my $protocol = "tcp";
my $socket;
my @page;
$socket = IO::Socket::INET->new(PeerAddr => $host, PeerPort => $port,
Proto => $protocol) or die "Could not connect\n";
# send the request
$socket->send("GET $get HTTP/1.0\nHOST: $host\n\n");
# receive the desired file
@page=<$socket>;
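
One way to confirm that a reply like the second one really carries no body is to split whatever came back at the first blank line, which is where HTTP headers end. This is only a minimal sketch: split_http_response is a hypothetical helper, not part of any module, and the sample response is a shortened copy of the headers shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw HTTP response into its status line,
# a hash of headers, and the body.
sub split_http_response {
    my ($raw) = @_;

    # Headers end at the first blank line; accept CRLF or bare LF.
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    $head = '' unless defined $head;
    $body = '' unless defined $body;

    my @lines  = split /\r?\n/, $head;
    my $status = shift @lines;
    $status = '' unless defined $status;

    my %headers;
    for my $line (@lines) {
        # Header names are case-insensitive, so store them lowercased.
        my ($name, $value) = $line =~ /^([^:]+):\s*(.*)$/ or next;
        $headers{lc $name} = $value;
    }
    return ($status, \%headers, $body);
}

# Sample input modelled on the second response in this post.
my $raw = "HTTP/1.1 200 OK\r\nConnection: close\r\nContent-Type: text/html\r\n\r\n";
my ($status, $headers, $body) = split_http_response($raw);
print "status: $status\n";
print "content-type: $headers->{'content-type'}\n";
print "body is empty\n" if $body eq '';
```

With the socket code, the input would be the joined contents of @page. An empty body together with a 200 status suggests the server chose not to send the page for this request, rather than the read being cut short.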
Joe Smith
Jack Schafer wrote:
> the problem occurs when i try the same thing on
> $socket->send("GET $get HTTP/1.0\nHOST: $host\n\n");
> @page=<$socket>;

1) You're doing it the hard way. Use the LWP modules instead.
2) Because of 1, you're not sending all of the HTTP headers the
web server wants to see.

According to the Web Scraping Proxy (
you'll need to store and send cookies, and execute JavaScript.

# Request:
$request = HTTP::Request->new(GET => "");
# Set-Cookie: koc_session=ea30aa58e36; path=/;
# Set-Cookie: security_hash=323466; expires=Sun, 23-May-2004 08:17:26 GMT;
# Set-Cookie: cookie_hash=801f782dce8147; path=/

3) Post to comp.lang.perl.misc (instead of comp.lang.perl) next time.
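
As a rough illustration of points 1 and 2, here is a minimal sketch using LWP::UserAgent with an in-memory cookie jar, so that cookies set by one response are sent back on later requests. The names build_agent and fetch_page, the User-Agent string, the timeout, and the example URL are all assumptions made for illustration; none of them come from the thread. Note that LWP does not execute JavaScript, so this covers only the cookie and header side of the problem.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: build a user agent that keeps cookies between requests.
sub build_agent {
    my $ua = LWP::UserAgent->new(
        cookie_jar => {},    # empty hash asks LWP for an in-memory HTTP::Cookies jar
        timeout    => 30,    # seconds; an arbitrary choice
    );
    # Assumption: the server may treat unknown clients differently,
    # so send a browser-like User-Agent string.
    $ua->agent('Mozilla/5.0 (compatible; example-fetcher)');
    return $ua;
}

# Hypothetical helper: fetch a URL and return the decoded body,
# or die with the HTTP status line on failure.
sub fetch_page {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    die "Request for $url failed: " . $response->status_line . "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Usage (placeholder URL, since the thread's actual address is not given):
# my $ua   = build_agent();
# my $html = fetch_page($ua, 'http://www.example.com/');
# print $html;
```

Reusing the same $ua object for every request is what lets the jar carry cookies such as koc_session from the first response into the next request.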