How to make a Perl program do concurrent downloading?

 
 
Adlene
Guest
Posts: n/a
 
      05-01-2004
Hi, there:

I wrote a program to download 500,000 HTML files from a website, and I
have compiled all the links in a file. My grabber.pl will download all of
them...

I have a fast internet connection. I think it is better to run multiple
downloads at the same time, but $INET = new Win32::Internet() only allows
one at a time... What can I do?
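
A minimal sketch of one way this is often done, assuming LWP::UserAgent and
Parallel::ForkManager are installed instead of Win32::Internet (the file name
urls.txt, the output file names, and the worker count below are placeholders,
not anything from this thread): fork a small pool of workers and let each
child fetch and save one URL.

#!/usr/bin/perl
# Sketch: download URLs with several workers in parallel.
use strict;
use warnings;
use LWP::UserAgent;
use Parallel::ForkManager;

my $workers = 10;                                 # how many downloads run at once
my $pm = Parallel::ForkManager->new($workers);
my $ua = LWP::UserAgent->new(timeout => 30);

open my $list, '<', 'urls.txt' or die "Cannot open urls.txt: $!";
chomp(my @urls = <$list>);
close $list;

my $n = 0;
for my $url (@urls) {
    my $file = sprintf 'page%07d.html', ++$n;
    $pm->start and next;                          # parent: move on to the next URL
    # --- child process: fetch one URL and save it ---
    my $resp = $ua->get($url);
    if ($resp->is_success) {
        open my $out, '>', $file or die "Cannot write $file: $!";
        print {$out} $resp->content;
        close $out;
    }
    $pm->finish;                                  # child exits here
}
$pm->wait_all_children;

Each child writes its own numbered file, so the workers never collide; note
that ten simultaneous requests is already a heavy load on one site, so a
smaller pool (and a delay between hits) is kinder to the server.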

I also found that occasionally the grabber just hangs somewhere... In that
situation I need to bypass $INET->FetchURL($url), write the offending URL
to an error file, and continue on to the next iteration... How can I do that?
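
As for skipping a fetch that hangs: with LWP::UserAgent the timeout option
turns a hung request into an ordinary failed response, so the loop can simply
log the URL and move on. Below is a sketch on that assumption; errors.txt and
the 60-second limit are made-up values. With Win32::Internet the usual
workaround is wrapping the call in eval { alarm ... }, but alarm cannot
reliably interrupt a blocking call on Windows, so switching the fetch layer
is probably simpler.

# Sketch: bound each fetch with a timeout, log failures, and keep going.
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new(timeout => 60);      # give up on a request after 60 seconds

open my $list,   '<',  'urls.txt'   or die "Cannot open urls.txt: $!";
open my $errlog, '>>', 'errors.txt' or die "Cannot open errors.txt: $!";
chomp(my @urls = <$list>);
close $list;

for my $url (@urls) {
    my $resp = $ua->get($url);
    unless ($resp->is_success) {
        # timed out, 404, connection refused, ... -- note it and move on
        print {$errlog} "$url\t", $resp->status_line, "\n";
        next;
    }
    # ... save $resp->content to disk as before ...
}
close $errlog;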

Best Regards,
Adlene



 
 
 
 
 
Bryan Castillo
Guest
Posts: n/a
 
      05-02-2004
"Adlene" <(E-Mail Removed)> wrote in message news:<c6vvmn$ck4$(E-Mail Removed)>...
> Hi, there:
>
> I wrote a program to download 500,000 HTML files from a website, and I
> have compiled all the links in a file. My grabber.pl will download all of
> them...


Depending on who owns the site, they may find it rude that you want to
download so many files and take as many resources as possible from their
web server. Perhaps you should find a different way of retrieving the data,
such as contacting the web site administrator and telling them what you
want to do; they might even give you a tar-gzipped archive of the site.


>
> I have a fast internet connection. I think it is better to run multiple
> downloads at


It may be better for you, but that is questionable for everyone else.

Here is some information on web robots. You might want to do some more
searching on web robots yourself, though.

http://www.phantomsearch.com/usersguide/R04Robot.htm

<from the above URL>

The Four Laws of Web Robotics
Law One: A Web Robot Must Show Identification
Phantom supports this. You can set the "User-Agent" and "From E-Mail"
fields in the preferences dialog. Both of these are reported in the
HTTP header when Phantom makes requests of remote Web servers.

Law Two: A Web Robot Must Obey Exclusion Standard
Phantom fully supports the exclusion standard.

Law Three: A Web Robot Must Not Hog Resources
Phantom only retrieves files it can index (unless mirroring with the
binaries option on) and restricts its movement to the path specified
by the starting points. You can also set the minimum time between hits
on the same server. Generally, 60 seconds is considered polite.

For busy sites a greater hit rate may be acceptable, but do not assume
whether a site is "busy" or not: contact the webmaster first. When
crawling your own server, of course, you can set the hit interval to
anything you like, including zero.

Law Four: A Web Robot Must Report Errors
Phantom can show you links that are no longer valid. Please contact
the Webmaster and pass this information on if broken URLs are found.
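
For what it is worth, Perl's own LWP::RobotUA (part of libwww-perl) covers
most of these rules automatically; here is a minimal sketch, where the agent
string, e-mail address, and URL are placeholders.

# Sketch: a "polite" fetch with LWP::RobotUA.
use strict;
use warnings;
use LWP::RobotUA;

my $ua = LWP::RobotUA->new(
    agent => 'grabber.pl/0.1',        # Law One: identify the robot...
    from  => 'you@example.com',       # ...and give a contact address
);
$ua->delay(1);                        # Law Three: at most one hit per minute per host
$ua->timeout(30);

my $resp = $ua->get('http://www.example.com/some/page.html');

# Law Two: if robots.txt disallows this URL, LWP::RobotUA refuses to send
# the request and returns a 403 "Forbidden by robots.txt" response instead.
unless ($resp->is_success) {
    warn 'skipped: ', $resp->status_line, "\n";   # Law Four: report the error
}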



> the same time, but $INET = new Win32::Internet() only allows one at a
> time... What can I do?
>
> I also found that occasionally the grabber just hangs somewhere... In that
> situation I need to bypass $INET->FetchURL($url), write the offending URL
> to an error file and continue on to the next iteration... How can I do that?
>
> Best Regards,
> Adlene

 