Speeding up glob?

 
 
Jim
 
      04-25-2005
Hi

I have a very simple perl program that runs _very_ slowly. Here's my
code:

#!/usr/local/bin/perl
#
# script to keep only a week's worth of files
#
use File::stat;

$time = time;

# get list of all files in the backup directory
@files = glob ("/backup/output.log*");

unless (@files[0]) {
    print "No files to process\n";
    exit;
}

while (<@files>) {
    $filename = $_;
    $st = stat($_);

    $mod_time = $time - $st->mtime;

    # if file edit time is greater than x days, delete the file
    # 1440 minutes in a day
    # 86400 seconds in a day
    # 604800 seconds in a week
    # 2419200 seconds in a month
    # 7257600 seconds in 90 days

    if ($mod_time > 7257600) {
        print "Deleting file $filename\n";
        unlink ($filename);
    }
    else {
        # do nothing
    }
}

There are several thousand files (~21K) in this directory and many
thousands of those files fit the criteria to delete. It takes a really
long time to run this program. What's the holdup? Is it glob? My OS
(Solaris)? I/O? Any way to speed this up? Thanks.

Jim
 
 
 
 
 
Mark Clements
 
      04-25-2005
Jim wrote:
> Hi
>
> I have a very simple perl program that runs _very_ slowly. Here's my
> code:
>
> #!/usr/local/bin/perl
> #
> # script to keep only a weeks worth of files
> #

You need to run with strict and warnings turned on. Please read the
posting guidelines (subject "Posting Guidelines for
comp.lang.perl.misc"), which are posted regularly.

> use File::stat;
>
> $time = time;
>
> # get list of all files in the backup directory
> @files = glob ("/backup/output.log*");
>
> unless (@files[0]) {
> print "No files to process\n";
> exit;
> }
>
> while (<@files>) {
> $filename = $_;
> $st = stat($_);

<snip>

> There are several thousand files (~21K) in this directory and many
> thousands of those files fit the criteria to delete. It takes a really
> long time to run this program. What's the holdup? Is it glob? My OS
> (Solaris)? I/O? Any way to speed this up? Thanks.

Solaris doesn't (or didn't - I stand open to correction) perform very
well with large directories on UFS. How long does, e.g., ls take to
complete in this directory?

Secondly, you can benchmark your programs using a number of different
methods to work out where the bottlenecks are. Check out

Benchmark::Timer
Devel::DProf
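
For example, something like this (untested sketch; the 'glob' tag name
is just a placeholder) would time the glob step on its own:

#!/usr/local/bin/perl
use strict;
use warnings;
use Benchmark::Timer;

# Time the glob by itself to see whether it is really the bottleneck.
my $t = Benchmark::Timer->new();
$t->start('glob');
my @files = glob('/backup/output.log*');
$t->stop('glob');
print scalar(@files), " files found\n";
print $t->report;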

regards,

Mark
 
 
 
 
 
J. Gleixner
 
      04-25-2005
Jim wrote:

> There are several thousand files (~21K) in this directory and many
> thousands of those files fit the criteria to delete. It takes a really
> long time to run this program. What's the holdup? Is it glob? My OS
> (Solaris)? I/O? Any way to speed this up? Thanks.


Using File::Find or readdir, processing and unlinking each file as it
passes your test, would probably be better. It's similar to reading and
processing each line of a file, compared to slurping in the entire file
and then iterating through each line.

The "fastest" option would be to just use find (man find), and you'll
probably need to use xargs as well (man xargs).
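
An untested sketch of the readdir approach, reusing the /backup path
and the 90-day cutoff from the original script:

#!/usr/local/bin/perl
use strict;
use warnings;

my $dir    = '/backup';
my $cutoff = time - 60 * 60 * 24 * 90;    # anything older than 90 days goes

# Handle one directory entry at a time instead of building the full list.
opendir my $dh, $dir or die "Can't open $dir: $!";
while (defined(my $entry = readdir $dh)) {
    next unless $entry =~ /^output\.log/;
    my $path  = "$dir/$entry";
    my $mtime = (stat $path)[9];          # element 9 of stat() is mtime
    if (defined $mtime && $mtime < $cutoff) {
        print "Deleting file $path\n";
        unlink $path or warn "Can't unlink $path: $!";
    }
}
closedir $dh;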
 
 
xhoster@gmail.com
 
      04-25-2005
Jim <(E-Mail Removed)> wrote:
> Hi
>
> I have a very simple perl program that runs _very_ slowly. Here's my
> code:
>

....
> @files = glob ("/backup/output.log*");
>
> while (<@files>) {


You are double globbing. Don't do that.
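
<@files> joins the already-globbed names into one big pattern and hands
it straight back to glob(), so every filename gets pattern-matched a
second time. Loop over the array directly instead, something like:

foreach my $filename (@files) {
    # stat/unlink logic goes here
}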

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
 
 
Tintin
 
      04-25-2005

"Jim" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Hi
>
> I have a very simple perl program that runs _very_ slowly. Here's my
> code:
>
> #!/usr/local/bin/perl
> #
> # script to keep only a weeks worth of files
> #
> use File::stat;
>
> $time = time;
>
> # get list of all files in the backup directory
> @files = glob ("/backup/output.log*");
>
> unless (@files[0]) {
> print "No files to process\n";
> exit;
> }
>
> while (<@files>) {
> $filename = $_;
> $st = stat($_);
>
> $mod_time = $time - $st->mtime;
>
> # if file edit time is greater than x days, delete the file
> # 1440 minutes in a day
> # 86400 seconds in a day
> # 604800 seconds in a week
> # 2419200 seconds in a month
> # 7257600 seconds in 90 days
>
> if ($mod_time > 7257600) {
> print "Deleting file $filename\n";
> unlink ($filename);
> }
> else {
> #do nothing
> }
> }
>
> There are several thousand files (~21K) in this directory and many
> thousands of those files fit the criteria to delete. It takes a really
> long time to run this program. What's the holdup? Is it glob? My OS
> (Solaris)? I/O? Any way to speed this up? Thanks.


The bottleneck is mostly going to be OS & I/O, but you could try

find /backup -name "output.log*" -mtime +7 | xargs rm -f
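
Note that -mtime +7 matches files modified more than seven days ago;
for the 90-day cutoff in the original script that would be -mtime +90.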



 
 
peter pilsl
 
      04-25-2005
Jim wrote:
> Hi
>
> I have a very simple perl program that runs _very_ slowly. Here's my
> code:
>
> #!/usr/local/bin/perl
> #
> # script to keep only a weeks worth of files
> #
> use File::stat;
>
> $time = time;
>
> # get list of all files in the backup directory
> @files = glob ("/backup/output.log*");
>
> unless (@files[0]) {
> print "No files to process\n";
> exit;
> }
>
> while (<@files>) {
> $filename = $_;
> $st = stat($_);
>
> $mod_time = $time - $st->mtime;
>
> # if file edit time is greater than x days, delete the file
> # 1440 minutes in a day
> # 86400 seconds in a day
> # 604800 seconds in a week
> # 2419200 seconds in a month
> # 7257600 seconds in 90 days
>
> if ($mod_time > 7257600) {
> print "Deleting file $filename\n";
> unlink ($filename);
> }
> else {
> #do nothing
> }
> }
>
> There are several thousand files (~21K) in this directory and many
> thousands of those files fit the criteria to delete. It takes a really
> long time to run this program. What's the holdup? Is it glob? My OS
> (Solaris)? I/O? Any way to speed this up? Thanks.
>
> Jim



Just for comparison:

I just wrote a small script that creates 20k empty files, gets the stat
of each file, and then deletes them all again. It's pretty fast on my
machine: Linux 2.4.x on an Athlon XP 1800+ with 1 GB RAM and IDE disks
in software RAID 1, with loads of daemons running on it. So definitely
not a machine with fast I/O.


# time ./p.pl create
0.18user 0.71system 0:00.88elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (357major+76minor)pagefaults 0swaps

# time ./p.pl delete
0.12user 1.18system 0:01.29elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (364major+820minor)pagefaults 0swaps

So it's not the globbing itself, but maybe the double globbing, as Xho
already pointed out!


Try the following on your machine:


#!/usr/bin/perl -w
use strict;

# "create": make 20,001 empty files named x0 .. x20000
if ($ARGV[0] =~ /create/) {
    foreach (0 .. 20000) {
        open(FH, ">x$_") or die "Can't create x$_: $!";
        close FH;
    }
}

# "delete": glob them back, stat each one, then remove it
if ($ARGV[0] =~ /delete/) {
    my @files = glob("x*");
    foreach (@files) {
        stat($_);
        unlink($_) or warn "Can't unlink $_: $!";
    }
}


best,
peter

--
http://www.goldfisch.at/know_list
 
 
Ala Qumsieh
 
      04-25-2005
Jim wrote:

> I have a very simple perl program that runs _very_ slowly. Here's my
> code:


> unless (@files[0]) {


This works. But you probably meant:

unless (@files) {

or
unless ($files[0]) {

Type this for more info on the diff between $files[0] and @files[0]:

perldoc -q 'difference.*\$array'

> while (<@files>) {


This is doing much more work than you think it is. Change it to:

foreach (@files) {
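
For the curious: angle brackets around anything that isn't a filehandle
mean glob(), so the original loop is roughly equivalent to this:

# <@files> interpolates the array into one space-separated string
# and globs the result, re-treating every filename as a pattern.
while (defined($_ = glob("@files"))) {
    # loop body
}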

--Ala
 
 
Tad McClellan
 
      04-25-2005
Jim <(E-Mail Removed)> wrote:


> unless (@files[0]) {



You should always enable warnings when developing Perl code!


> # if file edit time is greater than x days, delete the file
> # 1440 minutes in a day
> # 86400 seconds in a day
> # 604800 seconds in a week
> # 2419200 seconds in a month



You do not need "in a week" nor "in a month".

You already have how many in a day, multiply by 90 to get how
many are in 90 days.


> # 7257600 seconds in 90 days



Wrong answer... 86400 * 90 = 7776000. 7257600 is 86400 * 84, i.e. only
84 days.


> if ($mod_time > 7257600) {



if ($mod_time > 60 * 60 * 24 * 90) {


Perl will constant-fold it for you.
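
If you want the intent spelled out, a named constant folds just the same:

use constant NINETY_DAYS => 60 * 60 * 24 * 90;   # computed once at compile time

if ($mod_time > NINETY_DAYS) {
    # delete the file
}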


> There are several thousand files (~21K) in this directory



Then the largest bottleneck is probably the OS and filesystem,
not the programming language (though your algorithm seems
sub-optimal too).


> It takes a really
> long time to run this program. What's the holdup?



There are several thousand files (~21K) in that directory.


--
Tad McClellan SGML consulting
(E-Mail Removed) Perl programming
Fort Worth, Texas
 
 
Joe Smith
 
      04-26-2005
Jim wrote:

> I have a very simple perl program that runs _very_ slowly.


You posted this question earlier and have already gotten an
answer. Why are you not accepting the answers already given?

> while (<@files>) {


Big error right there. '<' and '>' are *not* appropriate here.

> $st = stat($_);
> $mod_time = $time - $st->mtime;
> # 1440 minutes in a day


Why are you doing it that way? Have you not heard of -M?

if (-M $_ > 7) { print "File $_ is older than 7.000 days\n"; }
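
Applied to the original task, the whole loop shrinks to something like
this (untested, with the 90-day cutoff):

foreach my $filename (@files) {
    if (-M $filename > 90) {    # -M gives the file's age in days
        print "Deleting file $filename\n";
        unlink $filename or warn "Can't unlink $filename: $!";
    }
}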

-Joe
 
 
Jim
 
      04-26-2005
In article <ITcbe.1259$(E-Mail Removed)>,
(E-Mail Removed) says...

>
> Type this for more info on the diff between $files[0] and @files[0]:
>
> perldoc -q 'difference.*\$array'
>
> > while (<@files>) {

>
> This is doing much more work than you think it is. Change it to:
>
> foreach (@files) {
>



Changing my while to a foreach has sped up the program considerably.
Thanks to those who helped.

Jim
 
 
 
 