Trim Multiple Dirs to Max Total Space Used - by Date

 
 
Ron Heiby
 
      06-25-2004
Hi! I've done a lot of FAQ reading and Google-ing and reading in O'Reilly books, but
I'm still stuck.

I have a system where data files are created in multiple directories. I need to run a
daily script that will total the disk space used by all the files in all the
directories and see whether the space exceeds some MAXSPACE value. In this case, all
but one of the directories are subdirectories of a common parent dir, while the other
one is off on its own. If the space does exceed the maximum, I need to start deleting
files, oldest first, until the total space used drops just below the maximum.

I've been looking at File::Find, and File::stat, among others, but don't quite see how
this all can be hung together to accomplish this seemingly simple task.

Any help would be much appreciated. Thanks!

P.S. I'll be looking for responses here. If using Email, remove the "_u" from my name
to avoid getting shuffled into an infrequently perused mailbox.

--
Ron.
 
 
 
 
 
Jürgen Exner
 
      06-25-2004
Ron Heiby wrote:
> Hi! I've done a lot of FAQ reading and Google-ing and reading in
> O'Reilly books, but I'm still stuck.
>
> I have a system where data files are created in multiple directories.
> I need to run a daily script that will total the disk space used by
> all the files in all the directories and see whether the space
> exceeds some MAXSPACE value. In this case, all but one of the
> directories are subdirectories of a common parent dir, while the
> other one is off on its own. If the space does exceed the maximum, I
> need to start deleting files, oldest first, until the total space
> used drops just below the maximum.
>
> I've been looking at File::Find, and File::stat, among others, but
> don't quite see how this all can be hung together to accomplish this
> seemingly simple task.


I would attack the problem in four steps:

First loop through all the directories to create an internal array of all
files which you are interested in. Forget File::Find, you don't need it
because you already have the comprehensive list of all directories.
For your purposes a file consists of the name including the full path, the
file size, and the date.
The obvious data structure would be an array of hashes, where each hash
contains three items, namely the qualified file name, the size, and the
date.

In step two you simply add all the sizes to determine your total used space.
Or you can do that while collecting the files in step 1 already.

Then sort the array by the date element.

And then beginning with the oldest file delete files (you got the fully
qualified name in the hash) until the added size of all deleted files is
larger than the difference between desired size and actual size as
determined in step 2.
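
The steps above might be sketched roughly like this. It is only an illustration: the sub name pick_victims and the sample records are made up for the example, and it only works out which files would go, it never calls unlink.

```perl
use strict;
use warnings;

# Hypothetical helper: given an array of hashes (name, size, date) and a
# limit in bytes, return the names of the oldest files whose removal
# brings the total down to the limit or below.
sub pick_victims {
    my ($files, $max) = @_;

    # Step 2: total space used.
    my $total = 0;
    $total += $_->{size} for @$files;

    # Steps 3 and 4: oldest first, stop once enough space is freed.
    my @victims;
    for my $f (sort { $a->{date} <=> $b->{date} } @$files) {
        last if $total <= $max;
        push @victims, $f->{name};
        $total -= $f->{size};
    }
    return @victims;
}

# Made-up sample data; sizes in bytes, dates in seconds since the epoch.
my @files = (
    { name => '/data/a/new.dat', size => 400, date => 3000 },
    { name => '/data/b/old.dat', size => 300, date => 1000 },
    { name => '/other/mid.dat',  size => 500, date => 2000 },
);

print "$_\n" for pick_victims(\@files, 500);
```

With a limit of 500 bytes the two oldest files are picked, which leaves 400 bytes in use.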

jue


 
 
 
 
 
Ron Heiby
 
      06-25-2004
Purl Gurl wrote:
>You need to provide your system type.


Sorry. The system is Red Hat 8.0 Linux.

>quota -v (total disk usage per ownership)
>du -ask (per directory total disk usage kilobytes)


These look like they would tell me how much space is being used, but I do not see how
they would address the aspect of deleting the oldest.

>ls -la (returns a list of files / sizes)
>ls -laR (recursive list of files / sizes)


With either of these (and -t), I'm pretty sure that I can get a date-sorted list of the
files in the various directories, but each directory is listed separately. If I could
see how to get one combined list, that would be a big step forward.

>Use of quota seems to be best suited for your task.


I don't see how quota deletes the old files. I admit that I've never used the quota
system, but the little I've looked at it, it seems that it is more for preventing the
creation of new files that would exceed the limit. I cannot do that. I must delete the
old ones.

Thanks!
 
 
Ron Heiby
 
      06-25-2004
"Jürgen Exner" wrote:
>Forget File::Find, you don't need it
>because you already have the comprehensive list of all directories.


Sorry I didn't make that part clear. I know the odd-ball directory and I know the
parent directory of the other directories of interest. However, I do not know, a
priori, what their names are.

>For your purposes a file consists of the name including the full path, the
>file size, and the date.


Makes sense.

>The obvious data structure would be an array of hashes, where each hash
>contains three items, namely the qualified file name, the size, and the
>date.


I thought that a hash matched a single key with a single value. What would you have as
the key? Would I have the value be an array reference with the array holding the other
two? Or, am I as confused as I think I am?

>In step two you simply add all the sizes to determine your total used space.
>Or you can do that while collecting the files in step 1 already.


Yes, during collection makes sense to me.

>Then sort the array by the date element.


Perhaps when I better understand how you are picturing the data structure this will
become clearer. It sounds like the date is the hash key. I'm thinking that if this is
the case, I'll want to use the "raw" UNIX style seconds-since-epoch date value. But, I
think I'll still need to be careful of potential collisions, where multiple files have
the same modification date. This should happen rarely, and if I just increment the date
value of the colliders until the date is unique, that won't be a problem. Maybe there's
no reason why the date has to be the key, though. The full pathname of each file is
already unique, and could probably be the key just as well. I'm still confused about
having two values for each key in the hash, though.

>And then beginning with the oldest file delete files (you got the fully
>qualified name in the hash) until the added size of all deleted files is
>larger than the difference between desired size and actual size as
>determined in step 2.


Speaking of size -- I think the size that matters here is the number of Kbytes that the
file is actually taking up on the drive, which is likely slightly larger than its
length might imply. On the other hand, if that's a real pain, I can pretty easily
ignore that slop, as this does not have to be completely exact. If I leave a few of the
files lying around an extra day, it's no problem.
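
On the size question, Perl's built-in stat also returns the number of blocks allocated to a file (field 12), next to its length (field 7). A rough sketch, assuming the usual 512-byte unit for that field on Linux; the sub name disk_usage is made up here, and the demo uses a throwaway temp file:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: bytes actually allocated on disk for a file.
# Field 12 of stat is the allocated block count; the 512-byte unit is an
# assumption that holds on Linux but is not guaranteed everywhere.
sub disk_usage {
    my ($path) = @_;
    my @st = stat $path or return;
    my ($length, $blocks) = @st[7, 12];
    # Fall back to the plain length where no block count is reported.
    return $blocks ? $blocks * 512 : $length;
}

# Throwaway file, just to have something to measure.
my ($fh, $file) = tempfile(UNLINK => 1);
print $fh 'x' x 1000;
close $fh;

printf "length %d bytes, about %d bytes on disk\n",
    -s $file, disk_usage($file);
```

Given 50-100 Kbyte files and a 250 Megabyte limit, the difference is probably small enough to ignore, as noted above.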

A couple other things I failed to mention earlier that may be useful to know -- The
typical size of each of these files will be in the 50-100 Kbyte realm. We're talking
about keeping around a configurable amount of these files, with the default being 250
Megabytes.

Thanks!
 
 
Jürgen Exner
 
      06-25-2004
Ron Heiby wrote:
> "Jürgen Exner" wrote:
>> Forget File::Find, you don't need it
>> because you already have the comprehensive list of all directories.

>
> Sorry I didn't make that part clear. I know the odd-ball directory
> and I know the parent directory of the other directories of interest.
> However, I do not know, a priori, what their names are.


Well, ok, then yes, File::Find would be the best tool to enumerate all files.
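
Since only the parent directory and the one odd-ball directory are known, both can be handed to File::Find as starting points and it will discover the subdirectories itself. A small sketch; the sub name collect_files is invented for illustration, and the demo builds throwaway directories with File::Temp rather than touching real data:

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical helper: walk every starting directory and return one
# combined list of records (array of hashes) for all plain files found.
sub collect_files {
    my @dirs = @_;
    my @files;
    find({ no_chdir => 1,
           wanted   => sub {
               return unless -f;               # skip directories etc.
               my ($size, $mtime) = (stat _)[7, 9];
               push @files, { name => $_, size => $size, date => $mtime };
           } }, @dirs);
    return @files;
}

# Throwaway stand-ins for the parent directory and the odd-ball one.
my $parent  = tempdir(CLEANUP => 1);
my $oddball = tempdir(CLEANUP => 1);
mkdir "$parent/sub$_" or die "mkdir: $!" for 1, 2;
for my $path ("$parent/sub1/a.dat", "$parent/sub2/b.dat", "$oddball/c.dat") {
    open my $fh, '>', $path or die "open $path: $!";
    print $fh 'x' x 100;
    close $fh;
}

my @files = collect_files($parent, $oddball);
print "$_->{name} ($_->{size} bytes)\n" for @files;
```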

>> For your purposes a file consists of the name including the full
>> path, the file size, and the date.

>
> Makes sense.
>
>> The obvious data structure would be an array of hashes, where each hash
>> contains three items, namely the qualified file name, the size, and
>> the date.

>
> I thought that a hash matched a single key with a single value. What
> would you have as the key?


Each hash would contain 3 elements, the keys being: 'name', 'size', and
'date'.
This represents one abstract file.

> Would I have the value be an array
> reference with the array holding the other two? Or, am I as confused
> as I think I am?


You need the complete list of all files. The easiest technical implementation
is an array (= list) of hashes (= files).

>> In step two you simply add all the sizes to determine your total
>> used space. Or you can do that while collecting the files in step 1
>> already.

>
> Yes, during collection makes sense to me.
>
>> Then sort the array by the date element.

>
> Perhaps when I better understand how you are picturing the data
> structure this will become clearer. It sounds like the date is the
> hash key. I'm thinking that if this is the case, I'll want to use the
> "raw" UNIX style seconds-since-epoch date value. But, I think I'll
> still need to be careful of potential collisions, where multiple
> files have the same modification date.

[...]

You are making this way too complicated. You have a list of files, implemented
as an array of hashes. Now just sort that list by the date of each file and
then start deleting from the upper (or lower) end of the sorted array.
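
The collision worry goes away with an array of hashes, because the date is just another value in each record and not a hash key; two files with the same date simply end up next to each other after sorting. A small illustration with made-up records (the tie-break on the name is an extra, only there to make the order predictable):

```perl
use strict;
use warnings;

# Made-up records; two of them share the same modification time.
my @files = (
    { name => '/data/b/two.dat',   size => 60_000, date => 1088130000 },
    { name => '/data/a/three.dat', size => 80_000, date => 1088140000 },
    { name => '/data/a/one.dat',   size => 70_000, date => 1088130000 },
);

# Oldest first; equal dates fall back to comparing names.
my @by_age = sort { $a->{date} <=> $b->{date}
                    or $a->{name} cmp $b->{name} } @files;

print "$_->{date}  $_->{name}\n" for @by_age;
```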

jue


 
 
Michele Dondi
 
      06-25-2004
On Fri, 25 Jun 2004 03:43:53 GMT, Ron Heiby wrote:

>I have a system where data files are created in multiple directories. I need to run a
>daily script that will total the disk space used by all the files in all the
>directories and see whether the space exceeds some MAXSPACE value. In this case, all
>but one of the directories are subdirectories of a common parent dir, while the other
>one is off on its own. If the space does exceed the maximum, I need to start deleting
>files, oldest first, until the total space used drops just below the maximum.
>
>I've been looking at File::Find, and File::stat, among others, but don't quite see how
>this all can be hung together to accomplish this seemingly simple task.


Generally it's not considered a good idea to post complete solutions,
but see if this (untested!) can help you:


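#!/usr/bin/perl -l

use strict;
use warnings;
use File::Find;
use constant MAXSPACE => 0xA00_000; # 10 MB (10 * 1024 * 1024 bytes)

# Keep only the arguments that are directories; warn about the rest.
@ARGV = grep { -d or !warn "`$_': not a directory!\n" } @ARGV;
die <<"EOD" unless @ARGV;
Usage: $0 <dir> [<dirs>]
EOD

my @files;   # each entry: [ path, size in bytes, mtime ]

find({ no_chdir => 1,
       wanted   => sub {
           return unless -f;                    # plain files only
           print "Examining ", $_;
           push @files, [ $_, (stat _)[7,9] ];  # reuse the stat done by -f
       } }, @ARGV);

# $t is the number of bytes by which the total exceeds MAXSPACE.
my $t = -(MAXSPACE);
$t += $_->[1] for @files;

print "No file needs to be deleted" and exit if $t <= 0;

# Delete oldest first; count a file's size only if it was really removed.
for (sort { $a->[2] <=> $b->[2] } @files) {
    unless (unlink $_->[0]) {
        warn "Can't remove `$_->[0]': $!\n";
        next;
    }
    print "Removing `$_->[0]'";
    last if ($t -= $_->[1]) <= 0;
}

__END__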


Michele
--
you'll see that it shouldn't be so. AND, the writting as usuall is
fantastic incompetent. To illustrate, i quote:
- Xah Lee trolling on clpmisc,
"perl bug File::Basename and Perl's nature"
 
 
Ron Heiby
 
      06-25-2004
Purl Gurl wrote:
>Sorry Ron, our family cannot allow you to visit
>as much as we would like for you to visit.


Golly. Sorry about all the problems you've been having. I was able to get to the page
you listed and have a copy of the script. I will be taking a look at it today. I
appreciate all the help I've received. Thanks!

--
Ron.
 
 
Ron Heiby
 
      06-25-2004
Michele Dondi wrote:
Thanks! I'll be looking at this today. One thing is for sure, I'm learning some new (to
me) things about using Perl!

--
Ron.
 
 
Sherm Pendley
 
      06-25-2004
Ron Heiby wrote:

> Golly. Sorry about all the problems you've been having.


Do yourself a favor and read a few more of her rants before you start
feeling too sorry for her. She's delusional, and the "problems" she speaks
of exist only in her imagination.

sherm--

--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
 
 
Michele Dondi
 
      06-25-2004
On Fri, 25 Jun 2004 12:32:10 GMT, Ron Heiby wrote:

>Michele Dondi wrote:
>Thanks! I'll be looking at this today. One thing is for sure, I'm learning some new (to
>me) things about using Perl!


Well, since I wrote the script in the first place, you may (modify it
suitably and) try it on a sample directory: please tell me if there's
anything wrong with it and ask for clarification...


Michele
 