reading a directory, first files the newest ones

 
 
jordilin
 
      10-28-2007
When I read a huge directory with opendir,
opendir(DIR,"dirname");
my $file;
while($file=readdir(DIR))
whatever...
it loads the oldest ones first. I would like the newest files first,
instead of the oldest. Taking into account that I am only interested
in the newest files, this takes a lot of time, as the directory is
really huge. I am talking about thousands and thousands of files. I
need to process the files that are two hours old from now. I am not
interested in those older than two hours ago. I know that because I
check the modification time with stat.
any idea?
Thanks in advance

 
xhoster@gmail.com
 
      10-28-2007
jordilin wrote:
> When I read a huge directory with opendir,
> opendir(DIR,"dirname");
> my $file;
> while($file=readdir(DIR))
> whatever...
> it loads the oldest ones first. I would like the newest files first,
> instead of the oldest.


That is completely up to your OS and your file system. Perl just provides
a fairly simple conduit for their behavior to reach you.

> Taking into account that I am only interested
> in the newest files, this takes a lot of time, as the directory is
> really huge. I am talking about thousands and thousands of files. I
> need to process the files that are two hours old from now. I am not
> interested in those older than two hours ago. I know that because I
> check the modification time with stat.
> any idea?


Come up with a better directory structure, one that doesn't involve keeping
thousands and thousands of files in one directory that has to be scanned
over and over again. Or have whatever puts the files into that directory
also write a log, or create a symbolic link in another directory pointing
to the new file; that link can be deleted after 2 hours or so.
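
A rough sketch of the symbolic-link idea, assuming the program that writes
the files can be changed to call a helper after each new file. The names
register_new_file and prune_recent, and the separate links directory, are
made up for illustration and are not part of any existing tool:

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Hypothetical helper: call this from whatever writes a new file.
# $recent_dir is an assumed, separate directory that holds only links.
sub register_new_file {
    my ($path, $recent_dir) = @_;
    my $link = File::Spec->catfile($recent_dir, basename($path));
    symlink(File::Spec->rel2abs($path), $link)
        or die "Cannot link '$link': $!";
    return $link;
}

# Remove links whose targets were modified more than $max_hours ago
# (or whose targets no longer exist); return the names that remain.
sub prune_recent {
    my ($recent_dir, $max_hours) = @_;
    opendir my $dh, $recent_dir or die "Cannot open '$recent_dir': $!";
    my @kept;
    while (defined(my $name = readdir $dh)) {
        my $link = File::Spec->catfile($recent_dir, $name);
        next unless -l $link;          # also skips "." and ".."
        my $age = -M $link;            # days; follows the link to the target
        if (!defined $age || $age > $max_hours / 24) {
            unlink $link or warn "Cannot remove '$link': $!";
        }
        else {
            push @kept, $name;
        }
    }
    closedir $dh;
    return @kept;
}
```

The reader then only ever scans the small links directory, for example
with prune_recent('/path/to/links', 2), whatever the size of the real one.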

It's possible that your OS and your file system provide other tools for
inspecting very large directories more efficiently, but I rather doubt it.

Xho

 
Gunnar Hjalmarsson
 
      10-28-2007
jordilin wrote:
> When I read a huge directory with opendir,
> opendir(DIR,"dirname");
> my $file;
> while($file=readdir(DIR))
> whatever...
> it loads the oldest ones first. I would like the newest files first,
> instead of the oldest. Taking into account that I am only interested
> in the newest files, this takes a lot of time,


How much time is that?

> as the directory is
> really huge. I am talking about thousands and thousands of files. I
> need to process the files that are two hours old from now. I am not
> interested in those older than two hours ago.


You may want to use grep() to assign to an array the files you are
interested in.

# readdir returns bare names, so prefix the directory; -f also skips . and ..
my @files = grep { -f "dirname/$_" && -M _ <= 2/24 } readdir DIR;

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
 
John W. Krahn
 
      10-28-2007
jordilin wrote:
>
> When I read a huge directory with opendir,
> opendir(DIR,"dirname");


You should *always* verify that the directory opened successfully:

opendir DIR, 'dirname' or die "Cannot open 'dirname': $!";

> my $file;
> while($file=readdir(DIR))
> whatever...
> it loads the oldest ones first.


No, it reads the file names in the order that they are stored in the
directory. It is just a coincidence that the older ones appear before
the newer ones.

> I would like the newest files first, instead of the oldest.


Then you will have to sort them yourself.

perldoc -f sort
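
A minimal sketch of such a sort, assuming plain files directly inside the
directory are what matter. The sub name files_newest_first is made up for
this example. Note that it still has to stat every entry, so it changes the
order but not the amount of work:

```perl
use strict;
use warnings;

# Return the names of the plain files in $dir, newest first.
sub files_newest_first {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open '$dir': $!";
    my @sorted =
        map  { $_->[0] }                          # back to bare names
        sort { $b->[1] <=> $a->[1] }              # larger mtime = newer
        map  { my @st = stat "$dir/$_";           # one stat per entry
               (-f _) ? [ $_, $st[9] ] : () }     # keep files as [name, mtime]
        readdir $dh;
    closedir $dh;
    return @sorted;
}
```

Something like my @files = files_newest_first('dirname'); would then give
the names in the order asked for.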

> Taking into account that I am only interested
> in the newest files, this takes a lot of time, as the directory is
> really huge. I am talking about thousands and thousands of files. I
> need to process the files that are two hours old from now. I am not
> interested in those older than two hours ago. I know that because I
> check the modification time with stat.
> any idea?


The only thing you can do is read all the file names in the directory
and stat() each one.
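
For what it is worth, that can be done one name at a time, so memory use
stays flat however large the directory is. A sketch, assuming a two-hour
window based on modification time; each_recent_file and its callback
argument are names invented for this example:

```perl
use strict;
use warnings;

# Call $callback->($path) for every plain file in $dir that was modified
# within the last $hours hours. Only one name is held at a time.
sub each_recent_file {
    my ($dir, $hours, $callback) = @_;
    my $cutoff = time - $hours * 3600;
    opendir my $dh, $dir or die "Cannot open '$dir': $!";
    while (defined(my $name = readdir $dh)) {
        my $path = "$dir/$name";
        my @st = stat $path or next;      # entry may have vanished
        next unless -f _;                 # skips ".", ".." and subdirectories
        $callback->($path) if $st[9] >= $cutoff;
    }
    closedir $dh;
    return;
}
```

Usage might look like each_recent_file('dirname', 2, sub { print "$_[0]\n" });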



John
--
use Perl;
program
fulfillment
 
 
jordilin
 
      10-28-2007
On Oct 28, 1:36 am, Gunnar Hjalmarsson <nore...@gunnar.cc> wrote:
> jordilin wrote:
> > When I read a huge directory with opendir,
> > opendir(DIR,"dirname");
> > my $file;
> > while($file=readdir(DIR))
> > whatever...
> > it loads the oldest ones first. I would like the newest files first,
> > instead of the oldest. Taking into account that I am only interested
> > in the newest files, this takes a lot of time,

>
> How much time is that?
>
> > as the directory is
> > really huge. I am talking about thousands and thousands of files. I
> > need to process the files that are two hours old from now. I am not
> > interested in those older than two hours ago.

>
> You may want to use grep() to assign to an array the files you are
> interested in.
>
> my @files = grep -M $_ <= 2/24, readdir DIR;
>
> --
> Gunnar Hjalmarsson
> Email:http://www.gunnar.cc/cgi-bin/contact.pl


To grab the files that are from two hours ago till now, I have to
process each file to check the modification time. Obviously, if the
while checks the oldest files first, it can take more than 10 minutes
to arrive for those files I am interested in. This directory has a
huge amount of files.

 
 
Gunnar Hjalmarsson
 
      10-28-2007
jordilin wrote:
> When I read a huge directory with opendir,
> opendir(DIR,"dirname");
> my $file;
> while($file=readdir(DIR))
> whatever...
> it loads the oldest ones first. I would like the newest files first,
> instead of the oldest. Taking into account that I am only interested
> in the newest files, this takes a lot of time, as the directory is
> really huge. I am talking about thousands and thousands of files. I
> need to process the files that are two hours old from now. I am not
> interested in those older than two hours ago.


Maybe you should let the system do the desired sorting. On *nix that
might be:

chomp( my @files = qx(ls -t $dir) );
foreach my $file (@files) {
    last if -M "$dir/$file" > 2/24;
    print "$file\n";
}
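
If holding the whole listing in a Perl array is a worry, the same idea can
be sketched with a pipe that is read line by line and abandoned at the
first file that is too old. This is only a sketch: recent_via_ls is a
made-up name, it assumes a Unix ls and file names without newlines, and ls
itself still has to read and sort the entire directory before it prints
anything.

```perl
use strict;
use warnings;

# Read `ls -t` (newest first) one line at a time and stop at the first
# entry older than $hours hours. The list form of open avoids the shell.
sub recent_via_ls {
    my ($dir, $hours) = @_;
    my $cutoff = time - $hours * 3600;
    open my $ls, '-|', 'ls', '-t', '--', $dir
        or die "Cannot run ls: $!";
    my @recent;
    while (defined(my $name = <$ls>)) {
        chomp $name;
        my $mtime = (stat "$dir/$name")[9];
        next unless defined $mtime;        # vanished since ls ran
        last if $mtime < $cutoff;          # everything after this is older
        push @recent, $name if -f _;       # keep plain files only
    }
    close $ls;   # ls may be cut short when we stop early; status ignored
    return @recent;
}
```

Only the names inside the time window end up in the array on the Perl side.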

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
 
Jürgen Exner
 
      10-28-2007
jordilin wrote:
> On Oct 28, 1:36 am, Gunnar Hjalmarsson <nore...@gunnar.cc> wrote:
>> You may want to use grep() to assign to an array the files you are
>> interested in.
>>
>> my @files = grep -M $_ <= 2/24, readdir DIR;
>>

> To grab the files that are from two hours ago till now, I have to
> process each file to check the modification time.


Yes. That is what the -M does.

> Obviously, if the
> while checks the oldest files first, it can take more than 10 minutes
> to arrive for those files I am interested in.


That is exactly why Gunnar suggested using grep() rather than a while()
loop in the first place.

jue


 
 
jordilin
 
      10-28-2007
On Oct 28, 2:02 am, Gunnar Hjalmarsson <nore...@gunnar.cc> wrote:
> jordilin wrote:
> > When I read a huge directory with opendir,
> > opendir(DIR,"dirname");
> > my $file;
> > while($file=readdir(DIR))
> > whatever...
> > it loads the oldest ones first. I would like the newest files first,
> > instead of the oldest. Taking into account that I am only interested
> > in the newest files, this takes a lot of time, as the directory is
> > really huge. I am talking about thousands and thousands of files. I
> > need to process the files that are two hours old from now. I am not
> > interested in those older than two hours ago.

>
> Maybe you should let the system do the desired sorting. On *nix that
> might be:
>
> chomp( my @files = qx(ls -t $dir) );
> foreach my $file (@files) {
> last if -M "$dir/$file" > 2/24;
> print "$file\n";
> }
>
> --
> Gunnar Hjalmarsson
> Email:http://www.gunnar.cc/cgi-bin/contact.pl


With this code, and taking into account that the directory is huge,
memory usage would be a problem as we are going to use a huge array
@files, and the Unix server is a very important one. Don't know if
that could be achieved by means of a while. The real problem is having
to process many files before arriving to the interesting ones. The
solution would be reading the newest ones first. I think there is no
solution. We have, either to slurp all the files into an array (which
is going to take time and memory), or process the whole directory
through a while (one file at a time) till we get the proper files,
which in this case is going to take a lot of time as well.

 
 
jordilin
 
      10-28-2007
On Oct 28, 2:04 am, "Jürgen Exner" <jurge...@hotmail.com> wrote:
> jordilin wrote:
> > On Oct 28, 1:36 am, Gunnar Hjalmarsson <nore...@gunnar.cc> wrote:
> >> You may want to use grep() to assign to an array the files you are
> >> interested in.

>
> >> my @files = grep -M $_ <= 2/24, readdir DIR;

>
> > To grab the files that are from two hours ago till now, I have to
> > process each file to check the modification time.

>
> Yes. That is what the -M does.
>
> > Obviously, if the
> > while checks the oldest files first, it can take more than 10 minutes
> > to arrive for those files I am interested in.

>
> That is exactly why Gunnar suggest not to use a while() loop but grep() in
> the first place.
>
> jue


Yeah, it seems that this would be a solution.

 
 
Gunnar Hjalmarsson
 
      10-28-2007
jordilin wrote:
> On Oct 28, 2:02 am, Gunnar Hjalmarsson <nore...@gunnar.cc> wrote:
>> Maybe you should let the system do the desired sorting. On *nix that
>> might be:
>>
>> chomp( my @files = qx(ls -t $dir) );
>> foreach my $file (@files) {
>> last if -M "$dir/$file" > 2/24;
>> print "$file\n";
>> }

>
> With this code, and taking into account that the directory is huge,


How big is "huge"?

> memory usage would be a problem as we are going to use a huge array
> @files, and the Unix server is a very important one. Don't know if
> that could be achieved by means of a while. The real problem is having
> to process many files before arriving to the interesting ones.


With the above suggestion you wouldn't _process_ any files but the
interesting ones; you'd just store their names in an array.

> The solution would be reading the newest ones first.


And that's what the -t option achieves...

> I think there is no solution.


??

> We have, either to slurp all the files into an array (which
> is going to take time and memory), or process the whole directory
> through a while (one file at a time) till we get the proper files,
> which in this case is going to take a lot of time as well.


Have you measured the time for various options? You may want to study
the Benchmark module.
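
A possible starting point, assuming the directory is passed as the first
command-line argument. The sub names via_grep and via_while are invented
for this sketch, and repeated runs will mostly hit the OS cache, so the
figures are likely to flatter a cold scan:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Directory to scan; '.' is only a placeholder default.
my $dir = @ARGV ? $ARGV[0] : '.';

# grep over the whole readdir list (the full name list is built first).
sub via_grep {
    opendir my $dh, $dir or die "Cannot open '$dir': $!";
    my @recent = grep { -f "$dir/$_" && -M _ <= 2/24 } readdir $dh;
    closedir $dh;
    return scalar @recent;
}

# while loop that looks at one name at a time.
sub via_while {
    my $cutoff = $^T - 2 * 3600;      # same reference time that -M uses
    my $count  = 0;
    opendir my $dh, $dir or die "Cannot open '$dir': $!";
    while (defined(my $name = readdir $dh)) {
        my @st = stat "$dir/$name" or next;
        $count++ if -f _ && $st[9] >= $cutoff;
    }
    closedir $dh;
    return $count;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, { grep_M => \&via_grep, while_stat => \&via_while });
```

Both subs do one stat per entry, so any difference measured this way comes
from building the list, not from the file tests.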

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
 
 
 