Velocity Reviews

Helen 08-14-2003 07:45 AM

finding subdirectories without parsing every file
 
Hi

Is there any way to get the subdirectories of a directory without
having to sort through all the files in a directory?

I'm actually building a little Perl script that looks at the
directories and then prints out a directory tree (as a webpage).

I've been using File::Find to generate the directory tree but it's too
slow. I think the problem is that it looks at each file in the
directory. I'm not interested in what's in the directory, I just want
to know what the subdirectories are.

It takes about 30 seconds to build the directory tree on some of the
larger sites and the directory searching seems to be where the
bottleneck is. That's compared to around 5 seconds to just download
the file.

Thanks :)

Helen

Tore Aursand 08-14-2003 01:50 PM

Re: finding subdirectories without parsing every file
 
On Thu, 14 Aug 2003 00:45:26 -0700, Helen wrote:
> Is there any way to get the subdirectories of a directory without
> having to sort through all the files in a directory?


Why is it going slow? Maybe you could share some of your code with us, so
that we're able to actually know what you're talking about?

I tend to use the following code when "filtering out" directories:

#!/usr/bin/perl
#
use strict;
use warnings;
use File::Find;

my $root = '/home/tore';
my @dirs = ();
find sub { push(@dirs, $File::Find::name) if -d }, $root;

I guess there are faster ways to do it (as always), but this solution does
it for me.
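
One possible variation, offered as a sketch rather than something posted in the thread: File::Find's documented preprocess option lets a callback filter each directory's entry list before the wanted callback runs, so wanted is only called for directories. Every entry still gets a file test, so a speed-up is not guaranteed and should be measured. The sub name list_dirs is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return every directory at or below $root,
# including $root itself.
sub list_dirs {
    my ($root) = @_;
    my @dirs;
    find(
        {
            # Called once per directory, after readdir() and before
            # wanted(). The current working directory is the directory
            # being read, so bare entry names can be tested directly.
            # Symlinks are dropped so only real directories remain.
            preprocess => sub {
                my @subdirs = grep { !-l $_ && -d _ } @_;
                return sort @subdirs;
            },
            # Only directories reach this callback now.
            wanted => sub { push @dirs, $File::Find::name },
        },
        $root,
    );
    return @dirs;
}

# Print the directories under the path given on the command line,
# or under the current directory if none is given.
print "$_\n" for list_dirs(@ARGV ? $ARGV[0] : '.');
```

The sort in preprocess only makes the output order predictable; it can be dropped if order does not matter.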


--
Tore Aursand <tore@aursand.no>

danglesocket 08-14-2003 04:54 PM

Re: finding subdirectories without parsing every file
 
>>> Helen<helen@helephant.com> 8/14/2003 3:45:26 AM >>>
> Hi
>
> Is there any way to get the subdirectories of a directory without
> having to sort through all the files in a directory?
>
> I'm actually building a little Perl script that looks at the
> directories and then prints out a directory tree (as a webpage).
>
> I've been using File::Find to generate the directory tree but it's too
> slow. I think the problem is that it looks at each file in the
> directory. I'm not interested in what's in the directory, I just want
> to know what the subdirectories are.

This should do what you want. It's kind of like the 'tree' command on
Windows, but *nix-like (none of those lines).
#!/usr/bin/perl

use strict;
use warnings;

#yeah, 20 minutes later
#my $path = '/';
my $path = '/cygdrive/h';
#my $path = '/home/dangle';
print_subdirs($path);

# print every non-hidden subdirectory below $dir, depth first
sub print_subdirs {
    my $dir = shift;

    # a lexical handle keeps each level of the recursion separate
    opendir(my $dh, $dir) or die "can't opendir $dir: $!";

    # skip dot entries (., .. and hidden directories), keep directories only
    my @subdirs = grep { !/^\./ && -d "$dir/$_" } readdir($dh);
    closedir $dh;

    foreach my $name (@subdirs) {
        my $subdir = "$dir/$name";
        print $subdir, "\n";

        # did you find a directory? go through its contents as well
        print_subdirs($subdir);
    }
}






__danglesocket__


James Willmore 08-15-2003 01:20 AM

Re: finding subdirectories without parsing every file
 
> Purl Gurl
> --
>
> #!perl
>
> print "Content-type: text/plain\n\n";
>
> $internal_path = "c:/apache/users/callgirl";


works better if you do this:

chdir($internal_path)
or die "Can't chdir to $internal_path: $!\n";

Just to show why:
--unmodified--
[jim@oplinux jim]$ perl news.pl
Content-type: text/plain

[jim@oplinux jim]$

--with my 'die' added--
[jim@oplinux jim]$ perl news.pl
Content-type: text/plain

Can't chdir to c:/apache/users/callgirl: No such file or directory
[jim@oplinux jim]$

But hey, you were busy and didn't have time to test
it before posting ... right?

Jim

James Willmore 08-15-2003 05:18 AM

Re: finding subdirectories without parsing every file
 
> James Willmore wrote:
>
> > > Purl Gurl
>
> (snipped)
>
> > > $internal_path = "c:/apache/users/callgirl";
>
> > Can't chdir to c:/apache/users/callgirl: No such file or directory
>
> > But hey, you were busy and didn't have time to test
> > it before posting ... right?
>
> No, you are inventing a lame excuse to troll
> evidenced by this idiocy of expecting sample
> code to compensate for every possibility,
> which is an insulting slap across the face
> of a reader.

<snip>

No, I was pointing out an error you made and made light of it,
because you always seem to point out the mistakes of others in a
demeaning way. I figured that I would return the favor.
(RE: "I am sure you boys can determine what is wrong.")

More to the point: the results, if you had bothered to read them
fully, show that your version did NOT report an error when 'chdir'
failed, while the version with my fix did.

Now, what if the end user of your version changed the path and made a
mistake in typing, so that the script didn't work? Would it not be
more productive for the script to TELL them EXACTLY what happened? You
ALWAYS complain about the code of others. Why did you not just own
the error (okay, poor coding) and move on, instead of going off on
some ramblings about God knows what? I just ignored it (not just
snipped it in the reply).

If you want to continue this 'flame out', you have my email address.
Don't take up bandwidth doing it here.


Helen 08-15-2003 07:12 AM

Re: finding subdirectories without parsing every file
 
jwillmore@cyberia.com (James Willmore) wrote in message news:<e0160815.0308141712.67b4eac2@posting.google.com>...
> > Is there any way to get the subdirectories of a directory without
> > having to sort through all the files in a directory?
> > <snip>
> > I've been using File::Find to generate the directory tree but it's too
> > slow. I think the problem is that it looks at each file in the
> > directory. I'm not interested in what's in the directory, I just want
> > to know what the subdirectories are.

Thanks for the help of all who've answered my post. :)

> Ah.... but how far down the parent directory do you wish to search?
> File::Find has a 'finddepth' method and a multitude of options.


I really need it to list all of the directories, no matter how deep it
goes. I've designed the system so that it's simple to make sure that
the directory tree doesn't go too deep, but I didn't want to enforce a
depth because it makes the script less flexible.

> Post your code and maybe we can lend more assistance.


I'm using the method below to build a "tree" structure which
represents the directories on our web server. The main complication is
that sites can have subsites, but in this part of the code I'm only
looking for the subdirectories of one site. If it finds another
subsite it stops recursing. This works because I load all the subsites
into the tree before I load all the subdirectories.

The directories and sites are stored in a tree object that uses the
directory and site path to add new sites/dirs to the tree. It's then
quite easy to recurse the bits I want when I'm printing the tree.

On the page where I'm doing the recursing it prints out only the
subdirectories of the site that don't belong to another subsite. So
it's really only looking at a small part of the tree. The problem is
that "small" is a relative term. I'm testing it with a subsite that
has 800 subdirectories (and over 9000 files) as a worst case scenario
(which isn't the biggest site on the server). I'm not sure I'll be
able to get the load time to anywhere near 10 seconds, but I like
working with such a large site because the effects of changing parts
of the script are exaggerated.

The subsites are stored in a database, but the first thing I did was
make sure that all the database accesses happened at the same time. So
there are only two calls to the database (no matter how big the tree
gets) and they both use the same database handle. The database stuff
happens before I go looking for the subdirectories.

my $nodePath = "$basePath/" . $node->getDirectory();
find(\&wanted, $nodePath);

sub wanted {
    my $currentFile = $File::Find::name;
    if (-d $currentFile) {
        if ($currentFile ne $nodePath) {
            # strip the leading "$basePath/"; \Q...\E keeps any regex
            # metacharacters in the path from being interpreted
            my $newDir = $currentFile;
            $newDir =~ s/^\Q$basePath\E\///;

            # if this directory is actually a site,
            # we only want to recurse it
            # if we're told to by the recurseSubSites parameter
            if (!$siteTree->isNodeSite($newDir)) {
                # if this directory isn't a site,
                # add the directory to the site tree
                $siteTree->addDirectory($newDir);
            } elsif (!$recurseSubSites) {
                # we don't want to recurse any of this directory's subdirs
                $File::Find::prune = 1;
            } # end if
        } # end if
    } # end if
} # end wanted

Since I posted here, I've done more comparisons of how fast it runs. A
lot of the problem is with adding the node to the site tree and
I'm going to try to reduce that by doing sorting within the nodes as I
add them (and probably some other stuff too).

However, it takes a good 10-15 seconds just to print the directories
with the rest of the sub commented out. Perhaps I'm doing something in
an inefficient way? Or is it that I'm going to have to live with this
sort of speed if I'm using Perl to recurse that many directories? I
actually didn't realise that I had so many files in the directories, I
thought it was only one or two thousand. I don't think I can rely on
the sorting of the operating system because I'm on a unix system that
seems to just return the files in alphabetical order.

Anyway, any comments or suggestions about the code would be
appreciated. I'm a bit of a newbie Perl programmer so I'm just
muddling along and don't really know if I'm doing things the best way.

Thanks again for your help. It's given me a few more things to think
about.

Helen

James Willmore 08-16-2003 03:31 AM

Re: finding subdirectories without parsing every file
 
helen@helephant.com (Helen) wrote in message news:<33517f44.0308142312.52443236@posting.google.com>...
<snip>
> On the page where I'm doing the recursing it prints out only the
> subdirectories of the site that don't belong to another subsite. So
> it's really only looking at a small part of the tree. The problem is
> that "small" is a relative term. I'm testing it with a subsite that
> has 800 subdirectories (and over 9000 files) as a worst case scenario
> (which isn't the biggest site on the server). I'm not sure I'll be
> able to get the load time to anywhere near 10 seconds, but I like
> working with such a large site because the effects of changing parts
> of the script are exaggerated.


If you want to do benchmarking, you can use the Benchmark module.
This should give you a snapshot of how the changes you make fare as far
as time and CPU are concerned.
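
As a rough illustration of that suggestion (a sketch under assumptions, not code from the thread), the Benchmark module's cmpthese function can time two ways of collecting directories side by side. The sub names dirs_with_find and dirs_with_readdir are hypothetical, and the numbers will depend heavily on the filesystem and its cache.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Find;

# Approach 1 (hypothetical name): File::Find, keeping only real
# directories inside the wanted callback.
sub dirs_with_find {
    my ($root) = @_;
    my @dirs;
    find(sub { push @dirs, $File::Find::name if !-l $_ && -d _ }, $root);
    return @dirs;
}

# Approach 2 (hypothetical name): hand-rolled opendir/readdir recursion.
sub dirs_with_readdir {
    my ($root) = @_;
    my @dirs = ($root);

    # Skip directories that cannot be read instead of dying.
    opendir(my $dh, $root) or return @dirs;
    my @subdirs = grep { !/^\.\.?\z/ && !-l "$root/$_" && -d _ } readdir($dh);
    closedir $dh;

    push @dirs, dirs_with_readdir("$root/$_") for @subdirs;
    return @dirs;
}

my $root = @ARGV ? $ARGV[0] : '.';

# Run each approach 20 times and print a comparison table.
cmpthese(20, {
    'File::Find' => sub { my @d = dirs_with_find($root) },
    'readdir'    => sub { my @d = dirs_with_readdir($root) },
});
```

On a small tree 20 runs may be too few for a stable result; passing a negative count such as -2 asks Benchmark to run each approach for at least that many CPU seconds instead.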

>
> The subsites are stored in a database, but the first thing I did was
> make sure that all the database accesses happened at the same time. So
> there are only two calls to the database (no matter how big the tree
> gets) and they both use the same database handle. The database stuff
> happens before I go looking for the subdirectories.
>
> my $nodePath = "$basePath/" . $node->getDirectory();
> find(\&wanted, $nodePath);
>
> sub wanted {
>     my $currentFile = $File::Find::name;
>     if (-d $currentFile) {
>         if ($currentFile ne $nodePath) {
>             my $newDir = $currentFile;
>             $newDir =~ s/^\Q$basePath\E\///;
>
>             # if this directory is actually a site,
>             # we only want to recurse it
>             # if we're told to by the recurseSubSites parameter
>             if (!$siteTree->isNodeSite($newDir)) {
>                 # if this directory isn't a site,
>                 # add the directory to the site tree
>                 $siteTree->addDirectory($newDir);
>             } elsif (!$recurseSubSites) {
>                 # we don't want to recurse any of this directory's subdirs
>                 $File::Find::prune = 1;
>             } # end if
>         } # end if
>     } # end if
> } # end wanted


At first glance, it appears that you have everything in place to do
what you want. Just a suggestion - given the number of files you are
dealing with and what you want the end result to look like, have you
considered writing the results out to a file in, say, XML or CSV? This
would free up memory and save the information you have already
processed in the event your script is killed for some reason. Then you
could also just process the directories with one script and do
something with the results with another. Again, it's just a suggestion
and may lead to other issues.
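
A minimal sketch of that split, under stated assumptions: the scan writes one directory path per line to a plain text file (a simplification of the XML or CSV idea, with no quoting, so paths containing newlines would break it), and a second step reads the list back. The sub names write_dir_cache and read_dir_cache are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: scan $root and write one directory path per line
# to $cache_file.
sub write_dir_cache {
    my ($root, $cache_file) = @_;
    open(my $out, '>', $cache_file) or die "Can't write $cache_file: $!";
    find(sub { print {$out} "$File::Find::name\n" if !-l $_ && -d _ }, $root);
    close($out) or die "Can't close $cache_file: $!";
    return;
}

# Hypothetical helper: read the paths back. A separate script that builds
# the web page could do only this step.
sub read_dir_cache {
    my ($cache_file) = @_;
    open(my $in, '<', $cache_file) or die "Can't read $cache_file: $!";
    chomp(my @dirs = <$in>);
    close $in;
    return @dirs;
}

# When called with ROOT and CACHE_FILE arguments, scan and print the result.
if (@ARGV == 2) {
    my ($root, $cache_file) = @ARGV;
    write_dir_cache($root, $cache_file);
    print "$_\n" for read_dir_cache($cache_file);
}
```

The cached list goes stale as soon as directories are added or removed, so something would have to decide when to rescan.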

HTH

Jim

Helen 08-19-2003 05:02 AM

Re: finding subdirectories without parsing every file
 
<snip>

> If you want to do benchmarking, you can use the Benchmark module.
> This should give you a snapshot of how the changes you make fare as far
> as time and CPU are concerned.


Just wanted to thank you for this suggestion. It's made my
optimisation a *lot* easier. :)

Helen


All times are GMT.
