Velocity Reviews

Velocity Reviews (http://www.velocityreviews.com/forums/index.php)
-   Perl Misc (http://www.velocityreviews.com/forums/f67-perl-misc.html)
-   -   *fastest* way to get a large directory listing in Perl (http://www.velocityreviews.com/forums/t894354-fastest-was-to-get-a-large-directory-listing-in-perl.html)

Seth Brundle 09-21-2005 09:57 PM

*fastest* way to get a large directory listing in Perl
 
There are several methods of getting a large directory listing (3000+ files
in a single directory) in Perl, but all the methods I've tried (<*>,
readdir) are vastly slower in my usage than using readdir in C.

This doesn't seem to make sense, since I imagine Perl is just making the same
system call.

Opinions appreciated...




A. Sinan Unur 09-21-2005 10:22 PM

Re: *fastest* way to get a large directory listing in Perl
 
"Seth Brundle" <brundlefly76@hotmail.com> wrote in
news:Y9adnSgxcfb5SqzeRVn-jA@comcast.com:

> There are several methods of getting a large directory listing (3000+
> files in a single directory) in Perl, but all the methods I've tried
> (<*>, readdir) are vastly slower in my usage than using readdir in C.
>
> This doesn't seem to make sense, since I imagine Perl is just making
> the same system call.


Maybe you are using readdir incorrectly?

What is your notion of fast & slow?

How do you measure fast and slow?

D:\Home\asu1\UseNet\clpmisc\r> dir
....
09/21/2005 06:14 PM 0 file998
09/21/2005 06:14 PM 0 file999
09/21/2005 06:14 PM 241 myt.pl
09/21/2005 06:18 PM 266 test.pl
3002 File(s) 507 bytes

D:\Home\asu1\UseNet\clpmisc\r> cat test.pl
#!/usr/bin/perl

use strict;
use warnings;

use Benchmark;

sub ls {
    opendir my $dir, '.' or die "Cannot opendir '.': $!";
    my @files = readdir $dir;
    closedir $dir or die "Cannot closedir '.': $!";
}

timethese -1, { ls => \&ls };

__END__

D:\Home\asu1\UseNet\clpmisc\r> test
Benchmark: running ls for at least 1 CPU seconds...
ls: 1 wallclock secs ( 0.76 usr + 0.31 sys = 1.08 CPU) @ 99.35/s
(n=107)

This is on Windows XP SP2, AMD64 running at 1.8 GHz, 1 GB RAM, about 402 MB
allocated.

What is the equivalent C program you tested?

Sinan

--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(reverse each component and remove .invalid for email address)

comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/cl...uidelines.html

xhoster@gmail.com 09-21-2005 10:31 PM

Re: *fastest* way to get a large directory listing in Perl
 
"Seth Brundle" <brundlefly76@hotmail.com> wrote:
> There are several methods of getting a large directory listing (3000+
> files in a single directory) in Perl, but all the methods I've tried
> (<*>, readdir) are vastly slower in my usage than using readdir in C.
>
> This doesn't seem to make sense, since I imagine Perl is just making the
> same system call.


Perl first has to determine if readdir is in a list or a scalar context and
has to unwrap the stack. Then it has to make the same system call as C
does. Then it has to copy the contents of the char* "foo.d_name" someplace
safe (unlike C's readdir), and package that up into a perl scalar, and push
that onto the return stack. And in a list context, it has to do that
repeatedly.

> Opinions appreciated...


It is possible you are doing something silly, like calling the underlying
system call 9,000,000+ times. If you posted code (both C and Perl would be
nice, if you want us to do the comparison) (and actual time measurements,
rather than just "vastly slower") we could offer more informed opinions.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB


A. Sinan Unur 09-22-2005 12:23 AM

Re: *fastest* way to get a large directory listing in Perl
 
"A. Sinan Unur" <1usa@llenroc.ude.invalid> wrote in
news:Xns96D8BAD449E5Fasu1cornelledu@127.0.0.1:

> "Seth Brundle" <brundlefly76@hotmail.com> wrote in
> news:Y9adnSgxcfb5SqzeRVn-jA@comcast.com:
>
>> There are several methods of getting a large directory listing (3000+
>> files in a single directory) in Perl, but all the methods I've tried
>> (<*>, readdir) are vastly slower in my usage than using readdir in C.

....
snip Perl code
....

> D:\Home\asu1\UseNet\clpmisc\r> test
> Benchmark: running ls for at least 1 CPU seconds...
> ls: 1 wallclock secs ( 0.76 usr + 0.31 sys = 1.08 CPU) @ 99.35/s
> (n=107)
>
> This is on Windows XP SP2, AMD64 running at 1.8 GHz, 1 GB RAM, about 402
> MB allocated.


So we get about 100 readdirs in list context per second.

> What is the equivalent C program you tested?


The following C program is really not the equivalent of the Perl program
I posted, but it does copy the names and create a list of file names,
etc.

I first ran a do-nothing version that called an empty ls() function 100
times to get a baseline timing. Windows' timethis utility reported an
average of 0.16 seconds.

Then I wrote the following:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>


#include <sys/types.h>
#include <dirent.h>

void ls(size_t num_files) {
    struct dirent *ent;
    DIR *dir;
    size_t f;

    /* room for num_files entries plus a terminating NULL */
    char **list = malloc((1 + num_files) * sizeof(*list));
    if( !list ) {
        fprintf(stderr, "Memory allocation error\n");
        exit(EXIT_FAILURE);
    }

    dir = opendir(".");
    if( !dir ) {
        perror("Cannot open '.'");
        exit(EXIT_FAILURE);
    }

    for(f = 0; f != num_files; ++f) {
        char *d_name;
        ent = readdir(dir);
        if( !ent ) {
            break;
        }
        d_name = malloc(1 + strlen(ent->d_name));
        if( !d_name ) {
            break;
        }
        strcpy(d_name, ent->d_name);
        list[f] = d_name;
    }
    list[f] = NULL;

    if( closedir(dir) ) {
        perror("Cannot close '.'");
    }

    /* free the copies so repeated calls don't leak */
    for(f = 0; list[f]; ++f) {
        free(list[f]);
    }
    free(list);
}

int main(void) {
    struct dirent *ent;
    size_t num_files = 0;

    DIR *dir = opendir(".");
    if( !dir ) {
        perror("Cannot open '.'");
        return EXIT_FAILURE;
    }

    while( (ent = readdir(dir)) ) {
        ++num_files;
    }

    if( closedir(dir) ) {
        perror("Cannot close '.'");
    }

    {
        int i;
        for(i = 0; i != 100; ++i) {
            ls(num_files);
        }
    }

    return 0;
}

D:\Home\asu1\UseNet\clpmisc\r> gcc -Wall -O2 r.c -o r.exe

D:\Home\asu1\UseNet\clpmisc\r> timethis r.exe

TimeThis : Command Line : r.exe
TimeThis : Start Time : Wed Sep 21 20:20:45 2005
TimeThis : End Time : Wed Sep 21 20:20:47 2005
TimeThis : Elapsed Time : 00:00:01.640

So, again, we get about 100 readdirs per second in list context (so to
speak). Now, clearly, I am not a great C programmer, but I would be
interested to see the C program that generates the vastly superior
timings.

Sinan


--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(reverse each component and remove .invalid for email address)

comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/cl...uidelines.html

