*fastest* way to get a large directory listing in Perl

 
 
Seth Brundle
      09-21-2005
There are several methods of getting a large directory listing (3000+ files
in a single directory) in Perl, but all the methods I've tried (<*>,
readdir) are vastly slower in my usage than using readdir in C.

This doesn't seem to make sense, since I imagine perl is just making the same
system call.

Opinions appreciated...



 
 
A. Sinan Unur
      09-21-2005
"Seth Brundle" <(E-Mail Removed)> wrote in
news:(E-Mail Removed):

> There are several methods of getting a large directory listing (3000+
> files in a single directory) in Perl, but all the methods I've tried
> (<*>, readdir) are vastly slower in my usage than using readdir in C.
>
> This doesn't seem to make sense, since I imagine perl is just making
> the same system call.


Maybe you are using readdir incorrectly?

What is your notion of fast & slow?

How do you measure fast and slow?

D:\Home\asu1\UseNet\clpmisc\r> dir
....
09/21/2005 06:14 PM 0 file998
09/21/2005 06:14 PM 0 file999
09/21/2005 06:14 PM 241 myt.pl
09/21/2005 06:18 PM 266 test.pl
3002 File(s) 507 bytes

D:\Home\asu1\UseNet\clpmisc\r> cat test.pl
#!/usr/bin/perl

use strict;
use warnings;

use Benchmark;

sub ls {
    opendir my $dir, '.' or die "Cannot opendir '.': $!";
    my @files = readdir $dir;
    closedir $dir or die "Cannot closedir '.': $!";
}

timethese -1, { ls => \&ls };

__END__

D:\Home\asu1\UseNet\clpmisc\r> test
Benchmark: running ls for at least 1 CPU seconds...
ls: 1 wallclock secs ( 0.76 usr + 0.31 sys = 1.08 CPU) @ 99.35/s (n=107)

This is on Windows XP SP2, AMD64 running at 1.8 GHz, 1 GB RAM, about 402 MB
allocated.

What is the equivalent C program you tested?

Sinan

--
A. Sinan Unur <(E-Mail Removed)>
(reverse each component and remove .invalid for email address)

comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/cl...uidelines.html
 
 
 
xhoster@gmail.com
      09-21-2005
"Seth Brundle" <(E-Mail Removed)> wrote:
> There are several methods of getting a large directory listing (3000+
> files in a single directory) in Perl, but all the methods I've tried
> (<*>, readdir) are vastly slower in my usage than using readdir in C.
>
> This doesn't seem to make sense, since I imagine perl is just making the
> same system call.


Perl first has to determine if readdir is in a list or a scalar context and
has to unwrap the stack. Then it has to make the same system call as C
does. Then it has to copy the contents of the char* "foo.d_name" someplace
safe (unlike C's readdir), and package that up into a perl scalar, and push
that onto the return stack. And in a list context, it has to do that
repeatedly.
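
To make that extra per-entry work concrete, here is a minimal C sketch, not taken from perl's source. The helper names count_entries, list_entries and free_entries are made up for illustration. The first function only issues the readdir calls; the second also copies every d_name into a growing array, which is roughly the kind of bookkeeping a list-context readdir has to do on top of the system call.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: walk the directory once and only count entries.
   Nothing is copied; each d_name is discarded as soon as it is seen. */
long count_entries(const char *path) {
    DIR *dir = opendir(path);
    long n = 0;
    if (!dir) {
        return -1;
    }
    while (readdir(dir) != NULL) {
        ++n;
    }
    closedir(dir);
    return n;
}

/* Hypothetical helper: walk the directory once and keep a private copy
   of every name, growing the array as needed. Returns NULL on failure. */
char **list_entries(const char *path, size_t *count) {
    size_t used = 0, cap = 16;
    struct dirent *ent;
    char **names;
    DIR *dir = opendir(path);
    if (!dir) {
        return NULL;
    }
    names = malloc(cap * sizeof *names);
    if (!names) {
        closedir(dir);
        return NULL;
    }
    while ((ent = readdir(dir)) != NULL) {
        size_t len = strlen(ent->d_name) + 1;
        char *copy;
        if (used == cap) {
            /* Double the capacity when the array is full. */
            char **grown = realloc(names, 2 * cap * sizeof *names);
            if (!grown) {
                break;
            }
            names = grown;
            cap *= 2;
        }
        /* d_name may be overwritten by the next readdir, so copy it. */
        copy = malloc(len);
        if (!copy) {
            break;
        }
        memcpy(copy, ent->d_name, len);
        names[used++] = copy;
    }
    closedir(dir);
    *count = used;
    return names;
}

/* Release everything list_entries allocated. */
void free_entries(char **names, size_t count) {
    size_t i;
    for (i = 0; i < count; ++i) {
        free(names[i]);
    }
    free(names);
}
```

Timing count_entries(".") against list_entries(".", &n) on the same directory gives a rough idea of how much of the cost is the copying rather than the directory read itself.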

> Opinions appreciated...


It is possible you are doing something silly, like calling the underlying
system call 9,000,000+ times. If you posted code (both C and Perl would be
nice, if you want us to do the comparison) and actual time measurements,
rather than just "vastly slower", we could offer more informed opinions.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
 
 
A. Sinan Unur
      09-22-2005
"A. Sinan Unur" <(E-Mail Removed)> wrote in
news:Xns96D8BAD449E5Fasu1cornelledu@127.0.0.1:

> "Seth Brundle" <(E-Mail Removed)> wrote in
> news:(E-Mail Removed):
>
>> There are several methods of getting a large directory listing (3000+
>> files in a single directory) in Perl, but all the methods I've tried
>> (<*>, readdir) are vastly slower in my usage than using readdir in C.

....
snip Perl code
....

> D:\Home\asu1\UseNet\clpmisc\r> test
> Benchmark: running ls for at least 1 CPU seconds...
> ls: 1 wallclock secs ( 0.76 usr + 0.31 sys = 1.08 CPU) @ 99.35/s
> (n=107)
>
> This is on Windows XP SP2, AMD64 running at 1.8 GHz, 1 GB RAM, about 402
> MB allocated.


So we get about 100 readdirs in list context per second.

> What is the equivalent C program you tested?


The following C program is not really the equivalent of the Perl program
I posted, but it does copy the names and build a list of file names.

I first ran a do-nothing version, which called an empty ls() function 100
times, to get a baseline timing. The Windows timethis utility reported an
average of 0.16 seconds for it.

Then I wrote the following:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <sys/types.h>
#include <dirent.h>

/* Read up to num_files entries from '.' and copy each name into a
   NULL-terminated list. */
void ls(size_t num_files) {
    struct dirent *ent;
    DIR *dir;
    size_t f;

    /* One extra slot for the terminating NULL pointer. */
    char **list = malloc((num_files + 1) * sizeof(*list));
    if( !list ) {
        fprintf(stderr, "Memory allocation error\n");
        exit(EXIT_FAILURE);
    }

    dir = opendir(".");
    if( !dir ) {
        perror("Cannot open '.'");
        exit(EXIT_FAILURE);
    }

    for(f = 0; f != num_files; ++f) {
        char *d_name;
        ent = readdir(dir);
        if( !ent ) {
            break;
        }
        d_name = malloc(1 + strlen(ent->d_name));
        if( !d_name ) {
            break;
        }
        strcpy(d_name, ent->d_name);
        list[f] = d_name;
    }

    list[f] = NULL;

    if( closedir(dir) ) {
        perror("Cannot close '.'");
    }

    /* Release the copies so repeated calls do not leak. */
    for(f = 0; list[f]; ++f) {
        free(list[f]);
    }
    free(list);
}

int main(void) {
    struct dirent *ent;
    size_t num_files = 0;

    DIR *dir = opendir(".");
    if( !dir ) {
        perror("Cannot open '.'");
        exit(EXIT_FAILURE);
    }

    /* First pass: count the entries so ls() knows how much to allocate. */
    while( (ent = readdir(dir)) != NULL ) {
        ++num_files;
    }

    if( closedir(dir) ) {
        perror("Cannot close '.'");
    }

    {
        int i;
        for(i = 0; i != 100; ++i) {
            ls(num_files);
        }
    }

    return 0;
}

D:\Home\asu1\UseNet\clpmisc\r> gcc -Wall -O2 r.c -o r.exe

D:\Home\asu1\UseNet\clpmisc\r> timethis r.exe

TimeThis : Command Line : r.exe
TimeThis : Start Time : Wed Sep 21 20:20:45 2005
TimeThis : End Time : Wed Sep 21 20:20:47 2005
TimeThis : Elapsed Time : 00:00:01.640

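The 100 calls to ls() took 1.64 seconds, or about 1.48 seconds once the
0.16-second baseline is subtracted. That is roughly 70 readdirs per second
in list context (so to speak), the same order of magnitude as the Perl
result. Now, clearly, I am not a great C programmer, but I would be
interested to see the C program that generates the vastly superior
timings.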
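
The baseline subtraction can also be done inside the program instead of with an external timer. A minimal sketch, assuming CPU time from clock() is an acceptable stand-in for the elapsed time that timethis reports (the two are not the same thing). All function names here are made up for illustration.

```c
#include <dirent.h>
#include <time.h>

typedef void (*bench_fn)(void);

/* Hypothetical baseline: does nothing, so it measures call overhead only. */
void do_nothing(void) {
}

/* Hypothetical workload: read every entry of '.' once, discarding names. */
void list_current_dir(void) {
    DIR *dir = opendir(".");
    if (!dir) {
        return;
    }
    while (readdir(dir) != NULL) {
        /* entry ignored on purpose */
    }
    closedir(dir);
}

/* CPU seconds spent calling fn() the given number of times. */
double cpu_seconds_for(bench_fn fn, int iterations) {
    clock_t start, end;
    int i;
    start = clock();
    for (i = 0; i < iterations; ++i) {
        fn();
    }
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Per-call CPU seconds for work(), with the baseline loop subtracted.
   Clamped at zero because timer resolution can make the difference
   come out slightly negative for very cheap workloads. */
double net_seconds_per_call(bench_fn work, bench_fn baseline, int iterations) {
    double gross, overhead, net;
    if (iterations <= 0) {
        return 0.0;
    }
    gross = cpu_seconds_for(work, iterations);
    overhead = cpu_seconds_for(baseline, iterations);
    net = gross - overhead;
    if (net < 0.0) {
        net = 0.0;
    }
    return net / iterations;
}
```

Calling net_seconds_per_call(list_current_dir, do_nothing, 100) and taking the reciprocal gives a calls-per-second figure that can be set next to the rate Benchmark prints for the Perl version.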

Sinan


--
A. Sinan Unur <(E-Mail Removed)>
(reverse each component and remove .invalid for email address)

comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/cl...uidelines.html
 