Velocity Reviews - Computer Hardware Reviews


Is there an alternative to os.walk?

 
 
Bruce
      10-04-2006
Hi all,
I have a question about traversing file systems, and could use some
help. Because of directories with many files in them, os.walk appears
to be rather slow. I'm thinking there is a potential for speed-up since
I don't need os.walk to report filenames of all the files in every
directory it visits. Is there some clever way to use os.walk or another
tool that would provide functionality like os.walk except for the
listing of the filenames?
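For readers on a modern Python, one way to get os.walk-like traversal without building per-directory filename lists is a small walker on os.scandir. A minimal sketch, assuming Python 3.6+ (so not an option when this thread was written); walk_dirs is a hypothetical name. It still has to read every directory listing, so the saving is only in not building and returning the file lists. Note that os.walk itself has used os.scandir since Python 3.5 (PEP 471), so the gap is smaller today than it was in 2006.

```python
import os

def walk_dirs(top):
    """Yield top and every directory below it, never building file lists.

    Hypothetical helper; assumes Python 3.6+ (os.scandir as a context manager).
    Symlinks to directories are not followed, and the order is depth-first
    but not identical to os.walk's.
    """
    stack = [top]
    while stack:
        current = stack.pop()
        yield current
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    # is_dir() can usually be answered from the directory listing
                    # itself, so most entries need no extra stat() call
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            # Unreadable directory: skip its contents instead of raising
            pass
```

Usage would be along the lines of `for d in walk_dirs('/some/root'): ...`, where the path is a placeholder.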

 
Irmen de Jong
      10-04-2006
Bruce wrote:
> Hi all,
> I have a question about traversing file systems, and could use some
> help. Because of directories with many files in them, os.walk appears
> to be rather slow.


Provide more info/code. I suspect it is not os.walk itself that is slow,
but rather the code that processes its result...
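One quick way to test that suspicion is to time a bare os.walk pass that does nothing with its results and compare it with the full script. A sketch only; time_bare_walk is a hypothetical helper, and time.perf_counter assumes Python 3.3+.

```python
import os
import time

def time_bare_walk(top):
    """Return (directory_count, seconds) for an os.walk pass with no per-file work."""
    start = time.perf_counter()
    count = 0
    for _dirpath, _dirnames, _filenames in os.walk(top):
        count += 1  # touch each directory tuple, nothing else
    return count, time.perf_counter() - start
```

For example, `dirs, secs = time_bare_walk('/path/to/tree')` (placeholder path); if `secs` is close to the script's total runtime, the walk really is the bottleneck.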

> I'm thinking there is a potential for speed-up since
> I don't need os.walk to report filenames of all the files in every
> directory it visits. Is there some clever way to use os.walk or another
> tool that would provide functionality like os.walk except for the
> listing of the filenames?


You may want to take a look at os.path.walk then.

--Irmen
 
waylan
      10-04-2006
Bruce wrote:
> Hi all,
> I have a question about traversing file systems, and could use some
> help. Because of directories with many files in them, os.walk appears
> to be rather slow. I'm thinking there is a potential for speed-up since
> I don't need os.walk to report filenames of all the files in every
> directory it visits. Is there some clever way to use os.walk or another
> tool that would provide functionality like os.walk except for the
> listing of the filenames?


You might want to check out the path module [1] (not os.path). The
following is from the docs:

> The method path.walk() returns an iterator which steps recursively
> through a whole directory tree. path.walkdirs() and path.walkfiles()
> are the same, but they yield only the directories and only the files,
> respectively.


Oh, and you can thank Paul Bissex for pointing me to path [2].

[1]: http://www.jorendorff.com/articles/python/path/
[2]: http://e-scribe.com/news/289
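path is a third-party module, not part of the standard library. If you only want its walkdirs()/walkfiles() style of output, here is a rough stdlib-only sketch built on os.walk. These are plain functions merely named after path's methods, not path's implementation, and details such as whether top itself is included may differ. They only filter what you see: os.walk still lists every file internally.

```python
import os

def walkdirs(top):
    # Yield every subdirectory below top (top itself is not included)
    for dirpath, dirnames, _filenames in os.walk(top):
        for name in dirnames:
            yield os.path.join(dirpath, name)

def walkfiles(top):
    # Yield every non-directory entry below top
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            yield os.path.join(dirpath, name)
```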

 
Bruce
      10-07-2006
waylan wrote:
> Bruce wrote:
> > Hi all,
> > I have a question about traversing file systems, and could use some
> > help. Because of directories with many files in them, os.walk appears
> > to be rather slow. I'm thinking there is a potential for speed-up since
> > I don't need os.walk to report filenames of all the files in every
> > directory it visits. Is there some clever way to use os.walk or another
> > tool that would provide functionality like os.walk except for the
> > listing of the filenames?
>
> You might want to check out the path module [1] (not os.path). The
> following is from the docs:
>
> > The method path.walk() returns an iterator which steps recursively
> > through a whole directory tree. path.walkdirs() and path.walkfiles()
> > are the same, but they yield only the directories and only the files,
> > respectively.
>
> Oh, and you can thank Paul Bissex for pointing me to path [2].
>
> [1]: http://www.jorendorff.com/articles/python/path/
> [2]: http://e-scribe.com/news/289


A little late, but thanks for the replies; they were very useful. Here's
what I do in this case:

import os

def search(a_dir):
    valid_dirs = []
    walker = os.walk(a_dir)
    while 1:
        try:
            dirpath, dirnames, filenames = next(walker)
        except StopIteration:
            break
        if dirtest(dirpath, filenames):
            valid_dirs.append(dirpath)
    return valid_dirs

def dirtest(a_dir, filenames):
    # filenames is accepted but unused: each test file costs a filesystem call
    testfiles = ['a','b','c']
    for f in testfiles:
        if not os.path.exists(os.path.join(a_dir,f)):
            return 0
    return 1

I think you're right: it's not os.walk that makes this slow, it's the
dirtest method, which takes much more time when there are many files
in a directory. Also, thanks for pointing me to the path module; it was
interesting.

 
Tim Roberts
      10-08-2006
"Bruce" <(E-Mail Removed)> wrote:
>
>A little late, but thanks for the replies; they were very useful. Here's
>what I do in this case:
>
>import os
>
>def search(a_dir):
>    valid_dirs = []
>    walker = os.walk(a_dir)
>    while 1:
>        try:
>            dirpath, dirnames, filenames = next(walker)
>        except StopIteration:
>            break
>        if dirtest(dirpath, filenames):
>            valid_dirs.append(dirpath)
>    return valid_dirs
>
>def dirtest(a_dir, filenames):
>    # filenames is accepted but unused: each test file costs a filesystem call
>    testfiles = ['a','b','c']
>    for f in testfiles:
>        if not os.path.exists(os.path.join(a_dir,f)):
>            return 0
>    return 1
>
>I think you're right: it's not os.walk that makes this slow, it's the
>dirtest method, which takes much more time when there are many files
>in a directory. Also, thanks for pointing me to the path module; it was
>interesting.


Umm, may I point out that you don't NEED the "os.path.exists" call, because
you are already being HANDED a list of all the filenames in that directory?
You could write "dirtest" as this much faster routine:

def dirtest(a_dir,filenames):
    for f in ['a','b','c']:
        if f not in filenames:
            return 0
    return 1
--
- Tim Roberts, (E-Mail Removed)
Providenza & Boekelheide, Inc.
 
hanumizzle
      10-08-2006
On 10/8/06, Tim Roberts <(E-Mail Removed)> wrote:

> Umm, may I point out that you don't NEED the "os.path.exists" call, because
> you are already being HANDED a list of all the filenames in that directory?
> You could write "dirtest" as this much faster routine:
>
> def dirtest(a_dir,filenames):
>     for f in ['a','b','c']:
>         if f not in filenames:
>             return 0
>     return 1


Or return False / True instead of 0 / 1, in sufficiently new versions of Python (2.3 and later).
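A sketch of the same test returning real booleans, assuming Python 2.5 or later, where the built-in all() exists (it also runs unchanged on Python 3). The testfiles parameter is an addition for illustration, defaulting to the thread's 'a', 'b', 'c'.

```python
def dirtest(a_dir, filenames, testfiles=('a', 'b', 'c')):
    # a_dir is unused; kept only to match the call dirtest(dirpath, filenames)
    # all() stops at the first missing name and always returns a real bool
    return all(f in filenames for f in testfiles)
```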

-- Theerasak
 
Ant
      10-08-2006
The idiomatic way of doing the tree traversal is:

import os

def search(a_dir):
    valid_dirs = []
    for dirpath, dirnames, filenames in os.walk(a_dir):
        if dirtest(filenames):
            valid_dirs.append(dirpath)
    return valid_dirs

Also, since you are given a list of the filenames in the directory, why
not just check that list for your test files:

def dirtest(filenames):
    testfiles = ['a','b','c']
    for f in testfiles:
        if f not in filenames:
            return False
    return True

You'd have to test this to see if it made a difference in performance,
but it does make for more readable code.
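One way to run that test is with the standard timeit module. A sketch with hypothetical helper names: it walks the tree once up front so that only the two dirtest styles are timed. The two are not strictly equivalent: os.path.exists also succeeds for a subdirectory named a, whereas os.walk puts subdirectories in dirnames, not filenames. Results depend heavily on the filesystem and OS cache, so no numbers are claimed here.

```python
import os
import timeit

TESTFILES = ['a', 'b', 'c']

def dirtest_exists(a_dir):
    # Original approach: one filesystem call per test file
    return all(os.path.exists(os.path.join(a_dir, f)) for f in TESTFILES)

def dirtest_list(filenames):
    # Suggested approach: look the names up in the list os.walk already returned
    return all(f in filenames for f in TESTFILES)

def compare(top, number=5):
    """Time both tests over a pre-walked tree; returns (exists_secs, list_secs)."""
    walk = list(os.walk(top))  # walk once so only the tests themselves are timed
    t_exists = timeit.timeit(
        lambda: [d for d, _, _ in walk if dirtest_exists(d)], number=number)
    t_list = timeit.timeit(
        lambda: [d for d, _, fs in walk if dirtest_list(fs)], number=number)
    return t_exists, t_list
```

For example, `t_exists, t_list = compare('/path/to/tree')`, with a placeholder path.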

 