Big speed boost in os.walk in Python 2.5

 
 
looping
 
      10-13-2006
Hi,
I noticed a big speed improvement in some of my scripts that use os.walk,
so I wrote a small script to check it:

    import os
    for path, dirs, files in os.walk('D:\\FILES\\'):
        pass

Results on Windows XP after a few runs to fill the disk cache (with
~59000 files and ~3500 folders):
Python 2.4.3 : 45s
Python 2.5 : 10s

Very nice, but somewhat strange...
Is Python 2.4.3 os.walk buggy ???
Are these results only valid on Windows, or do *nix systems show the same
difference ?
The profiler shows that most of the time is spent in ntpath.isdir, and this
function is *a lot* faster in Python 2.5.
Maybe this improvement could be backported to the Python 2.4 branch for the
next release ?


Python 2.4.3:
604295 function calls (587634 primitive calls) in 48.629 CPU seconds

Ordered by: standard name

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     62554    0.264    0.000    0.264    0.000 :0(append)
         1    0.001    0.001   48.593   48.593 :0(execfile)
     66074    0.197    0.000    0.197    0.000 :0(len)
      3521    5.219    0.001    5.219    0.001 :0(listdir)
         1    0.036    0.036    0.036    0.036 :0(setprofile)
     62554   38.812    0.001   38.812    0.001 :0(stat)
         1    0.000    0.000   48.593   48.593 <string>:1(?)
     66074    0.218    0.000    0.218    0.000 ntpath.py:116(splitdrive)
      3520    0.009    0.000    0.009    0.000 ntpath.py:246(islink)
     62554    0.767    0.000   40.137    0.001 ntpath.py:268(isdir)
     66074    0.433    0.000    0.650    0.000 ntpath.py:51(isabs)
     66074    0.880    0.000    1.726    0.000 ntpath.py:59(join)
20183/3522    1.217    0.000   48.573    0.014 os.py:211(walk)
         1    0.000    0.000   48.629   48.629 profile:0(execfile('test.py'))
         0    0.000             0.000          profile:0(profiler)
     62554    0.174    0.000    0.174    0.000 stat.py:29(S_IFMT)
     62554    0.385    0.000    0.559    0.000 stat.py:45(S_ISDIR)
         1    0.019    0.019   48.592   48.592 test.py:1(?)


Python 2.5:
604295 function calls (587634 primitive calls) in 17.386 CPU seconds

Ordered by: standard name

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     62554    0.247    0.000    0.247    0.000 :0(append)
         1    0.001    0.001   17.315   17.315 :0(execfile)
     66074    0.168    0.000    0.168    0.000 :0(len)
      3521    5.287    0.002    5.287    0.002 :0(listdir)
         1    0.071    0.071    0.071    0.071 :0(setprofile)
     62554    7.812    0.000    7.812    0.000 :0(stat)
         1    0.000    0.000   17.315   17.315 <string>:1(<module>)
     66074    0.186    0.000    0.186    0.000 ntpath.py:116(splitdrive)
      3520    0.009    0.000    0.009    0.000 ntpath.py:245(islink)
     62554    0.712    0.000    9.013    0.000 ntpath.py:267(isdir)
     66074    0.394    0.000    0.581    0.000 ntpath.py:51(isabs)
     66074    0.815    0.000    1.564    0.000 ntpath.py:59(join)
20183/3522    1.176    0.000   17.296    0.005 os.py:218(walk)
         1    0.000    0.000   17.386   17.386 profile:0(execfile('test.py'))
         0    0.000             0.000          profile:0(profiler)
     62554    0.159    0.000    0.159    0.000 stat.py:29(S_IFMT)
     62554    0.331    0.000    0.489    0.000 stat.py:45(S_ISDIR)
         1    0.018    0.018   17.314   17.314 test.py:1(<module>)
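
For anyone who wants to gather a similar profile today, here is a minimal sketch. It is not the script used for the numbers above (those came from the old `profile` module running `execfile('test.py')`); it assumes a current Python 3 with `cProfile`, and the helper names `walk_tree` and `profile_walk` are made up for illustration. Note that current versions of `os.walk` are built on `os.scandir`, so the breakdown will not show the `isdir`/`stat` rows seen here.

```python
import cProfile
import io
import os
import pstats


def walk_tree(root):
    """Walk the whole tree under root and return the number of entries seen."""
    count = 0
    for path, dirs, files in os.walk(root):
        count += len(dirs) + len(files)
    return count


def profile_walk(root):
    """Profile walk_tree(root) and return the report as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    walk_tree(root)
    profiler.disable()

    out = io.StringIO()
    # "stdname" gives the same "Ordered by: standard name" sort as above.
    pstats.Stats(profiler, stream=out).sort_stats("stdname").print_stats()
    return out.getvalue()


if __name__ == "__main__":
    # The current directory is only a placeholder; point it at a large tree.
    print(profile_walk("."))
```

Run it a couple of times before comparing numbers, so that the disk cache is warm in every run.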

 
Fredrik Lundh
 
      10-13-2006
looping wrote:

> Results on Windows XP after a few runs to fill the disk cache (with
> ~59000 files and ~3500 folders):
> Python 2.4.3 : 45s
> Python 2.5 : 10s
>
> Very nice, but somewhat strange...
> Is Python 2.4.3 os.walk buggy ???


No. A few "os" functions are now implemented directly on top of the
Windows APIs, instead of using Microsoft C's POSIX compatibility layer.
This includes os.stat(), which is what isdir() uses to check if something
is a directory. The code was rewritten to work around problems with
timestamps, so the speedup is purely a side effect.
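
To make the isdir()/os.stat() relationship concrete, here is a rough sketch of that kind of check. It is not the actual ntpath source; `isdir_via_stat` is a made-up name, and the code only mirrors the call chain visible in the profiles above (isdir, then stat, then S_ISDIR).

```python
import os
import stat


def isdir_via_stat(path):
    """Return True if path is an existing directory, using one os.stat() call."""
    try:
        st = os.stat(path)
    except (OSError, ValueError):
        # A missing or inaccessible path is simply "not a directory".
        return False
    return stat.S_ISDIR(st.st_mode)


if __name__ == "__main__":
    print(isdir_via_stat(os.getcwd()))
```

Since a check like this runs once per directory entry, any change in the cost of os.stat() shows up almost directly in the total os.walk time.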

> Are these results only valid on Windows, or do *nix systems show the same
> difference ?


On Unix systems, Python uses the POSIX APIs, not the Windows APIs.
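
A quick way to see which path implementation a given interpreter is using is sketched below; `active_path_module` is a made-up helper name. On Windows, os.path is the ntpath module that appears in the profiles above, and on Unix it is posixpath.

```python
import os


def active_path_module():
    """Return the name of the module that os.path refers to on this platform."""
    return os.path.__name__


if __name__ == "__main__":
    # os.name is "nt" on Windows and "posix" on Unix-like systems.
    print(os.name, active_path_module())
```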

> The profiler shows that most of the time is spent in ntpath.isdir, and this
> function is *a lot* faster in Python 2.5.


Why are you asking if something's buggy when you've already figured out
what's been improved?

> Maybe this improvement could be backported to the Python 2.4 branch for the
> next release ?


It's not really broken, so that's not very likely.

</F>

 
looping
 
      10-13-2006
Fredrik Lundh wrote:
> looping wrote:
> > Very nice, but somewhat strange...
> > Is Python 2.4.3 os.walk buggy ???
>
> Why are you asking if something's buggy when you've already figured out
> what's been improved?

You're right, buggy isn't the right word...

Anyway, thanks for the detailed information. I'm very pleased with
the performance improvement, even if it's only a side effect and only on
Windows.

 
Martin v. Löwis
 
      10-13-2006
looping wrote:
> Maybe this improvement could be backported to the Python 2.4 branch for the
> next release ?


As Fredrik explains, this is probably the side-effect of a from-scratch
rewrite of the relevant functions. Another (undesirable) side-effect is
that the resulting binary won't work on Windows 95 anymore. So
backporting it as-is is out of the question.

However, even if the patch was improved to still work on W9x, and to not
introduce the other behavioral changes that came with the rewrite, it
still couldn't go into 2.4.x. Likely, 2.4.4 is the final 2.4 release,
and the release candidate for that was already produced.

Regards,
Martin
 