Velocity Reviews - Computer Hardware Reviews

Help me understand this iterator

 
 
LaundroMat
      10-31-2006
Hi,

I've found this script over at effbot
(http://effbot.org/librarybook/os-path.htm), and I can't get my head
around its inner workings. Here's the script:

import os

class DirectoryWalker:
    # a forward iterator that traverses a directory tree

    def __init__(self, directory):
        self.stack = [directory]
        self.files = []
        self.index = 0

    def __getitem__(self, index):
        while 1:
            try:
                file = self.files[self.index]
                self.index = self.index + 1
            except IndexError:
                # pop next directory from stack
                self.directory = self.stack.pop()
                self.files = os.listdir(self.directory)
                self.index = 0
            else:
                # got a filename
                fullname = os.path.join(self.directory, file)
                if os.path.isdir(fullname) and not os.path.islink(fullname):
                    self.stack.append(fullname)
                return fullname

for file in DirectoryWalker("."):
    print file

Now, if I look at this script step by step, I don't understand:
- what is being iterated over (what is being called by "file in
DirectoryWalker()"?);
- where it gets the "index" value from;
- where the "while 1:"-loop is quitted.

Thanks in advance,

Mathieu

 
Peter Otten
      10-31-2006
LaundroMat wrote:

> Hi,
>
> I've found this script over at effbot
> (http://effbot.org/librarybook/os-path.htm), and I can't get my head
> around its inner workings. Here's the script:
>
> import os
>
> class DirectoryWalker:
>     # a forward iterator that traverses a directory tree
>
>     def __init__(self, directory):
>         self.stack = [directory]
>         self.files = []
>         self.index = 0
>
>     def __getitem__(self, index):
>         while 1:
>             try:
>                 file = self.files[self.index]
>                 self.index = self.index + 1
>             except IndexError:
>                 # pop next directory from stack
>                 self.directory = self.stack.pop()
>                 self.files = os.listdir(self.directory)
>                 self.index = 0
>             else:
>                 # got a filename
>                 fullname = os.path.join(self.directory, file)
>                 if os.path.isdir(fullname) and not os.path.islink(fullname):
>                     self.stack.append(fullname)
>                 return fullname
>
> for file in DirectoryWalker("."):
>     print file
>
> Now, if I look at this script step by step, I don't understand:
> - what is being iterated over (what is being called by "file in
> DirectoryWalker()"?);
> - where it gets the "index" value from;
> - where the "while 1:"-loop is quitted.


With

dw = DirectoryWalker(".")

the for loop is equivalent to

index = 0 # internal variable, not visible from Python
while True:
    try:
        file = dw[index] # invokes dw.__getitem__(index)
    except IndexError:
        break
    print file
    index += 1 # the next pass asks for the next index

This is an old way of iterating over a sequence which is only used when the
iterator-based approach

dwi = iter(dw) # invokes dw.__iter__()
while True:
    try:
        file = dwi.next()
    except StopIteration:
        break
    print file

fails (that is, when the object has no __iter__() method).

Peter
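
A minimal sketch of the fallback described above, written for Python 3 (where print is a function and iterators expose __next__ rather than next). The Squares class is a made-up example, not part of the effbot script. It defines only __getitem__, so the for machinery falls back to indexing from 0 until IndexError is raised.

```python
class Squares:
    """Hypothetical sequence-like class with no __iter__ method."""

    def __init__(self, limit):
        self.limit = limit

    def __getitem__(self, index):
        # The for loop supplies index = 0, 1, 2, ... on successive calls.
        if index >= self.limit:
            raise IndexError(index)  # signals the loop to stop
        return index * index


# list() iterates the same way a for loop does.
result = list(Squares(5))
print(result)
```

Since Squares has no __iter__, the only way the loop can obtain values is through repeated __getitem__ calls.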
 
Fredrik Lundh
      10-31-2006
LaundroMat wrote:

> Now, if I look at this script step by step, I don't understand:
> - what is being iterated over (what is being called by "file in
> DirectoryWalker()"?);


as explained in the text above the script, this class emulates a
sequence. it does this by implementing the __getitem__ method:

http://effbot.org/pyref/__getitem__

> - where it gets the "index" value from;


from the call to __getitem__ done by the for-in loop.

> - where the "while 1:"-loop is quitted.


the loop stops when the stack is empty, and pop raises an IndexError
exception.

note that this is an old example; code written for newer versions of
Python would probably use a recursing generator instead (see the source
code for os.walk in the standard library for an example).

</F>
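
For comparison, a rough sketch of the recursive-generator approach mentioned above, assuming Python 3 (it relies on "yield from"). The function name walk_files is invented for this illustration; it is not the standard library's os.walk and does not reproduce os.walk's return format. It yields the same kind of full paths as DirectoryWalker, though not necessarily in the same order.

```python
import os


def walk_files(directory):
    """Yield the full path of every entry below directory, recursively."""
    for name in os.listdir(directory):
        fullname = os.path.join(directory, name)
        yield fullname
        # Descend into real subdirectories, but do not follow symlinks.
        if os.path.isdir(fullname) and not os.path.islink(fullname):
            yield from walk_files(fullname)
```

Usage mirrors the original script: "for path in walk_files("."): print(path)". The generator needs no stack, index, or IndexError handling because the interpreter suspends and resumes the function at each yield.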

 
 
Steven D'Aprano
      10-31-2006
On Tue, 31 Oct 2006 03:36:08 -0800, LaundroMat wrote:

> Hi,
>
> I've found this script over at effbot
> (http://effbot.org/librarybook/os-path.htm), and I can't get my head
> around its inner workings.


[snip code]

> Now, if I look at this script step by step, I don't understand:
> - what is being iterated over (what is being called by "file in
> DirectoryWalker()"?);


What is being iterated over is the list of files in the current directory.
In Unix land (and probably DOS/Windows as well) the directory "." means
"this directory, right here".


> - where it gets the "index" value from;


When Python sees a line like "for x in obj:" it does some special
magic. First it looks to see if obj has an "__iter__" method; if so, it
calls that to get an iterator and then calls the iterator's next() method
repeatedly. That's not the case here -- DirectoryWalker is an old-style
iterator, not one of the fancy new ones.

Instead, Python tries calling obj[index] starting at 0 and keeps going
until an IndexError exception is raised, then it halts the for loop.

So, think of it like this: pretend that Python expands the following code:

for x in obj:
    block

into something like this:

index = 0
while True: # loop forever
    try:
        x = obj[index]
        block # can use x in block
    except IndexError:
        # catch the exception and escape the while loop
        break
    index = index + 1
# and now we're done, continue the rest of the program

That's not exactly what Python does, of course, it is much more efficient,
but that's a good picture of what happens.


> - where the "while 1:"-loop is quitted.



The while 1 loop is escaped when the function hits the return statement.



--
Steven.
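
To make the source of the "index" value visible, here is a small sketch, assuming Python 3. Recorder is a hypothetical class invented for this illustration; it logs every argument the loop passes to __getitem__. The parameter is deliberately named "position" to show that the name is arbitrary.

```python
class Recorder:
    """Hypothetical class that logs every index the for loop passes in."""

    def __init__(self, items):
        self.items = items
        self.seen = []

    def __getitem__(self, position):  # the parameter name is arbitrary
        self.seen.append(position)
        return self.items[position]  # raises IndexError past the end


rec = Recorder(["a", "b", "c"])
collected = [x for x in rec]
print(collected)
print(rec.seen)
```

The log ends with one index more than there are items: that final call is the one that raises IndexError and stops the loop.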

 
 
Peter Otten
      10-31-2006
LaundroMat wrote:

[me hitting send too soon]

> Now, if I look at this script step by step, I don't understand:


> - where the "while 1:"-loop is quitted.


> class DirectoryWalker:
>     # a forward iterator that traverses a directory tree
>
>     def __init__(self, directory):
>         self.stack = [directory]
>         self.files = []
>         self.index = 0
>
>     def __getitem__(self, index):
>         while 1:
>             try:
>                 file = self.files[self.index]
>                 self.index = self.index + 1
>             except IndexError:
>                 # pop next directory from stack
>                 self.directory = self.stack.pop()


If self.stack is empty, pop() will raise an IndexError which terminates both
the 'while 1' loop in __getitem__() and the enclosing 'for file in ...'
loop.

>                 self.files = os.listdir(self.directory)
>                 self.index = 0
>             else:
>                 # got a filename
>                 fullname = os.path.join(self.directory, file)
>                 if os.path.isdir(fullname) and not os.path.islink(fullname):
>                     self.stack.append(fullname)
>                 return fullname


The return statement feeds the next file to the for loop.

Peter
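
The two exits described above can be tried out with a Python 3 port of the class. This is a sketch, not the effbot original: the local variable is renamed from "file" to "name" and "while 1" becomes "while True", but the control flow is unchanged. Note that the index argument is never used; the object tracks its own position.

```python
import os


class DirectoryWalker:
    """Python 3 port of the effbot example (old-style __getitem__ iteration)."""

    def __init__(self, directory):
        self.stack = [directory]
        self.files = []
        self.index = 0

    def __getitem__(self, index):
        # The index argument supplied by the for loop is ignored here.
        while True:
            try:
                name = self.files[self.index]
                self.index += 1
            except IndexError:
                # Current listing exhausted: move on to the next directory.
                # With an empty stack, pop() raises IndexError itself; that
                # exception leaves __getitem__ and ends the caller's loop.
                self.directory = self.stack.pop()
                self.files = os.listdir(self.directory)
                self.index = 0
            else:
                fullname = os.path.join(self.directory, name)
                if os.path.isdir(fullname) and not os.path.islink(fullname):
                    self.stack.append(fullname)
                return fullname  # hands one path to the for loop
```

Used as "for path in DirectoryWalker("."): print(path)", each pass through the for loop makes one __getitem__ call, which returns as soon as it has a path.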

 
 
LaundroMat
      10-31-2006
Thanks all, those were some great explanations. It seems I still have
a long way to go before I grasp the intricacies of this language.

That 'magic index' variable bugs me a little however. It gives me the
same feeling as when I see hard-coded variables. I suppose the
generator class has taken care of this with its next() method (although
- I should have a look - __next__() probably takes self and index as
its arguments). Although I'm very fond of the language (as a
non-formally trained hobbyist developer), that "magic" bit is a tad
disturbing.

Still, thanks for the quick and complete replies!
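
On the question of what next() takes: a short sketch, assuming Python 3 naming (__next__; Python 2 spelled it next). The method receives only self. No index is passed in, and the iterator keeps whatever position it needs as its own state. CountDown is an invented example class.

```python
class CountDown:
    """Hypothetical iterator that counts from start down to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self  # an iterator returns itself from __iter__

    def __next__(self):  # takes only self, no index argument
        if self.current <= 0:
            raise StopIteration  # the for loop catches this and stops
        value = self.current
        self.current -= 1
        return value


values = list(CountDown(3))
print(values)
```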

 
 
LaundroMat
      10-31-2006
Ack, I get it now. It's not the variable's name ("index") that is
hard-coded, it's just that the for...in... loop sends an argument by
default. That's a lot more comforting.

 
 
Fredrik Lundh
      10-31-2006
LaundroMat wrote:

> That 'magic index' variable bugs me a little however. It gives me the
> same feeling as when I see hard-coded variables.


what magic index? the variable named "index" is an argument to the
method it's used in.

</F>

 
 
LaundroMat
      10-31-2006
On Oct 31, 3:53 pm, Fredrik Lundh <(E-Mail Removed)> wrote:
> LaundroMat wrote:
> > That 'magic index' variable bugs me a little however. It gives me the
> > same feeling as when I see hard-coded variables.
>
> what magic index? the variable named "index" is an argument to the
> method it's used in.


Yes, I reacted too quickly. Sorry.

 