Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > flatten() time trial mystery. (or, 101 ways to flatten a nested list using generators)

Reply
Thread Tools

flatten() time trial mystery. (or, 101 ways to flatten a nested list using generators)

 
 
Francis Avila
Guest
Posts: n/a
 
      11-02-2003
A few days ago (see the 'itertools.flatten()?' thread from October 2 I
became obsessed with refactoring a recursive generator that yielded the
leaves of nested iterables. When the dust settled, I had many flatten
functions at hand.

So I had to time them. Results below.

History of the functions (from flattrial.py):

# There are three basic features:
# 1. Can specify a function that determines iterability.
# 2. Can specify class-iterability.
# 3. Can modify sequence before iterating.

##Function: Features Supported : Author
##
##flatten_fa: 1 : Francis Avila
##flatten_po: 1,2,3 : Peter Otten
##flatten_po2: None : Alex Martelli channeling Peter Otten
##flatten_am: None : Alex Martelli
##flatten_dict: 2 : Peter Otten
##flatten_fastcond: 1,2 : Francis Avila
##flatten_itercond: 1,2,3 : Francis Avila
##flatten_dictcond: 1,2,3 : Francis Avila
##flatten_dictdef: 1,2,3 : Francis Avila
##flatten_trydictdef: 1,2,3 : Francis Avila
##flatten_fastdictdef: 1,2,3 : Francis Avila

Tree test flattens tree:
subtree = ['foo']*18 + [1,2,3]*6
tree = [ subtree*10, [ subtree * 8 ] ]

Node test flattens nodetree:
class Node(object):
def __init__(self, label=None, data=None, children=()):
self.children = children
self.label = label
self.data = data
def __iter__(self):
return iter(self.children)

leaves = [Node(chr(i+65),i) for i in range(10)]
branches = [Node(chr(i+65), i, leaves) for i in range(10,30)]
nodetree = [Node(chr(i+65), i, branches) for i in range(30,50)]

Results (Python 2.2):

.....>python flattrial.py
C:\Docs>python flattrial.py
Tree Tests (100 reps):
02.113544 fa
03.129147 po
02.845054 po2
02.587387 am
00.643718 dict
00.648371 fastcond
00.724689 itercond
00.791277 dictcond
01.006224 dictdef
00.833452 trydictdef
00.776937 fastdictdef
Node Tests (10 reps):
02.877818 po
02.633231 itercond
00.878554 dictcond
01.040838 dictdef
00.897504 trydictdef
00.864411 fastdictdef

I'd post flattrial.py, but it's about 500 lines and I don't have any web
space to put it up. Besides, I'm not sure anyone is interested.

A mystery, though. I did not expect dictdef (my magnum opus) to be as slow
as it was, so I investigated. I went to the obvious first: using dict.get
is quite a bit slower than using try: x = dict[key]; except KeyError: x =
default. This is rather inexplicable....

It was still noticeably slower than dictcond. So, I made fastdict, which
emulated dictcond more closely by not allowing the default handler to modify
the sequence passed to it:

def defaulthandler(seq):
try:
it = iter(seq)
except TypeError:
return False, seq
else:
return True, it

def flatten_fastdictdef(iterable, get_iterbility=None):
#Note defaulthandler is no longer an argument.
if get_iterbility is None:
get_iterbility = {''.__class__:False, u''.__class__:False}

# In dictdef, the following try-except is:

# iterbility = get_iterbility.get(iterable.__class__, defaulthandler)

try:
iterbility = get_iterbility[iterable.__class__]
except KeyError:
#Following added to avoid a function call
#Was:

# iterbility = defaulthandler
#
# if iterbility is defaulthandler:
# iterbility, iterable = defaulthandler(iterable)
# get_iterbility[iterable.__class__] = iterbility

#Now:
t = iterable.__class__
try:
iterable = iter(iterable)
except TypeError:
iterbility = get_iterbility[t] = False
else:
iterbility = get_iterbility[t] = True

if callable(iterbility):
iterbility, iterable = iterbility(iterable)

if not iterbility:
yield iterable
else:
for elem in iterable:
for subelem in flatten_fastdictdef(elem, get_iterbility):
yield subelem


This gave the results you see above: even faster than dictcond. The thing
is, I don't have any idea why. The function call overhead doesn't seem to
be enough to explain the difference between the try version of dictdef and
fastdictdef. Nor the name rebinding (which is very fast). Anyone have any
ideas?

Second, is the speed gain of fastdictdef over trydictdef worth the loss of
specifying a defaulthandler that can dictate what goes into the cache
dictionary and modify sequences? (I know, "it depends", but in the general
case, is that a feature anyone would ever *need*? I can't see how.)

--
Francis Avila

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
flatten a list of list Terry Python 14 08-17-2009 05:30 PM
flatten class and flatten struct wenmang@yahoo.com C++ 2 09-27-2006 08:52 PM
flatten a level one list Robin Becker Python 32 01-13-2006 06:40 PM
Has apparent 2.4b1 bug been fixed? flatten in Lib\compiler\ast.py overloads 'list' name Bengt Richter Python 3 01-19-2005 05:17 PM
List Flatten bearophile Python 10 10-19-2004 09:57 AM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57