# flatten() time trial mystery. (or, 101 ways to flatten a nested list using generators)

Francis Avila
11-02-2003
A few days ago (see the 'itertools.flatten()?' thread from October 2), I
became obsessed with refactoring a recursive generator that yielded the
leaves of nested iterables. When the dust settled, I had many flatten
functions on my hands.

So I had to time them. Results below.

History of the functions (from flattrial.py):

# There are three basic features:
# 1. Can specify a function that determines iterability.
# 2. Can specify class-iterability.
# 3. Can modify sequence before iterating.

##Function: Features Supported : Author
##
##flatten_fa: 1 : Francis Avila
##flatten_po: 1,2,3 : Peter Otten
##flatten_po2: None : Alex Martelli channeling Peter Otten
##flatten_am: None : Alex Martelli
##flatten_dict: 2 : Peter Otten
##flatten_fastcond: 1,2 : Francis Avila
##flatten_itercond: 1,2,3 : Francis Avila
##flatten_dictcond: 1,2,3 : Francis Avila
##flatten_dictdef: 1,2,3 : Francis Avila
##flatten_trydictdef: 1,2,3 : Francis Avila
##flatten_fastdictdef: 1,2,3 : Francis Avila
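For reference, the basic shape all of these variants share is a recursive generator that yields leaves and descends into iterables. Here is a minimal sketch with feature 1 (a caller-supplied iterability predicate); the names are mine, not from flattrial.py, and it is written in modern Python (the 2.2 originals special-cased ''.__class__ and u''.__class__ instead of str):

```python
def flatten(iterable, is_iterable=None):
    """Yield the leaves of nested iterables, depth-first.

    is_iterable: optional predicate deciding whether an element
    should be descended into (feature 1 above).
    """
    if is_iterable is None:
        # Default policy: anything iter() accepts is iterable,
        # except strings, which would recurse forever.
        def is_iterable(obj):
            if isinstance(obj, str):
                return False
            try:
                iter(obj)
            except TypeError:
                return False
            return True
    for elem in iterable:
        if is_iterable(elem):
            for leaf in flatten(elem, is_iterable):
                yield leaf
        else:
            yield elem
```

The per-element is_iterable call is exactly the overhead the dict-based variants below try to amortize by caching the verdict per class.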

Tree test flattens tree:
subtree = ['foo']*18 + [1,2,3]*6
tree = [ subtree*10, [ subtree * 8 ] ]

Node test flattens nodetree:
class Node(object):
    def __init__(self, label=None, data=None, children=()):
        self.children = children
        self.label = label
        self.data = data
    def __iter__(self):
        return iter(self.children)

leaves = [Node(chr(i+65),i) for i in range(10)]
branches = [Node(chr(i+65), i, leaves) for i in range(10,30)]
nodetree = [Node(chr(i+65), i, branches) for i in range(30,50)]
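Note why features 2 and 3 matter here: a childless Node is still iterable (iter(()) succeeds), so a naive flatten would descend into every leaf and yield nothing at all. A per-class policy is needed to decide what a "leaf" Node contributes. Here is an illustrative sketch of such a policy (my own choice of yielding a childless Node's data; this particular handler is not from flattrial.py):

```python
class Node(object):
    def __init__(self, label=None, data=None, children=()):
        self.children = children
        self.label = label
        self.data = data
    def __iter__(self):
        return iter(self.children)

def flatten_nodes(obj):
    # Treat a childless Node as a leaf and yield its data;
    # otherwise descend into children (an illustrative policy).
    if isinstance(obj, Node) and not obj.children:
        yield obj.data
    else:
        for child in obj:
            for leaf in flatten_nodes(child):
                yield leaf

leaves = [Node(chr(i + 65), i) for i in range(10)]
branches = [Node(chr(i + 65), i, leaves) for i in range(10, 30)]
nodetree = [Node(chr(i + 65), i, branches) for i in range(30, 50)]

flat = list(flatten_nodes(nodetree))
# 20 top-level nodes * 20 branches * 10 shared leaves = 4000 leaves
assert len(flat) == 20 * 20 * 10
```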

Results (Python 2.2):

C:\Docs>python flattrial.py
Tree Tests (100 reps):
02.113544 fa
03.129147 po
02.845054 po2
02.587387 am
00.643718 dict
00.648371 fastcond
00.724689 itercond
00.791277 dictcond
01.006224 dictdef
00.833452 trydictdef
00.776937 fastdictdef
Node Tests (10 reps):
02.877818 po
02.633231 itercond
00.878554 dictcond
01.040838 dictdef
00.897504 trydictdef
00.864411 fastdictdef

I'd post flattrial.py, but it's about 500 lines and I don't have any web
space to put it up. Besides, I'm not sure anyone is interested.

A mystery, though. I did not expect dictdef (my magnum opus) to be as slow
as it was, so I investigated. I checked the obvious thing first: using
dict.get is quite a bit slower than using try: x = dict[key]; except
KeyError: x = default. This is rather inexplicable....
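One plausible factor (an assumption on my part, not something I have verified on 2.2): d.get requires an attribute lookup plus a method call, while d[key] on a hit compiles to a single subscript opcode. The gap can be measured directly; relative timings on a modern CPython may of course differ from 2.2's:

```python
import timeit

d = {'key': 1}

def via_get():
    # Attribute lookup + method call on every access.
    return d.get('key', 0)

def via_try():
    # Single subscript on the hit path; exception only on a miss.
    try:
        return d['key']
    except KeyError:
        return 0

# Measure both on the hit path; interpret relative numbers only.
t_get = timeit.timeit(via_get, number=100000)
t_try = timeit.timeit(via_try, number=100000)
print('get: %.4fs  try: %.4fs' % (t_get, t_try))
```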

It was still noticeably slower than dictcond. So, I made fastdictdef, which
emulated dictcond more closely by not allowing the default handler to modify
the sequence passed to it:

def defaulthandler(seq):
    try:
        it = iter(seq)
    except TypeError:
        return False, seq
    else:
        return True, it

def flatten_fastdictdef(iterable, get_iterbility=None):
    #Note defaulthandler is no longer an argument.
    if get_iterbility is None:
        get_iterbility = {''.__class__: False, u''.__class__: False}

    # In dictdef, the following try-except is:
    #     iterbility = get_iterbility.get(iterable.__class__, defaulthandler)
    try:
        iterbility = get_iterbility[iterable.__class__]
    except KeyError:
        # Following added to avoid a function call.
        # Was:
        #     iterbility = defaulthandler
        # followed, outside the try-except, by:
        #     if iterbility is defaulthandler:
        #         iterbility, iterable = defaulthandler(iterable)
        #         get_iterbility[iterable.__class__] = iterbility
        # Now:
        t = iterable.__class__
        try:
            iterable = iter(iterable)
        except TypeError:
            iterbility = get_iterbility[t] = False
        else:
            iterbility = get_iterbility[t] = True

    if callable(iterbility):
        iterbility, iterable = iterbility(iterable)

    if not iterbility:
        yield iterable
    else:
        for elem in iterable:
            for subelem in flatten_fastdictdef(elem, get_iterbility):
                yield subelem

This gave the results you see above: even faster than dictcond. The thing
is, I don't have any idea why. The function call overhead doesn't seem to
be enough to explain the difference between the try version of dictdef and
fastdictdef. Nor the name rebinding (which is very fast). Anyone have any
ideas?
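One candidate explanation, visible in the commented-out "Was" code above (hedged: I am reading the comments, not the full dictdef source): after iterbility, iterable = defaulthandler(iterable) rebinds iterable to the iterator, the line get_iterbility[iterable.__class__] = iterbility caches the verdict under the *iterator's* class, so the original class never gets a cache hit and the handler path runs on every element. Saving t = iterable.__class__ before rebinding, as fastdictdef does, fixes that. A reduced sketch of the pitfall (hypothetical helper names, keeping only the caching logic):

```python
def classify_rebinding(obj, cache, calls):
    # Mirrors the commented-out dictdef logic: iterable is rebound
    # before the cache write, so we key on the iterator's class.
    try:
        return cache[obj.__class__], obj
    except KeyError:
        calls[0] += 1              # count handler-path entries
        it = iter(obj)
        cache[it.__class__] = True  # pitfall: wrong cache key
        return True, it

def classify_saved(obj, cache, calls):
    # Mirrors fastdictdef: the class is saved before rebinding.
    t = obj.__class__
    try:
        return cache[t], obj
    except KeyError:
        calls[0] += 1
        it = iter(obj)
        cache[t] = True             # correct cache key
        return True, it

cache1, calls1 = {}, [0]
for _ in range(5):
    classify_rebinding([1, 2], cache1, calls1)

cache2, calls2 = {}, [0]
for _ in range(5):
    classify_saved([1, 2], cache2, calls2)

print(calls1[0], calls2[0])  # the rebinding version misses every time
```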

Second, is the speed gain of fastdictdef over trydictdef worth losing the
ability to specify a defaulthandler that can dictate what goes into the
cache dictionary and can modify sequences? (I know, "it depends", but in the
general case, is that a feature anyone would ever *need*? I can't see how.)

--
Francis Avila