Split iterator into multiple streams

Steven D'Aprano

 11-06-2010
Suppose I have an iterator that yields tuples of N items (a, b, ... n).

I want to split this into N independent iterators:

iter1 -> a, a2, a3, ...
iter2 -> b, b2, b3, ...
....
iterN -> n, n2, n3, ...

The iterator may be infinite, or at least too big to collect in a list.

def split(iterable, n):
    iterators = []
    for i, iterator in enumerate(itertools.tee(iterable, n)):
        iterators.append((t[i] for t in iterator))
    return tuple(iterators)

But it doesn't work, as all the iterators see the same values:

>>> data = [(1,2,3), (4,5,6), (7,8,9)]
>>> a, b, c = split(data, 3)
>>> list(a), list(b), list(c)

([3, 6, 9], [3, 6, 9], [3, 6, 9])

I tried changing the t[i] to use operator.itemgetter instead, but no
luck. Finally I got this:

def split(iterable, n):
    iterators = []
    for i, iterator in enumerate(itertools.tee(iterable, n)):
        f = lambda it, i=i: (t[i] for t in it)
        iterators.append(f(iterator))
    return tuple(iterators)

which seems to work:

>>> data = [(1,2,3), (4,5,6), (7,8,9)]
>>> a, b, c = split(data, 3)
>>> list(a), list(b), list(c)

([1, 4, 7], [2, 5, 8], [3, 6, 9])

Is this the right approach, or have I missed something obvious?

--
Steven

Ian

 11-06-2010
On Nov 6, 2:52 am, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
>
> def split(iterable, n):
>     iterators = []
>     for i, iterator in enumerate(itertools.tee(iterable, n)):
>         iterators.append((t[i] for t in iterator))
>     return tuple(iterators)
>
> But it doesn't work, as all the iterators see the same values:

Because the value of i is not evaluated until the generator is
actually run; so all the generators end up seeing only the final value
of i rather than the intended values. This is a common problem with
generator expressions that are not immediately run.
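
The late-binding behaviour described here can be reproduced in a few lines. This is only a sketch, written for Python 3; the names split_late_binding, split_bound_early and column are made up for illustration and do not appear in the thread.

```python
import itertools

def split_late_binding(iterable, n):
    # Each generator expression looks up `i` only when it is consumed.
    # By then the loop has finished, so every stream sees i == n - 1.
    iterators = []
    for i, iterator in enumerate(itertools.tee(iterable, n)):
        iterators.append((t[i] for t in iterator))
    return tuple(iterators)

def split_bound_early(iterable, n):
    # Passing `i` as a function argument gives each generator its own copy.
    def column(it, i):
        return (t[i] for t in it)
    return tuple(column(it, i)
                 for i, it in enumerate(itertools.tee(iterable, n)))

data = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
late = [list(it) for it in split_late_binding(data, 3)]
early = [list(it) for it in split_bound_early(data, 3)]
print(late)   # [[3, 6, 9], [3, 6, 9], [3, 6, 9]]
print(early)  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```

The helper function plays the same role as the `i=i` default argument in the lambda version: both capture the current value of i at the moment the generator is created.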

> I tried changing the t[i] to use operator.itemgetter instead, but no
> luck. Finally I got this:
>
> def split(iterable, n):
>     iterators = []
>     for i, iterator in enumerate(itertools.tee(iterable, n)):
>         f = lambda it, i=i: (t[i] for t in it)
>         iterators.append(f(iterator))
>     return tuple(iterators)
>
> which seems to work:
>
> >>> data = [(1,2,3), (4,5,6), (7,8,9)]
> >>> a, b, c = split(data, 3)
> >>> list(a), list(b), list(c)
> ([1, 4, 7], [2, 5, 8], [3, 6, 9])
>
> Is this the right approach, or have I missed something obvious?

That avoids the generator problem, but in this case you could get the
same result a bit more straightforwardly by just using imap instead:

def split(iterable, n):
    iterators = []
    for i, iterator in enumerate(itertools.tee(iterable, n)):
        iterators.append(itertools.imap(operator.itemgetter(i),
                                        iterator))
    return tuple(iterators)

>>> map(list, split(data, 3))

[[1, 4, 7], [2, 5, 8], [3, 6, 9]]

Cheers,
Ian

Peter Otten

 11-06-2010
Steven D'Aprano wrote:

> Suppose I have an iterator that yields tuples of N items (a, b, ... n).
>
> I want to split this into N independent iterators:
>
> iter1 -> a, a2, a3, ...
> iter2 -> b, b2, b3, ...
> ...
> iterN -> n, n2, n3, ...
>
> The iterator may be infinite, or at least too big to collect in a list.
>
>
>
> def split(iterable, n):
>     iterators = []
>     for i, iterator in enumerate(itertools.tee(iterable, n)):
>         iterators.append((t[i] for t in iterator))
>     return tuple(iterators)
>
> But it doesn't work, as all the iterators see the same values:
>
> >>> data = [(1,2,3), (4,5,6), (7,8,9)]
> >>> a, b, c = split(data, 3)
> >>> list(a), list(b), list(c)
> ([3, 6, 9], [3, 6, 9], [3, 6, 9])
>
> I tried changing the t[i] to use operator.itemgetter instead, but no
> luck. Finally I got this:
>
> def split(iterable, n):
>     iterators = []
>     for i, iterator in enumerate(itertools.tee(iterable, n)):
>         f = lambda it, i=i: (t[i] for t in it)
>         iterators.append(f(iterator))
>     return tuple(iterators)
>
> which seems to work:
>
> >>> data = [(1,2,3), (4,5,6), (7,8,9)]
> >>> a, b, c = split(data, 3)
> >>> list(a), list(b), list(c)
> ([1, 4, 7], [2, 5, 8], [3, 6, 9])
>
> Is this the right approach, or have I missed something obvious?

Here's how to do it with operator.itemgetter():

>>> from itertools import *
>>> from operator import itemgetter
>>> data = [(1,2,3), (4,5,6), (7,8,9)]
>>> abc = [imap(itemgetter(i), t) for i, t in enumerate(tee(data, 3))]
>>> map(list, abc)

[[1, 4, 7], [2, 5, 8], [3, 6, 9]]

I'd say the improvement is marginal. If you want to get fancy, you can
calculate n from the first item:

>>> def split(items, n=None):
...     if n is None:
...         items = iter(items)
...         first = next(items)
...         n = len(first)
...         items = chain((first,), items)
...     return [imap(itemgetter(i), t) for i, t in enumerate(tee(items, n))]
...
>>> map(list, split([(1,2,3), (4,5,6), (7,8,9)]))
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]
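
For readers on Python 3, where imap no longer exists and the built-in map is lazy, the same peek-and-chain idea might look like the sketch below. The empty-input branch and the sample stream are assumptions added for illustration; they are not part of the post above. The stream is infinite, so only a slice of each column is read.

```python
from itertools import chain, count, islice, tee
from operator import itemgetter

def split(items, n=None):
    if n is None:
        items = iter(items)
        try:
            first = next(items)      # peek at the first tuple to learn its width
        except StopIteration:
            return []                # assumption: empty input yields no columns
        n = len(first)
        items = chain((first,), items)  # put the peeked tuple back in front
    return [map(itemgetter(i), t) for i, t in enumerate(tee(items, n))]

# Hypothetical infinite source of 3-tuples.
stream = ((k, k * k, -k) for k in count(1))
a, b, c = split(stream)
head = [list(islice(it, 3)) for it in (a, b, c)]
print(head)  # [[1, 2, 3], [1, 4, 9], [-1, -2, -3]]
```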

Peter

Raymond Hettinger

 11-06-2010
On Nov 6, 1:52 am, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
> I tried changing the t[i] to use operator.itemgetter instead, but no
> luck. Finally I got this:
>
> def split(iterable, n):
>     iterators = []
>     for i, iterator in enumerate(itertools.tee(iterable, n)):
>         f = lambda it, i=i: (t[i] for t in it)
>         iterators.append(f(iterator))
>     return tuple(iterators)
>
> which seems to work:
>
> >>> data = [(1,2,3), (4,5,6), (7,8,9)]
> >>> a, b, c = split(data, 3)
> >>> list(a), list(b), list(c)
> ([1, 4, 7], [2, 5, 8], [3, 6, 9])
>
> Is this the right approach, or have I missed something obvious?

That looks about right to me.
It can be compacted a bit:

def split(iterable, n):
    return tuple(imap(itemgetter(i), it) for i, it in
                 enumerate(tee(iterable, n)))

Internally, the tee's iterators are going to accumulate a ton of data
unless they are consumed roughly in parallel. Of course, if they are
consumed *exactly* in lockstep, then you don't need to split them into
separate iterables -- just use the tuples as they come.
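
The buffering can be observed indirectly by counting how many tuples have been drawn from the source. A small Python 3 sketch; the source generator and the pulled list are hypothetical helpers introduced only for this illustration:

```python
import itertools
from operator import itemgetter

pulled = []  # every tuple drawn from the underlying source is recorded here

def source():
    for k in range(5):
        row = (k, k * 10)
        pulled.append(row)
        yield row

a_rows, b_rows = itertools.tee(source(), 2)
a = map(itemgetter(0), a_rows)
b = map(itemgetter(1), b_rows)

first_a = list(a)              # drain the first stream completely
pulled_after_a = len(pulled)   # the whole source has already been read
first_b = list(b)              # served from what tee kept for the second stream
pulled_after_b = len(pulled)   # nothing new was read from the source

print(first_a, pulled_after_a)  # [0, 1, 2, 3, 4] 5
print(first_b, pulled_after_b)  # [0, 10, 20, 30, 40] 5
```

Draining one stream first forces tee to hold every tuple until the other stream catches up, which is why memory use grows when the streams are consumed far out of step.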

Raymond

Paul Rubin

 11-06-2010
Steven D'Aprano writes:
> def split(iterable, n):
>     iterators = []
>     for i, iterator in enumerate(itertools.tee(iterable, n)):
>         f = lambda it, i=i: (t[i] for t in it)
>         iterators.append(f(iterator))
>     return tuple(iterators)
>
> Is this the right approach, or have I missed something obvious?

I think there is no way around using tee. But the for loop looks ugly.
This looks more direct to me, if I didn't mess something up:

def split(iterable, n):
    return tuple(imap(itemgetter(i), t) for i, t in enumerate(tee(iterable, n)))

Arnaud Delobelle

 11-06-2010
Steven D'Aprano writes:

> Suppose I have an iterator that yields tuples of N items (a, b, ... n).
>
> I want to split this into N independent iterators:
>
> iter1 -> a, a2, a3, ...
> iter2 -> b, b2, b3, ...
> ...
> iterN -> n, n2, n3, ...
>
> The iterator may be infinite, or at least too big to collect in a list.
>
>
>
> def split(iterable, n):
>     iterators = []
>     for i, iterator in enumerate(itertools.tee(iterable, n)):
>         iterators.append((t[i] for t in iterator))
>     return tuple(iterators)
>
> But it doesn't work, as all the iterators see the same values:
>
> >>> data = [(1,2,3), (4,5,6), (7,8,9)]
> >>> a, b, c = split(data, 3)
> >>> list(a), list(b), list(c)
> ([3, 6, 9], [3, 6, 9], [3, 6, 9])
>
> I tried changing the t[i] to use operator.itemgetter instead, but no
> luck. Finally I got this:
>
> def split(iterable, n):
>     iterators = []
>     for i, iterator in enumerate(itertools.tee(iterable, n)):
>         f = lambda it, i=i: (t[i] for t in it)
>         iterators.append(f(iterator))
>     return tuple(iterators)
>
> which seems to work:
>
> >>> data = [(1,2,3), (4,5,6), (7,8,9)]
> >>> a, b, c = split(data, 3)
> >>> list(a), list(b), list(c)
> ([1, 4, 7], [2, 5, 8], [3, 6, 9])
>
> Is this the right approach, or have I missed something obvious?

It is quite straightforward to implement your "split" function without
itertools.tee:

from collections import deque

def split(iterable):
    it = iter(iterable)
    q = [deque([x]) for x in it.next()]
    def proj(qi):
        while True:
            if not qi:
                for qj, xj in zip(q, it.next()):
                    qj.append(xj)
            yield qi.popleft()
    for qi in q:
        yield proj(qi)

>>> data = [(1,2,3), (4,5,6), (7,8,9)]
>>> a, b, c = split(data)
>>> print list(a), list(b), list(c)

[1, 4, 7] [2, 5, 8] [3, 6, 9]

Interestingly, given "split" it is very easy to implement "tee":

def tee(iterable, n=2):
    return split(([x]*n for x in iterable))

>>> a, b = tee(range(10), 2)
>>> a.next(), a.next(), b.next()

(0, 1, 0)
>>> a.next(), a.next(), b.next()

(2, 3, 1)
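
The split and tee above are written for Python 2: they call it.next(), and they rely on StopIteration escaping from inside proj to end each stream. On Python 3.7 and later, a StopIteration raised inside a generator body is turned into a RuntimeError, so a port has to catch it explicitly. A minimal sketch under that assumption follows. The names split_py3 and tee_via_split are made up for illustration, and unlike the original this version returns a tuple eagerly instead of being a generator itself.

```python
from collections import deque

def split_py3(iterable):
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        return ()  # assumption: an empty input produces no streams
    queues = [deque([x]) for x in first]  # one pending-items queue per column

    def proj(qi):
        while True:
            if not qi:
                # This column has run dry: pull one more row from the source
                # and distribute its items to every column's queue.
                try:
                    row = next(it)
                except StopIteration:
                    return  # source exhausted, end this stream cleanly
                for qj, xj in zip(queues, row):
                    qj.append(xj)
            yield qi.popleft()

    return tuple(proj(qi) for qi in queues)

def tee_via_split(iterable, n=2):
    # Duplicate each item n times, then split the copies into n streams.
    return split_py3([x] * n for x in iterable)

data = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
a, b, c = split_py3(data)
print(list(a), list(b), list(c))  # [1, 4, 7] [2, 5, 8] [3, 6, 9]

x, y = tee_via_split(range(10), 2)
print(next(x), next(x), next(y))  # 0 1 0
print(next(x), next(x), next(y))  # 2 3 1
```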

In fact, split(x) is the same as zip(*x) when x is finite. The
difference is that with split(x), x is allowed to be infinite and with
zip(*x), each term of x is allowed to be infinite. It may be good to
have a function unifying the two.
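
The equivalence for finite input is easy to check. The sketch below uses Python 3 and the tee-based split from earlier in the thread, with map standing in for imap; that substitution is an assumption about how the code would be ported.

```python
from itertools import tee
from operator import itemgetter

def split(iterable, n):
    # tee-based split, ported to Python 3 (lazy built-in map replaces imap)
    return tuple(map(itemgetter(i), it)
                 for i, it in enumerate(tee(iterable, n)))

data = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
columns_via_split = [tuple(col) for col in split(data, 3)]
columns_via_zip = list(zip(*data))

print(columns_via_split)  # [(1, 4, 7), (2, 5, 8), (3, 6, 9)]
print(columns_via_zip)    # [(1, 4, 7), (2, 5, 8), (3, 6, 9)]
```

Note that zip(*data) has to unpack all of data before it can start, so it cannot be used when data is infinite; split reads the rows lazily.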

--
Arnaud