Split iterator into multiple streams

 
 
Steven D'Aprano
 
      11-06-2010
Suppose I have an iterator that yields tuples of N items (a, b, ... n).

I want to split this into N independent iterators:

iter1 -> a, a2, a3, ...
iter2 -> b, b2, b3, ...
....
iterN -> n, n2, n3, ...

The iterator may be infinite, or at least too big to collect in a list.

My first attempt was this:


import itertools

def split(iterable, n):
    iterators = []
    for i, iterator in enumerate(itertools.tee(iterable, n)):
        iterators.append((t[i] for t in iterator))
    return tuple(iterators)

But it doesn't work, as all the iterators see the same values:

>>> data = [(1,2,3), (4,5,6), (7,8,9)]
>>> a, b, c = split(data, 3)
>>> list(a), list(b), list(c)
([3, 6, 9], [3, 6, 9], [3, 6, 9])


I tried changing the t[i] to use operator.itemgetter instead, but no
luck. Finally I got this:

def split(iterable, n):
    iterators = []
    for i, iterator in enumerate(itertools.tee(iterable, n)):
        f = lambda it, i=i: (t[i] for t in it)
        iterators.append(f(iterator))
    return tuple(iterators)

which seems to work:

>>> data = [(1,2,3), (4,5,6), (7,8,9)]
>>> a, b, c = split(data, 3)
>>> list(a), list(b), list(c)
([1, 4, 7], [2, 5, 8], [3, 6, 9])




Is this the right approach, or have I missed something obvious?



--
Steven

 
 
 
 
 
Ian
 
      11-06-2010
On Nov 6, 2:52 am, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
> My first attempt was this:
>
> def split(iterable, n):
>     iterators = []
>     for i, iterator in enumerate(itertools.tee(iterable, n)):
>         iterators.append((t[i] for t in iterator))
>     return tuple(iterators)
>
> But it doesn't work, as all the iterators see the same values:


That happens because the value of i is not looked up until the generator
is actually run, so all the generators end up seeing only the final value
of i rather than the intended values. This is a common problem with
generator expressions that are not consumed immediately.
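
The late binding of i can be reproduced without tee at all. What follows is a minimal sketch in Python 3 (the thread itself uses Python 2); the names `rows`, `late` and `bound` are made up for the illustration and do not come from the original posts. It contrasts a generator expression that looks up i when consumed with one that receives i through a default argument, as the lambda-based split() does.

```python
rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# Late binding: each generator looks up i only when it is consumed,
# by which time the loop has finished and i is 2.
late = []
for i in range(3):
    late.append(t[i] for t in rows)
late_result = [list(g) for g in late]

# Early binding: a default argument captures the current value of i,
# which is what the lambda in the second version of split() relies on.
bound = []
for i in range(3):
    bound.append((lambda it, i=i: (t[i] for t in it))(rows))
bound_result = [list(g) for g in bound]

print(late_result)   # [[3, 6, 9], [3, 6, 9], [3, 6, 9]]
print(bound_result)  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```

Only the outermost iterable of a generator expression is evaluated when the expression is created; every other name in it is resolved when the generator runs.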

> I tried changing the t[i] to use operator.itemgetter instead, but no
> luck. Finally I got this:
>
> def split(iterable, n):
>     iterators = []
>     for i, iterator in enumerate(itertools.tee(iterable, n)):
>         f = lambda it, i=i: (t[i] for t in it)
>         iterators.append(f(iterator))
>     return tuple(iterators)
>
> which seems to work:
>
> >>> data = [(1,2,3), (4,5,6), (7,8,9)]
> >>> a, b, c = split(data, 3)
> >>> list(a), list(b), list(c)

>
> ([1, 4, 7], [2, 5, 8], [3, 6, 9])
>
> Is this the right approach, or have I missed something obvious?


That avoids the generator problem, but in this case you could get the
same result a bit more straightforwardly by just using imap instead:

import itertools
import operator

def split(iterable, n):
    iterators = []
    for i, iterator in enumerate(itertools.tee(iterable, n)):
        # itemgetter(i) is built right away, so i is bound immediately
        iterators.append(itertools.imap(operator.itemgetter(i),
                                        iterator))
    return tuple(iterators)

>>> map(list, split(data, 3))
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]

Cheers,
Ian
 
 
 
 
 
Peter Otten
 
      11-06-2010
Steven D'Aprano wrote:

> Is this the right approach, or have I missed something obvious?


Here's how to do it with operator.itemgetter():

>>> from itertools import *
>>> from operator import itemgetter
>>> data = [(1,2,3), (4,5,6), (7,8,9)]
>>> abc = [imap(itemgetter(i), t) for i, t in enumerate(tee(data, 3))]
>>> map(list, abc)
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]

I'd say the improvement is marginal. If you want to go fancy you can
calculate n:

>>> def split(items, n=None):
...     if n is None:
...         items = iter(items)
...         first = next(items)
...         n = len(first)
...         items = chain((first,), items)
...     return [imap(itemgetter(i), t) for i, t in enumerate(tee(items, n))]
...
>>> map(list, split([(1,2,3), (4,5,6), (7,8,9)]))
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]
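
For readers on Python 3, where itertools.imap no longer exists and the built-in map() is already lazy, the same idea might look roughly like the sketch below. It is an adaptation and not code from the thread. Returning an empty list for an empty input is an assumption made here, since the thread does not cover that case.

```python
from itertools import chain, tee
from operator import itemgetter

def split(items, n=None):
    """Split an iterable of equal-length tuples into n lazy column iterators."""
    if n is None:
        items = iter(items)
        try:
            first = next(items)
        except StopIteration:
            return []  # assumption: an empty input yields no streams
        n = len(first)
        items = chain((first,), items)  # put the peeked row back in front
    # map() is lazy in Python 3, so it plays the role of Python 2's imap().
    return [map(itemgetter(i), t) for i, t in enumerate(tee(items, n))]

columns = [list(c) for c in split([(1, 2, 3), (4, 5, 6), (7, 8, 9)])]
print(columns)  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```

Peeking at the first row and chaining it back keeps the input lazy, so n can be inferred without collecting the whole iterable.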

Peter
 
 
Raymond Hettinger
 
      11-06-2010
On Nov 6, 1:52 am, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
> I tried changing the t[i] to use operator.itemgetter instead, but no
> luck. Finally I got this:
>
> def split(iterable, n):
>     iterators = []
>     for i, iterator in enumerate(itertools.tee(iterable, n)):
>         f = lambda it, i=i: (t[i] for t in it)
>         iterators.append(f(iterator))
>     return tuple(iterators)
>
> which seems to work:
>
> >>> data = [(1,2,3), (4,5,6), (7,8,9)]
> >>> a, b, c = split(data, 3)
> >>> list(a), list(b), list(c)

>
> ([1, 4, 7], [2, 5, 8], [3, 6, 9])
>
> Is this the right approach, or have I missed something obvious?



That looks about right to me.
It can be compacted a bit:

from itertools import imap, tee
from operator import itemgetter

def split(iterable, n):
    return tuple(imap(itemgetter(i), it)
                 for i, it in enumerate(tee(iterable, n)))

Internally, tee has to buffer every tuple that the fastest iterator has
consumed but the slowest has not, so the iterators are going to accumulate
a ton of data unless they are consumed roughly in parallel. Of course, if
they are consumed *exactly* in lockstep, then you don't need to split them
into separate iterables -- just use the tuples as they come.
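
The buffering can be made visible with a small experiment. This is a sketch in Python 3, not code from the thread; `source` and `pulled` are hypothetical helpers that record how many rows have been read from the underlying iterator.

```python
from itertools import tee
from operator import itemgetter

pulled = []  # records every row taken from the underlying source

def source():
    for k in range(5):
        pulled.append(k)
        yield (k, k + 100)

a, b = (map(itemgetter(i), t) for i, t in enumerate(tee(source(), 2)))

first_column = list(a)         # drains the source completely
rows_read = len(pulled)        # every row was read while consuming a alone
b_first = next(b)              # served from tee's buffer, no new read
rows_read_after = len(pulled)

print(first_column, rows_read, b_first, rows_read_after)
# [0, 1, 2, 3, 4] 5 100 5
```

Draining one stream before touching the other forces tee to hold all five rows for the second stream. With an infinite source that pattern never finishes, and with a very large one the buffer grows as large as the input.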


Raymond

 
 
Paul Rubin
 
      11-06-2010
Steven D'Aprano writes:
> def split(iterable, n):
>     iterators = []
>     for i, iterator in enumerate(itertools.tee(iterable, n)):
>         f = lambda it, i=i: (t[i] for t in it)
>         iterators.append(f(iterator))
>     return tuple(iterators)
>
> Is this the right approach, or have I missed something obvious?


I think there is no way around using tee. But the for loop looks ugly.
This looks more direct to me, if I didn't mess something up:

def split(iterable, n):
    return tuple(imap(itemgetter(i),t) for i,t in enumerate(tee(iterable,n)))
 
 
Arnaud Delobelle
 
      11-06-2010
Steven D'Aprano writes:

> Is this the right approach, or have I missed something obvious?


It is quite straightforward to implement your "split" function without
itertools.tee:

from collections import deque

def split(iterable):
    it = iter(iterable)
    q = [deque([x]) for x in it.next()]
    def proj(qi):
        while True:
            if not qi:
                for qj, xj in zip(q, it.next()):
                    qj.append(xj)
            yield qi.popleft()
    for qi in q:
        yield proj(qi)

>>> data = [(1,2,3), (4,5,6), (7,8,9)]
>>> a, b, c = split(data)
>>> print list(a), list(b), list(c)
[1, 4, 7] [2, 5, 8] [3, 6, 9]

Interestingly, given "split" it is very easy to implement "tee":

def tee(iterable, n=2):
    return split(([x]*n for x in iterable))

>>> a, b = tee(range(10), 2)
>>> a.next(), a.next(), b.next()
(0, 1, 0)
>>> a.next(), a.next(), b.next()
(2, 3, 1)

In fact, split(x) is the same as zip(*x) when x is finite. The
difference is that with split(x), x is allowed to be infinite and with
zip(*x), each term of x is allowed to be infinite. It may be good to
have a function unifying the two.
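
The relationship to zip(*x) can be checked directly. The sketch below is a Python 3 port of the deque-based split, written for this illustration and not taken from the thread. It differs from the original in two ways that are assumptions of the port: it returns a list instead of being a generator itself, and it catches StopIteration explicitly, because since Python 3.7 (PEP 479) a StopIteration escaping a generator body is turned into a RuntimeError.

```python
from collections import deque
from itertools import count

def split(iterable):
    """Python 3 port of the deque-based split (a sketch, not the original)."""
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        return []  # assumption: an empty input yields no streams
    queues = [deque([x]) for x in first]

    def proj(q):
        while True:
            if not q:
                try:
                    row = next(it)
                except StopIteration:
                    return  # source exhausted: end this stream cleanly
                # Distribute the new row across all per-column queues.
                for qj, xj in zip(queues, row):
                    qj.append(xj)
            yield q.popleft()

    return [proj(q) for q in queues]

data = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
finite = [tuple(s) for s in split(data)]
print(finite == list(zip(*data)))  # True

# Unlike zip(*x), split() also works when the outer iterable is infinite.
evens, odds = split((2 * k, 2 * k + 1) for k in count())
print(next(evens), next(evens), next(odds))  # 0 2 1
```

As with tee, each queue holds whatever its stream has not yet consumed, so the memory behaviour is the same when the streams are read at different rates.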

--
Arnaud
 