Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > Itertools

Reply
Thread Tools

Itertools

 
 
Ronald Legere
Guest
Posts: n/a
 
      07-29-2003
The new itertools stuff is pretty cool. One thing that bothers me though is
that
there seems to be no way to copy an iterator. How does one work around this?
Is
there a trick that I am missing?

Without being able to 'split' an iterator, I can't
think of how to do some of the cool things you can do with lazy lists in
Haskell. It seems
quite often you want to be able to do this 'splitting', and if I
had more coffee I would post an example The ones I
can think of are also defined recursively, which is another kettle of fish.

But I hope that someone knows
what I am talking about and has thought of a way...
Cheers!



 
Reply With Quote
 
 
 
 
Gonšalo Rodrigues
Guest
Posts: n/a
 
      07-29-2003
On Tue, 29 Jul 2003 08:32:14 -0400, "Ronald Legere"
<(E-Mail Removed)> wrote:

>The new itertools stuff is pretty cool. One thing that bothers me though is
>that
>there seems to be no way to copy an iterator. How does one work around this?
>Is
>there a trick that I am missing?
>
>Without being able to 'split' an iterator, I can't
>think of how to do some of the cool things you can do with lazy lists in
>Haskell. It seems
>quite often you want to be able to do this 'splitting', and if I
>had more coffee I would post an example The ones I
>can think of are also defined recursively, which is another kettle of fish.
>


Iterables can be more far general than lazy lists: think of a socket
as an iterable churning out lines of text, for example. How would one
go about copying these general iterables?

You can take slices of iterables, though - can't remember the name,
islice? But since you don't know how to take copies of iterables in
general this may not be that useful.

> But I hope that someone knows
>what I am talking about and has thought of a way...
>Cheers!
>


With my best regards,
G. Rodrigues
 
Reply With Quote
 
 
 
 
Heiko Wundram
Guest
Posts: n/a
 
      07-29-2003
Untested code that does basically this, but beware of any traps (read
and understand the code before trying to use it!!!), uses Python 2.3
functionality, but could be amended:

--- START OF CODE ---

import copy

class IDup(object):
def __init__(self,iterin):
self.__iter = iterin
self.__iterno = 0
self.__iteritems = []
self.__hasstopped = None

def registerIter():
iterno = self.__iterno
self.__iterno += 1
self.__iteritems.append([])

def getNext(self,iterno):
if self.__iteritems[iterno]:
iteritem = self.__iteritems[iterno].pop(0)
elif self.__hasstopped is not None:
raise self.__hasstopped
else:
try:
iteritem = self.__iter.next()
except StopIteration, e:
self.__hasstopped = e
raise
for id, i in enumerate(self.__iteritems):
if id <> iterno:
i.append(copy.deepcopy(iteritem))
return iteritem

class IDupped(object):
def __init__(self,idup):
self.__idup = idup
self.__iterno = idup.registerIter()

def next(self):
return self.__idup.getNext(self.__iterno)

def isplit(iterin,splitno=2):
idup = IDup(iterin)
iduppeds = []
for i in range(splitno):
iduppeds.append(IDupped(idup))
return tuple(iduppeds)

test = ["hello","how","are","you?"]
x, y = isplit(iter(test))

for i in x:
print i
for i in y:
print i

--- END OF CODE ---

Pitfalls:

1) You may not use the original iterator after it has been split.
2) Your iterator must return values which are copyable. If you wish for
the values to be truly equal (even in ID) or they are not copyable, then
just remove the copy.deepcopy from the code.

HTH!

Heiko.


 
Reply With Quote
 
Heiko Wundram
Guest
Posts: n/a
 
      07-29-2003
Just tested the code, amend as stated (remove the - lines, and add
the + lines):

--- START OF CODE ---

import copy

class IDup(object):
def __init__(self,iterin):
self.__iter = iterin
self.__iterno = 0
self.__iteritems = []
self.__hasstopped = None

- def registerIter():
+ def registerIter(self):
iterno = self.__iterno
self.__iterno += 1
self.__iteritems.append([])
+ return iterno

def getNext(self,iterno):
if self.__iteritems[iterno]:
iteritem = self.__iteritems[iterno].pop(0)
elif self.__hasstopped is not None:
raise self.__hasstopped
else:
try:
iteritem = self.__iter.next()
except StopIteration, e:
self.__hasstopped = e
raise
for id, i in enumerate(self.__iteritems):
if id <> iterno:
i.append(copy.deepcopy(iteritem))
return iteritem

class IDupped(object):
def __init__(self,idup):
self.__idup = idup
self.__iterno = idup.registerIter()

def next(self):
return self.__idup.getNext(self.__iterno)

+ def __iter__(self):
+ return self

def isplit(iterin,splitno=2):
idup = IDup(iterin)
iduppeds = []
for i in range(splitno):
iduppeds.append(IDupped(idup))
return tuple(iduppeds)

test = ["hello","how","are","you?"]
x, y = isplit(iter(test))

for i in x:
print i
for i in y:
print i

--- END OF CODE ---

This works as stated.

Heiko.


 
Reply With Quote
 
Beni Cherniavsky
Guest
Posts: n/a
 
      07-30-2003
Ronald Legere wrote on 2003-07-29:

> The new itertools stuff is pretty cool. One thing that bothers me
> though is that there seems to be no way to copy an iterator. How
> does one work around this? Is there a trick that I am missing?
>
> Without being able to 'split' an iterator, I can't think of how to
> do some of the cool things you can do with lazy lists in Haskell. It
> seems quite often you want to be able to do this 'splitting', and if
> I had more coffee I would post an example The ones I can think of
> are also defined recursively, which is another kettle of fish.
>

Strange, the need seems to come up more freqeuntly since I implemented
it . Or is it just me paying more attention to it? After a couple
of design iterations, I've got:

http://www.technion.ac.il/~cben/python/streams.py

[NEWS: The last changes were addition of __nonzero__ methods, clued by
discussion on c.l.py <wink>, and a `.force(count=None)` method to
force computing values.]

I concluded that it's best to separate the two abstractions, providing
ability to convert both ways:

- An iterable immutable (but lazily computed) linked list, with O(1)
tail access and ability to cons in front of it.

- An iterator object whose `.next()` method is necessarily
destructive, on another hand. Turns out the most flexible
implementation of the iterator is simply a pointer to a linked list;
this way you can split the iterator and even "unyield" values back
onto it. See the module for more details.

In general it's most clean to separate iterables from iterators.
Iterables should be changed by iterating.

Any feedback welcome. I'd like to make it as Pythonic as possible.
An perhaps make it standard. Particular open questions:

- Terminology. There are too many names for the same things .

- The linked lists are called "streams" by me because they are lazy
and that's what such beasts are called in functional languages.
But I still have many refernces to them as linked lists,
especially w.r.t. the `Cons` class which happens to be the base
class of `Stream`. So `Stream` is defined as a lazy linked list
node. Should I try to squash refences to "linked lists", treating
them as "stream nodes which are already computed" to reduce
confusion with Python lists?

- car/cdr vs. first/rest vs. head/tail. I chose head/tail.
car/cdr would be opaque to people without lisp background.
first/rest are of different length .

- `Cons` still requires lisp background but I don't know any name
that would be recognizable to non-lispers anyway. And it's not
a bad name.

- Is `StreamIter` the best name?

- I called the operation for creating new iterator from same place
"forking". Alternative names: `split`, `copy`, `clone`, `dup`,
"lookahead", etc. What's best?

- Consing in front of a StreamIter's stream is called `.unyeild`.
The point is that it's the opposite of the ``yield`` statement
but the name is a brave new invention. Alternatives: `prev`
(antonym of `.next()`, probably misguided since it doesn't
return the previous value but rather takes it from you),
`pushback`, `unread`, `stuff`, etc.

- Functionality:

- Should the operations (`.head`/`.tail`, indexing) on the
underlying stream be proxied by `StreamIter` or should the
programmer have to spell out e.g. ``iterator.stream.head`` as now?

- What to do about __repr__ and __str__? The streams could well be
infinite and forcing lookahead for even a few values is
unacceptable as this could be expensive.

- What to do about __hash__ and __eq__?

- Should I weakly memoize the `Stream` for each given iterable?
That would reduce the risk of wrapping a destructive iterator
twice with distinct streams, so that each produced item is seen by
only one stream, with access-order-dependant interleaving,
violating the functional intent of the laziness. OTOH, this would
make the programmer less aware of the risk. Besides, I'm not sure
what to return on the second attempt to wrap same iterator after
some of it has already been consumed since the first time.

- Would some convenience operations for lookahead, e.g.
`.startswith()` be useful?

- Anything else anybody needs?

--
Beni Cherniavsky <(E-Mail Removed)>

Put a backslash at the evening to continue hacking onto the next day.

 
Reply With Quote
 
Mike Rovner
Guest
Posts: n/a
 
      07-30-2003
Beni Cherniavsky wrote:
> Any feedback welcome. I'd like to make it as Pythonic as possible.


> An perhaps make it standard. Particular open questions:


> - The linked lists are called "streams" by me because they are lazy
> and that's what such beasts are called in functional languages.


Stream is used differently in Tcl and Unix.
Minded separation Python from functional approach (deprecation of filter,
map;
doubtfulness of implementing tail recursion optimization) probably it is not
so
inspiring for regular python programmer.

Lazy "streams" are a good thing provided they can be easily used for
organizing data pathes and connections.

> - `Cons` still requires lisp background but I don't know any name
> that would be recognizable to non-lispers anyway. And it's not
> a bad name.



Construct, Glue, Merge?
Taken readability any whole word is better.

> - I called the operation for creating new iterator from same place
> "forking". Alternative names: `split`, `copy`, `clone`, `dup`,
> "lookahead", etc. What's best?


copy is already using construct in python. Why not use it?

> - Anything else anybody needs?


If I got the intend of Stream correctly, I wish use them as Tcl streams,
so I'd need combining and filtering and i/o.

Just 0.02

Mike




 
Reply With Quote
 
Beni Cherniavsky
Guest
Posts: n/a
 
      07-31-2003
Mike Rovner wrote on 2003-07-30:

> Beni Cherniavsky wrote:
> > Any feedback welcome. I'd like to make it as Pythonic as possible.

>
> > An perhaps make it standard. Particular open questions:

>
> > - The linked lists are called "streams" by me because they are lazy
> > and that's what such beasts are called in functional languages.

>
> Stream is used differently in Tcl and Unix. Minded separation
> Python from functional approach (deprecation of filter, map;
> doubtfulness of implementing tail recursion optimization) probably
> it is not so inspiring for regular python programmer.
>

Point taken. But what do you call it then? Lazy linked lists has
only one single-word name that I'm aware of and that is "streams";
there is no name for it in Tcl and Unix becauit is almost never used
in them, as far as I know.

> Lazy "streams" are a good thing provided they can be easily used for
> organizing data pathes and connections.
>

Python already provides the minimal construct for connecting producers
and consumers of data: the iterator protocol. It is minimalistic and
becomes inconvenient when you want to use the same value more than one
time, for lookahead, backtracking or simply connecting the same source
to many consumers. All this is quite easy with the head/tail access
to streams; it's a very powerfull abstraction. Since streams are
implemented as linked lists, they don't leak memory for infinite
iterators, unless you keep a reference to a fixed position in the
stream. Over streams I've built an iterator which also provides the
same powers but more in line with Python's iterator protocol.

> > - `Cons` still requires lisp background but I don't know any name
> > that would be recognizable to non-lispers anyway. And it's not
> > a bad name.

>
> Construct, Glue, Merge?
> Taken readability any whole word is better.
>

`Construct` is too long. Compare with `str`, `int`, `dict` rather
than `string`, `integer` and `dictionary`.

How would `Glue` and `Merge` be meaningful here? The only
associasions of "glue" are what TeX puts between boxes and "glue
layers" between different languages or libraries. "Merge" sounds like
the process of taking several sequences of values and interlevaing
them in some way (like in merge sort; BTW, it needs a lookahead of 1
for each stream, so it should be convenient to implement with
streams). But this has nothing to do with what `Cons` does: creating
a new stream by prepending given head to given tail. `Cons` is the
only term I know of that unabigously refers to this operation, again
coming from functional languages <wink>.

There is another possibility: rename `Cons` to `Stream` and `Stream`
to `LazyStream`; `Stream` would then be a factory function creating
either depending on whether it got 1 or two arguments. Would this be
better?

> > - I called the operation for creating new iterator from same place
> > "forking". Alternative names: `split`, `copy`, `clone`, `dup`,
> > "lookahead", etc. What's best?

>
> copy is already using construct in python. Why not use it?
>

Copy was adviced against by Erik Max Francis, see:

http://groups.google.com/groups?selm...%40alcyone.com

However, it really copies the iterator, it's just very fast. So I
think I will indeed name this `.copy()` (aliased as `.__copy__`).

> > - Anything else anybody needs?

>
> If I got the intend of Stream correctly, I wish use them as Tcl streams,
> so I'd need combining and filtering and i/o.
>

You can use a `Stream` or `StreamIter` over a file object to get
lookahead and mulitple passes over it, without requiring the file to
be seekable! That's one of the main uses. It also gives you ability
to "unread" any amount of data. Note that if you simply apply it to a
file the stream operates on a line at a time; you will need to wrap
the file with a custom generator if you wish to operate at character
level; working at token level might be the best compromise.

I'm not familiar with Tcl streams, so I don't understand the
"combining and filtering" reference - can you elaborate?

--
Beni Cherniavsky <(E-Mail Removed)>

Put a backslash at the evening to continue hacking onto the next day.

 
Reply With Quote
 
Mike Rovner
Guest
Posts: n/a
 
      08-08-2003
Beni Cherniavsky wrote:
> Mike Rovner wrote on 2003-07-30:


I'm sorry for taking so long to answer (I had a project deadline to meet
- done).

>> Beni Cherniavsky wrote:

> Point taken. But what do you call it then? Lazy linked lists has
> only one single-word name that I'm aware of and that is "streams";
> there is no name for it in Tcl and Unix becauit is almost never used
> in them, as far as I know.


I made an intensive search on internet and convinced now that term 'streams'
is globaly associated with i/o. I still think using this term for universal
data
structure will be misleading. However I can't come with a good word
(my best is LazyList).

>>> - `Cons` still requires lisp background but I don't know any
>>> name that would be recognizable to non-lispers anyway. And
>>> it's not a bad name.

>>
>> Construct, Glue, Merge?
>> Taken readability any whole word is better.
>>

> `Construct` is too long. Compare with `str`, `int`, `dict` rather
> than `string`, `integer` and `dictionary`.


But they all are (long-lived) built-ins. AFAIK now there is a tendency
to make language more readable (isinstance, enumerate to name a few
hmm recent additions)

> How would `Glue` and `Merge` be meaningful here? The only
> associasions of "glue" are what TeX puts between boxes and "glue
> layers" between different languages or libraries. "Merge" sounds like
> the process of taking several sequences of values and interlevaing
> them in some way (like in merge sort; BTW, it needs a lookahead of 1
> for each stream, so it should be convenient to implement with
> streams). But this has nothing to do with what `Cons` does: creating
> a new stream by prepending given head to given tail. `Cons` is the
> only term I know of that unabigously refers to this operation, again
> coming from functional languages <wink>.
>
> There is another possibility: rename `Cons` to `Stream` and `Stream`
> to `LazyStream`; `Stream` would then be a factory function creating
> either depending on whether it got 1 or two arguments. Would this be
> better?


No. In discussion about list.insert() and/or range() difference in behaivor
dependent on args was prononced A Bad Thing (IIRC).

>>> - Anything else anybody needs?

>>
>> If I got the intend of Stream correctly, I wish use them as Tcl
>> streams, so I'd need combining and filtering and i/o.
>>

> You can use a `Stream` or `StreamIter` over a file object to get
> lookahead and mulitple passes over it, without requiring the file to
> be seekable! That's one of the main uses. It also gives you ability
> to "unread" any amount of data. Note that if you simply apply it to a
> file the stream operates on a line at a time; you will need to wrap
> the file with a custom generator if you wish to operate at character
> level; working at token level might be the best compromise.
>
> I'm not familiar with Tcl streams, so I don't understand the
> "combining and filtering" reference - can you elaborate?


I didn't touch tcl for several years
But the idea is pretty simple: to pass file object throw 'filter' to get
'another' file(-like) object.

import compress as Z
for line in tarfile.open('-','r',Z.open(filename)).extractiter()

instead of

stream = tarfile.open('-','r',Z.open(filename))
for elem in tarfile.open('-','r',Z.open(filename)).getmembers():
for line in stream.extract(elem):
...

or something like that. I admit that later example based on new 2.3 syntax
fulfill my (long being) expectations quite well.

Thanks,
Mike




 
Reply With Quote
 
Ng Pheng Siong
Guest
Posts: n/a
 
      08-08-2003
According to Mike Rovner <(E-Mail Removed)>:
> I made an intensive search on internet and convinced now that term 'streams'
> is globaly associated with i/o. I still think using this term for universal
> data
> structure will be misleading. However I can't come with a good word
> (my best is LazyList).


Once upon a time, in the days of 386BSD, certain people claimed to be
reinnovating Sys V streams for 386BSD, which were to be called "currents".

Somehow this episode stuck in my mind. (I think it was the sound of a
balloon bursting from too much hot air.


--
Ng Pheng Siong <(E-Mail Removed)>

http://firewall.rulemaker.net -+- Manage Your Firewall Rulebase Changes
http://www.post1.com/home/ngps -+- Open Source Python Crypto & SSL
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Good use for itertools.dropwhile and itertools.takewhile Nick Mellor Python 35 12-06-2012 09:29 PM
Fate of itertools.dropwhile() and itertools.takewhile() Raymond Hettinger Python 17 02-18-2008 12:39 PM
Anything like python's itertools.chain()? workusenet1@brianhv.org C++ 0 06-09-2005 03:08 PM
wishlist item: itertools.partition (WAS: Wishlist item: itertools.flatten) Steven Bethard Python 0 03-12-2005 07:50 PM
itertools.take? Dan Williams Python 1 07-15-2003 03:05 AM



Advertisments