Velocity Reviews > need help on generator...

# need help on generator...

Francis Girard
Guest
Posts: n/a

 01-26-2005
Le mardi 25 Janvier 2005 19:52, Steven Bethard a écrit*:

Thank you Nick and Steven for the idea of a more generic imerge.

To work with the Hamming problem, the imerge function _must_not_ allow
duplicates and we can assume all of the specified iteratables are of the same
size, i.e. infinite !

Therefore, Nick's solution fulfills the need. But it is admittedly confusing
to call the function "imerge" when it doesn't merge everything (including
duplicates). Anyway both solution sheds new light and brings us a bit
farther.

That's the beauty of many brains from all over the world collabarating.
Really, it makes me emotive thinking about it.

For the imerge function, what we really need to make the formulation clear is
a way to look at the next element of an iteratable without consuming it. Or
else, a way to put back "consumed" elements in the front an iteration flow,
much like the list constructors in FP languages like Haskell.

It is simple to encapsulate an iterator inside another iterator class that
would do just that. Here's one. The additional "fst" method returns the next
element to consume without consuming it and the "isBottom" checks if there is
something left to consume from the iteration flow, without actually consuming
anything. I named the class "IteratorDeiterator" for lack of imagination :

-- BEGIN SNAP
class IteratorDeiterator:
def __init__(self, iterator):
self._iterator = iterator.__iter__()
self._firstVal = None ## Avoid consuming if not requested from outside
## Works only if iterator itself can't return None

def __iter__(self): return self

def next(self):
valReturn = self._firstVal
if valReturn is None:
valReturn = self._iterator.next()
self._firstVal = None
return valReturn

def fst(self):
if self._firstVal is None:
self._firstVal = self._iterator.next()
return self._firstVal

def isBottom(self):
if self._firstVal is None:
try: self._firstVal = self._iterator.next()
except StopIteration:
return True
return False
-- END SNAP

Now the imerge_infinite which merges infinite lists, while allowing
duplicates, is quite straightforward :

-- BEGIN SNAP
def imerge_infinite(*iterators):
vItTopable = [IteratorDeiterator(it) for it in iterators]
while vItTopable:
yield reduce(lambda itSmallest, it:
itSmallest.fst() > it.fst() and it or itSmallest,
vItTopable).next()
-- END SNAP

To merge finite lists of possibly different lengths is a bit more work as you
have to eliminate iterators that reached the bottom before the others. This
is quite easily done by simply filtering them out.
The imerge_finite function below merges "lists" of possibly different sizes.

-- BEGIN SNAP
def imerge_finite(*iterators):
vItDeit = [IteratorDeiterator(it) for it in iterators]
vItDeit = filter(lambda it: not it.isBottom(), vItDeit)
while vItDeit:
yield reduce(lambda itSmallest, it:
itSmallest.fst() > it.fst() and it or itSmallest,
vItDeit).next()
vItDeit = filter(lambda it: not it.isBottom(), vItDeit)
-- END SNAP

Now, we want versions of these two merge functions that do not allow
duplicates. Building upon what we've already done in a semi FP way, we just
filter out the duplicates from the iteration streams returned by the above
functions. The job is the same for the infinite and finite versions, hence
the imerge_nodup generic function.

-- BEGIN SNAP
def imerge_nodup(fImerge, *iterators):
it = fImerge(*iterators)
el = it.next()
yield el
while True:
el = dropwhile(lambda _next: _next == el, it).next()
yield el

imerge_inf_nodup = \
lambda *iterators: imerge_nodup(imerge_infinite, *iterators)
imerge_finite_nodup = \
lambda *iterators: imerge_nodup(imerge_finite, *iterators)
-- END SNAP

I used the lambda notation for imerge_inf_nodup and imerge_finite_nodup to
avoid another useless for-yield loop. Here the notation really just express
function equivalence with a bounded argument (it would be great to have a
notation for this in Python, i.e. binding a function argument to yield a new
function).

The merge function to use with hamming() is imerge_inf_nodup.

Regards,

Francis Girard
FRANCE

> Nick Craig-Wood wrote:
> > Steven Bethard <(E-Mail Removed)> wrote:
> >> Nick Craig-Wood wrote:
> >>>imerge taking any number of arguments will look neater, eg
> >>>
> >>>def imerge(*generators):
> >>> values = [ g.next() for g in generators ]
> >>> while True:
> >>> x = min(values)
> >>> yield x
> >>> for i in range(len(values)):
> >>> if values[i] == x:
> >>> values[i] = generators[i].next()
> >>
> >> This kinda looks like it dies after the first generator is exhausted,
> >> but I'm not certain.

> >
> > Yes it will stop iterating then (rather like zip() on lists of unequal
> > size). Not sure what the specification should be! It works for the
> > hamming problem though.

>
> Actually, it stops iterating on lists of equal size too:
>
> py> def imerge(*iterators):
> ... iterators = [iter(i) for i in iterators]
> ... values = [i.next() for i in iterators]
> ... while True:
> ... x = min(values)
> ... yield x
> ... for i, val in enumerate(values):
> ... if val == x:
> ... values[i] = iterators[i].next()
> ...
> py> list(imerge([1, 4, 7], [2, 5, 8], [3, 6, 9]))
> [1, 2, 3, 4, 5, 6, 7]
>
> Note that 8 and 9 are not in the list.
>
> Steve

Steven Bethard
Guest
Posts: n/a

 01-26-2005
Francis Girard wrote:
> For the imerge function, what we really need to make the formulation clear is
> a way to look at the next element of an iteratable without consuming it. Or
> else, a way to put back "consumed" elements in the front an iteration flow,
> much like the list constructors in FP languages like Haskell.
>
> It is simple to encapsulate an iterator inside another iterator class that
> would do just that. Here's one. The additional "fst" method returns the next
> element to consume without consuming it and the "isBottom" checks if there is
> something left to consume from the iteration flow, without actually consuming
> anything. I named the class "IteratorDeiterator" for lack of imagination :
>

[snip]

Another implementation is my peekable class:

http://aspn.activestate.com/ASPN/Coo.../Recipe/304373

It allows you to peek several elements ahead if that's necessary.

Steve

Nick Craig-Wood
Guest
Posts: n/a

 01-27-2005
Francis Girard <(E-Mail Removed)> wrote:
> Thank you Nick and Steven for the idea of a more generic imerge.

You are welcome [It came to me while walking the children to school!]

[snip]
> class IteratorDeiterator:
> def __init__(self, iterator):
> self._iterator = iterator.__iter__()
> self._firstVal = None ## Avoid consuming if not requested from outside
> ## Works only if iterator itself can't return None

You can use a sentinel here if you want to avoid the "can't return
None" limitation. For a sentinel you need an object your iterator
couldn't possibly return. You can make one up, eg

self._sentinel = object()
self._firstVal = self._sentinel

Or you could use self (but I'm not 100% sure that your recursive
functions wouldn't return it!)

> def __iter__(self): return self
>
> def next(self):
> valReturn = self._firstVal
> if valReturn is None:

and
if valReturn is self._sentinel:

> valReturn = self._iterator.next()
> self._firstVal = None

self._firstVal = self._sentinel

etc..

[snip more code]

Thanks for some more examples of fp-style code. I find it hard to get
my head round so its been good exercise!
--
Nick Craig-Wood <(E-Mail Removed)> -- http://www.craig-wood.com/nick

Francis Girard
Guest
Posts: n/a

 01-27-2005
Le jeudi 27 Janvier 2005 10:30, Nick Craig-Wood a Ã©critÂ*:
> Francis Girard <(E-Mail Removed)> wrote:
> > Thank you Nick and Steven for the idea of a more generic imerge.

>
> You are welcome [It came to me while walking the children to school!]
>

Sometimes fresh air and children purity is all what it takes. Much better than
coffee, cigarrette and flat screen.

> [snip]
>
> > class IteratorDeiterator:
> > def __init__(self, iterator):
> > self._iterator = iterator.__iter__()
> > self._firstVal = None ## Avoid consuming if not requested from
> > outside ## Works only if iterator itself can't return None

>
> You can use a sentinel here if you want to avoid the "can't return
> None" limitation. For a sentinel you need an object your iterator
> couldn't possibly return. You can make one up, eg
>

Great idea. I'll use it.

> self._sentinel = object()
> self._firstVal = self._sentinel
>
> Or you could use self (but I'm not 100% sure that your recursive
> functions wouldn't return it!)
>
> > def __iter__(self): return self
> >
> > def next(self):
> > valReturn = self._firstVal
> > if valReturn is None:

>
> and
>
> if valReturn is self._sentinel:
> > valReturn = self._iterator.next()
> > self._firstVal = None

>
> self._firstVal = self._sentinel
>
> etc..
>
> [snip more code]
>
> Thanks for some more examples of fp-style code. I find it hard to get
> my head round so its been good exercise!

Introduction to functional programming
Prentice Hall
1988

This is the very best intro I ever read. The book is without hype, doesn't
show its age and is language neutral.
Authors are world leaders in the field today. Only really strong guys have the
kindness to do understandable introductions without trying to hide the
difficulties (because they are strong enough to face them with simplicity).

It's been a real pleasure.

Regards,

Francis Girard
FRANCE

> --
> Nick Craig-Wood <(E-Mail Removed)> -- http://www.craig-wood.com/nick

Paul Rubin
Guest
Posts: n/a

 01-28-2005
Francis Girard <(E-Mail Removed)> writes:
> Thank you Nick and Steven for the idea of a more generic imerge.

If you want to get fancy, the merge should use a priority queue (like
in the heapsort algorithm) instead of a linear scan through the
incoming iters, to find the next item to output. That lowers the
running time to O(n log k) instead of O(n*k), where k is the number of
iterators and n is the length.

Michael Spencer
Guest
Posts: n/a

 01-28-2005
Paul Rubin wrote:
> Francis Girard <(E-Mail Removed)> writes:
>
>>Thank you Nick and Steven for the idea of a more generic imerge.

>
>
> If you want to get fancy, the merge should use a priority queue (like
> in the heapsort algorithm) instead of a linear scan through the
> incoming iters, to find the next item to output. That lowers the
> running time to O(n log k) instead of O(n*k), where k is the number of
> iterators and n is the length.

I looked at a heapq solution but didn't see any clean way of dealing with
multiple iterators having equal values. The dict solution below deals cleanly
with that, since one key can be shared by any number of iterators. Extracting
the minimum, and the associated iterables is fast, but the overall solution is
still slower than the brute force approach for the 3 hamming iterators.

>>> def imerge(*iterables):

... cache = {}
... iterators = map(iter,iterables)
... number = len(iterables)
... exhausted = 0
... while 1:
# First time through, looked at all of them
# Subsequently, update only the minimum iterators
... for it in iterators:
... try:
# Key each iterator by its next() value
# Multiple iterators may share the same key
... cache.setdefault(it.next(),[]).append(it)
... except StopIteration:
... exhausted += 1
... if exhausted == number:
... raise StopIteration
# Get the lowest value
... val = min(cache)
# and all the iterators that have that value
... iterators = cache.pop(val)
... yield val

>>> list(imerge([1, 2, 3, 6], [1, 2, 3, 7], [1, 2, 3, 4, 5]))

[1, 2, 3, 4, 5, 6, 7]
>>>

Michael

Francis Girard
Guest
Posts: n/a

 01-28-2005
Le vendredi 28 Janvier 2005 08:27, Paul Rubin a écrit*:
> Francis Girard <(E-Mail Removed)> writes:
> > Thank you Nick and Steven for the idea of a more generic imerge.

>
> If you want to get fancy, the merge should use a priority queue (like
> in the heapsort algorithm) instead of a linear scan through the
> incoming iters, to find the next item to output. That lowers the
> running time to O(n log k) instead of O(n*k), where k is the number of
> iterators and n is the length.

The goal was only to show some FP construct on small problems. Here the number
of iterators are intended to be fairly small.

Otherwise, yes, you could exploit the fact that, at one loop execution, you
already seen most of the elements at previous loop execution, storing them in
some heap structure and therefore not having to recan the whole list of
"already seen" iterator values at each loop execution.

Thank you

Francis Girard