Velocity Reviews > Ordered Sets

# Ordered Sets

pataphor
Guest
Posts: n/a

 03-26-2009
Raymond Hettinger wrote:

> Right. Here's a link to a weakref version (though I think the
> previous __del__ version also does the trick):
>
> http://code.activestate.com/recipes/576696/

problem? By the way collections.MutableSet needs python >= 2.6

P.

import collections

class OrderedSet(collections.MutableSet):
'Set that remembers the order elements were added'
# Big-O running times for all methods are the same as for regular sets.

def __init__(self, iterable=None):
self.__map = {} # key --> [prev,key,next]
if iterable is not None:
self |= iterable

def __len__(self):
return len(self.__map)

def __contains__(self, key):
return key in self.__map

if not self.__map:
self.start = key
self.__map = {key: [key,key,key]}
elif key not in self.__map:
a,b,c = self.__map[self.start]
self.__map[a][2] = key
self.__map[b][0] = key
self.__map[key] = [a,key,b]

if key in self.__map:
a,b,c = self.__map.pop(key)
if self.__map:
self.__map[a][2] = c
self.__map[c][0] = a
if b == self.start:
self.start = c

def __iter__(self):
if self.__map:
a,b,c = self.__map[self.start]
while c is not self.start:
yield b
a,b,c = self.__map[c]
yield b

def __reversed__(self):
if self.__map:
a,b,c = self.__map[self.start]
while a is not self.start:
yield a
a,b,c = self.__map[a]
yield a

def pop(self, last=True):
if not self:
raise KeyError('set is empty')
key = next(reversed(self)) if last else next(iter(self))
return key

def __repr__(self):
if not self:
return '%s()' % (self.__class__.__name__,)
return '%s(%r)' % (self.__class__.__name__, list(self))

def __eq__(self, other):
if isinstance(other, OrderedSet):
return len(self) == len(other) and list(self) == list(other)
return not self.isdisjoint(other)

def test():
D = OrderedSet('abcd')
R = OrderedSet()
for x in list(D):
print(D,R)
print(D,R)

if __name__ == '__main__':
test()

Gabriel Genellina
Guest
Posts: n/a

 03-28-2009
En Thu, 26 Mar 2009 12:20:17 -0300, Scott David Daniels
<(E-Mail Removed)> escribió:

> (2) Why, oh why, do people feel so comforted adding double_underscores
> to data structures? If I want to inherit from your mapping in order
> to log the attempts to discard a key not actually in the set, I
> have to either us the nasty name mangling to get at self.__map, or
> name my subclass OrderedSet and take advantage of a not-guaranteed
> name mangling. What on earth makes you unwilling to go with "_map"
> and credit your code's consumers with a bit of common sense?
>
> Sorry, this latter rant has been building up for a while (spurred on by
> a sojourn into the unittest code where they have the private disease in

My commiserations.

--
Gabriel Genellina

Aahz
Guest
Posts: n/a

 03-29-2009
In article <01d457aa\$0\$17208\$(E-Mail Removed)>,
Steven D'Aprano <(E-Mail Removed)> wrote:
>On Fri, 20 Mar 2009 11:50:28 -0700, Scott David Daniels wrote:
>> Raymond Hettinger wrote:
>>> [Aahz]
>>>>
>>>> The doubly-linked list part is what's sick and perverted.
>>>
>>> The doubly-linked list part is what gives it big-oh running times the
>>> same as regular sets. If the underlying sequence is stored as a list,
>>> then deletion goes from O(1) to O(n). If the insertion times are
>>> recorded in a dictionary, then the insertion order has to be sorted
>>> prior to iteration which makes iteration go from O(n) to O(n log n).

>>
>> The double-linked list part could be done with weakrefs and perhaps Aahz
>> would relax a bit.

>
>I'm not sure whether Aahz's description is meant to be taken seriously,
>or if there is a smiley hidden in his post. After all, doubly-linked
>lists are a standard building block in data structure design.

It's a half-smiley. I find the trick of using a Python list to store the
doubly-linked list difficult to understand (as opposed to the usual
mechanism of a node class). I understand why it was done (time and space
efficiency), but I also still feel emotionally that it's somewhat sick
and perverted. I probably would feel more comfortable if the
doubly-linked list were abstracted out and commented. Heck, the whole
thing needs unit tests.

I needed to look up the code again, so I might as well repeat the URL to
make it handy for everyone else:

http://code.activestate.com/recipes/576694/
--
Aahz ((E-Mail Removed)) <*> http://www.pythoncraft.com/

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it." --Brian W. Kernighan

CTO
Guest
Posts: n/a

 03-29-2009
On Mar 28, 3:33*am, "Gabriel Genellina" <(E-Mail Removed)>
wrote:
> En Thu, 26 Mar 2009 12:20:17 -0300, Scott David Daniels *
> <(E-Mail Removed)> escribió:
>
> > (2) Why, oh why, do people feel so comforted adding double_underscores
> > * * *to data structures?

Probably because other authors feel too comfortable using single
underscores for
things the end user should mess with, as in namedtuple.

pataphor
Guest
Posts: n/a

 03-30-2009
On Mon, 30 Mar 2009 03:30:04 -0500
Nick Craig-Wood <(E-Mail Removed)> wrote:

> >>> class Node(object):

> ... __slots__ = ["prev", "next", "this"]
> ... def __init__(self, prev, next, this):
> ... self.prev = prev
> ... self.next = next
> ... self.this = this

[...]

> So the Node class actually takes less memory 38 Mbytes vs 53 Mbytes
> for the list.

Well OK, but if we're going to start optimizing, it seems we don't need
to store the key itself in the Node or the list. Here's a version of my
script that stores only prev and next. Another twist is that it is now
possible to arbitrarily set the starting point for the iteration. (Just
in case someone needed a cue for further ranting)

P.

import collections

class OrderedSet(collections.MutableSet):
'Set that remembers the order elements were added'
# Big-O running times for all methods are the same as for regular
sets.
def __init__(self, iterable=None):
self.links = {} # key --> [prev,next]
if iterable is not None:
self |= iterable

def __len__(self):

def __contains__(self, key):

if not self:
self.start = key
start = self.start

if key == self.start:
self.start = next
else:
del self.start

def __iter__(self):
start = self.start
yield start
while next != start:
yield next

def __reversed__(self):
start = self.start
while prev != start:
yield prev
yield start

def pop(self, last=True):
if not self:
raise KeyError('set is empty')
key = next(reversed(self)) if last else next(iter(self))
return key

def __repr__(self):
if not self:
return '%s()' % (self.__class__.__name__,)
return '%s(%r)' % (self.__class__.__name__, list(self))

def __eq__(self, other):
if isinstance(other, OrderedSet):
return len(self) == len(other) and list(self) == list(other)
return not self.isdisjoint(other)

def test():
D = OrderedSet('abcd')
R = OrderedSet()
for x in list(D):
print(D,R)
print(D,R)
print()
R.start = 'c'
print(list(reversed(R)))

if __name__ == '__main__':
test()

pataphor
Guest
Posts: n/a

 03-31-2009
On Mon, 30 Mar 2009 19:57:39 -0700 (PDT)
Alex_Gaynor <(E-Mail Removed)> wrote:

> I really like the Ordered Set class(I've been thinking about one ever
> since ordered dict hit the std lib), is there any argument against
> adding one to the collections module? I'd be willing to write a PEP
> up for it.

Suppose the implementation would use a circular linked list. Then the
iteration could start from any point, the thing is symmetric after all.
But this would also mean we could add items to the left of that new
starting point, since that would now be the 'end' of the circle. This
is something different from merely remembering insertion order. How do

P.

pataphor
Guest
Posts: n/a

 03-31-2009
On Tue, 31 Mar 2009 10:33:26 +0200
pataphor <(E-Mail Removed)> wrote:

> On Mon, 30 Mar 2009 19:57:39 -0700 (PDT)
> Alex_Gaynor <(E-Mail Removed)> wrote:
>
> > I really like the Ordered Set class(I've been thinking about one
> > ever since ordered dict hit the std lib), is there any argument
> > against adding one to the collections module? I'd be willing to
> > write a PEP up for it.

>
> Suppose the implementation would use a circular linked list. Then the
> iteration could start from any point, the thing is symmetric after
> all. But this would also mean we could add items to the left of that
> new starting point, since that would now be the 'end' of the circle.
> This is something different from merely remembering insertion order.
> How do you feel about that?

def move(self,key1,key2):
#self ==> key1,(key2 ... end), (key1+1... key2-1)
if set([key1,key2]) and self :
start = self.start

This takes [key2:] (isn't pseudo slice notation wonderful?) and
inserts it after key1.

for example:

R = OrderedSet(range(10))
print(list(R))
R.move(3,7)
print(list(R))

gives:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 7, 8, 9, 4, 5, 6]

All in O(1) of course.

P.

pataphor
Guest
Posts: n/a

 03-31-2009
On Tue, 31 Mar 2009 06:03:13 -0700 (PDT)
Alex_Gaynor <(E-Mail Removed)> wrote:

> My inclination would be to more or less *just* have it implement the
> set API, the way ordered dict does in 2.7/3.1.

two key variables: The iterator start position and the underlying map.
There is no need for more than basic set API since people can use
those two variables to subclass their own iterators.

P.