![]() |
|
|
|||||||
![]() |
Python - Proposed implementation for an Ordered Dictionary |
|
|
Thread Tools | Search this Thread |
|
|
#1 |
|
Here's a proposed implementation for Py2.7 and Py3.1:
http://code.activestate.com/recipes/576669/ Would like you guys to kick the tires, exercise it a bit, and let me know what you think. The recipe runs under 2.6 and 3.0 without modification so it should be easy to play with. The main virtue of the proposed API is a near-zero learning curve. The API is a same as for regular dictionaries but it remembers the insertion order and used that for iteration, repr, etc. Raymond Raymond Hettinger |
|
|
|
|
#2 |
|
Posts: n/a
|
Raymond Hettinger <python <at> rcn.com> writes:
> > Here's a proposed implementation for Py2.7 and Py3.1: > > http://code.activestate.com/recipes/576669/ > > Would like you guys to kick the tires, exercise it a bit, and let me > know what you think. The recipe runs under 2.6 and 3.0 without > modification so it should be easy to play with. Why not just inherit from collections.MutableMapping? Benjamin Peterson |
|
|
|
#3 |
|
Posts: n/a
|
Raymond Hettinger <> writes:
> Here's a proposed implementation for Py2.7 and Py3.1: > > http://code.activestate.com/recipes/576669/ Several methods like __contains__() and __getitem__() are not overridden, so their performance is just as fast as a regular dictionary. Methods like __setitem__ and __delitem__ are overridden but have a fast path depending on whether or not the key already exists. It seems that __delitem__ of an existing key is O(n), whereas it's amortized constant time for dicts. (__setitem__ is constant time for both.) Is there a way to avoid this? If not, it should probably be documented, since it differs from dict. Hrvoje Niksic |
|
|
|
#4 |
|
Posts: n/a
|
[Benjamin Peterson]
> Why not just inherit from collections.MutableMapping? It makes the recipe shorter to inherit some methods from dict. Also, subclassing from dict gives a speedup for __getitem__(), __len__(), and get(). Raymond Hettinger |
|
|
|
#5 |
|
Posts: n/a
|
[Hrvoje Niksic]
> It seems that __delitem__ of an existing key is O(n), whereas it's > amortized constant time for dicts. *(__setitem__ is constant time for > both.) *Is there a way to avoid this? I don't see any fast/clean way. It's possible to tracking pending deletions and do them all at once but that's a bit messy and slow. > If not, it should probably be documented, since it differs from dict. That makes sense. Raymond Raymond Hettinger |
|
|
|
#6 |
|
Posts: n/a
|
Raymond Hettinger <> writes:
> I don't see any fast/clean way. It's possible to tracking pending > deletions and do them all at once but that's a bit messy and slow. What about using a second dictionary (indexed by the incrementing counter) instead of a list to record the insertion order? Then you have O(1) deletion, and traversal takes an O(n log n) sorting operation. Paul Rubin |
|
|
|
#7 |
|
Posts: n/a
|
[Paul Rubin]
> What about using a second dictionary (indexed by the incrementing > counter) instead of a list to record the insertion order? *Then you > have O(1) deletion, and traversal takes an O(n log n) sorting > operation. My first attempt used exactly that approach and it works fine though it does complicate the code quite a bit and slows down all of the common operations by a constant factor. Raymond Raymond Hettinger |
|
|
|
#8 |
|
Posts: n/a
|
> > What about using a second dictionary (indexed by the incrementing
> > counter) instead of a list to record the insertion order? *Then you > > have O(1) deletion, and traversal takes an O(n log n) sorting > > operation. > My first attempt used exactly that approach and it works fine > though it does complicate the code quite a bit and slows down > all of the common operations by a constant factor. FWIW, here is the code for that approach. -------------------------------------- from collections import MutableMapping class OrderedDict(dict): def __init__(self, *args, **kwds): if len(args) > 1: raise TypeError('expected at 1 argument, got %d', len (args)) self.clear() self.update(*args, **kwds) def clear(self): self.__cached_sort = None self.__cnt = 0 self.__order = {} dict.clear(self) def __setitem__(self, key, value): if key not in self: self.__cached_sort = None self.__cnt += 1 self.__order[key] = self.__cnt dict.__setitem__(self, key, value) def __delitem__(self, key): if key in self: self.__cached_sort = None del self.__order[key] dict.__delitem__(self, key) def __sorted(self): # return keys in insertion order and cache the result if self.__cached_sort is None: self.__cached_sort = sorted(dict.keys(self), key=self.__order.__getitem__) return self.__cached_sort def __iter__(self): return iter(self.__sorted()) def __reversed__(self): return iter(reversed(self.__sorted())) def popitem(self): # O(n) cost on the first pop and O(1) on successive pops if not self: raise KeyError key = self.__sorted().pop() del self.__order[key] value = dict.pop(self, key) return key, value # Methods with indirect access via the above methods setdefault = MutableMapping.setdefault update = MutableMapping.update pop = MutableMapping.pop keys = MutableMapping.keys values = MutableMapping.values items = MutableMapping.items def __repr__(self): return '{' + ', '.join(map('%r: %r'.__mod__, self.items())) + '}' def copy(self): return self.__class__(self) @classmethod def fromkeys(cls, iterable, value=None): d = cls() for key in iterable: d[key] = value return d def __reduce__(self): # pickle an items list and any extra info in the instance dict items = list(self.items()) inst_dict = vars(self).copy() for attr in vars(self.__class__): inst_dict.pop(attr, None) return (self.__class__, (items,), inst_dict) Raymond Hettinger |
|
|
|
#9 |
|
Posts: n/a
|
Raymond Hettinger <> writes:
> > What about using a second dictionary > My first attempt used exactly that approach and it works fine > though it does complicate the code quite a bit and slows down > all of the common operations by a constant factor. Ehh, I guess I'm not surprised at the slowdown and extra complexity from the second dict. Oh well. If the module really turns out to be really used a lot, another (messy) approach would be to write a C extension that uses a doubly linked list some day. Paul Rubin |
|
|
|
#10 |
|
Posts: n/a
|
[Paul Rubin]
> Ehh, I guess I'm not surprised at the slowdown and extra complexity > from the second dict. *Oh well. *If the module really turns out to be > really used a lot, another (messy) approach would be to write a C > extension that uses a doubly linked list some day. That seems like an ideal implementation to me. O(1): appending, popping, searching, and deletion O(n): traversal Raymond Raymond Hettinger |
|