![]() |
|
|
|||||||
![]() |
Python - Proposed implementation for an Ordered Dictionary |
|
|
Thread Tools | Search this Thread |
|
|
#11 |
|
Raymond Hettinger:
>Paul Rubin: >>another (messy) approach would be to write a C >>extension that uses a doubly linked list some day. > > That seems like an ideal implementation to me. This was my Python implementation, where the delete too is O(1), but it's slow: http://code.activestate.com/recipes/498195/ I think the C extension with the doubly linked list is the best approach. Note that in modern CPUs caches are able to change the performance a lot. So reducing the memory used is very important. So using the XOR (or subtraction) trick to use only 1 pointer for each key-value may be useful to keep performance not too much far from the normal python dicts: http://en.wikipedia.org/wiki/XOR_linked_list Bye, bearophile bearophileHUGS@lycos.com |
|
|
|
|
#12 |
|
Posts: n/a
|
writes:
> So using the XOR (or subtraction) trick to use only 1 pointer for > each key-value may be useful to keep performance not too much far > from the normal python dicts: http://en.wikipedia.org/wiki/XOR_linked_list I don't see how to delete a randomly chosen node if you use that trick, since the hash lookup doesn't give you two consecutive nodes in the linked list to xor together. Maybe you could do something intricate with skip lists and end up with O(log n) deletion. Paul Rubin |
|
|
|
#13 |
|
Posts: n/a
|
Paul Rubin:
>I don't see how to delete a randomly chosen node if you use that trick, since the hash lookup doesn't give you two consecutive nodes in the linked list to xor together.< Thank you, I think you are right, I am sorry. So on 32-bit CPUs you need to add 8 bytes to each value. On 64-bit CPUs you may use a double indexed list, instead of a double linked list. And each index can be stored in 6 bytes instead of 8 (281_474_976_710_656 buckets may be enough for most people), so you need only 12 bytes for two indexes instead of 16 bytes, this helps the L1 cache (and the small L2 cache too on Core i7) a bit. But this may slow down too much the ordered iteration. Bye, bearophile bearophileHUGS@lycos.com |
|
|
|
#14 |
|
Posts: n/a
|
wrote:
> Paul Rubin: >> I don't see how to delete a randomly chosen node if you use that trick, since the hash lookup doesn't give you two consecutive nodes in the linked list to xor together.< > > Thank you, I think you are right, I am sorry. > So on 32-bit CPUs you need to add 8 bytes to each value. > > On 64-bit CPUs you may use a double indexed list, instead of a double > linked list. And each index can be stored in 6 bytes instead of 8 > (281_474_976_710_656 buckets may be enough for most people), so you > need only 12 bytes for two indexes instead of 16 bytes, this helps the > L1 cache (and the small L2 cache too on Core i7) a bit. But this may > slow down too much the ordered iteration. > A sort of premature pessimization, then. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ Steve Holden |
|
|
|
#15 |
|
Posts: n/a
|
Steve Holden:
> A sort of premature pessimization, then. Maybe not, the save in memory may lead to higher speed anyway. So you need to test it to know the overall balance. And in data structures with general purpose you want all the speed you can get. Bye, bearophile bearophileHUGS@lycos.com |
|
|
|
#16 |
|
Posts: n/a
|
[Steve Holden]
> A sort of premature pessimization, then. QOTW! _ ~ @ @ \_/ Raymond Raymond Hettinger |
|
|
|
#17 |
|
Posts: n/a
|
>> A sort of premature pessimization, then. QOTW! +1 Malcolm python@bdurham.com |
|
|
|
#18 |
|
Posts: n/a
|
Raymond Hettinger wrote:
> [Paul Rubin] >> Ehh, I guess I'm not surprised at the slowdown and extra complexity >> from the second dict. Oh well. If the module really turns out to be >> really used a lot, another (messy) approach would be to write a C >> extension that uses a doubly linked list some day. > > That seems like an ideal implementation to me. > > O(1): appending, popping, searching, and deletion > O(n): traversal > > Raymond Sometimes, it's useful to be able to obtain the data in the sorted sequence. You might consider adding functionality like: def seqItems(self): '''To return the items, sorted by key. ''' return [self[k] for k in self.seqKeys()] def seqKeys(self): ''' To return the keys sorted. ''' k= self.keys() k.sort() return k def seqValues(self): ''' To return the values, with their keys, sorted by value. ''' v= [(it[1], it[0]) for it in self.items()] v.sort() return v Colin W. Colin J. Williams |
|
|
|
#19 |
|
Posts: n/a
|
Colin J. Williams wrote:
> Sometimes, it's useful to be able to > obtain the data in the sorted sequence. > > You might consider adding functionality > like: > > def seqItems(self): > '''To return the items, sorted > by key. ''' > return [self[k] for k in > self.seqKeys()] Amazingly, the Python time-machine provides such functionality! Using just a handful of pre-existing pieces, you can do this: sorted(mydict.items()) sorted(mydict.keys()) [mydict[x] for x in sorted(mydict.keys)] and any other sequence you might need. That's the beauty of a general purpose programming language. You don't need everything to be a built-in *wink* -- Steven Steven D'Aprano |
|
|
|
#20 |
|
Posts: n/a
|
Steven D'Aprano wrote:
> Colin J. Williams wrote: > >> Sometimes, it's useful to be able to >> obtain the data in the sorted sequence. >> >> You might consider adding functionality >> like: >> >> def seqItems(self): >> '''To return the items, sorted >> by key. ''' >> return [self[k] for k in >> self.seqKeys()] > > Amazingly, the Python time-machine provides such functionality! Using just a > handful of pre-existing pieces, you can do this: > > sorted(mydict.items()) > sorted(mydict.keys()) > [mydict[x] for x in sorted(mydict.keys)] > > and any other sequence you might need. That's the beauty of a general > purpose programming language. You don't need everything to be a built-in > *wink* > > Thanks, you are right, you have a neat way of handling the first two, they work with an instance of Raymond Hettinger's. OrderedDict. The third suggestion has a couple of problems, which are fixed below. if __name__ == '__main__': d= OrderedDict.fromkeys('abracadabra', value= 'zzz') print(d) print(d.seqKeys()) print(d.seqItems()) print(d.seqValues()) # Steven D'Aprano's alternative mydict= d.copy() print sorted(mydict.items()) print sorted(mydict.keys()) # print [mydict[x] for x in sorted(mydict.keys)] Instance object is not iterable print(sorted(iter([(x[1], x[0]) for x in mydict.iteritems()]))) # This works Colin W. Colin J. Williams |
|