Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > Re: insert unique data in a list

Reply
Thread Tools

Re: insert unique data in a list

 
 
knifenomad
Guest
Posts: n/a
 
      12-14-2009
On 12월14일, 오*2시57분, mattia <(E-Mail Removed)> wrote:
> Il Sun, 13 Dec 2009 16:37:20 +0000, mattia ha scritto:
>
> > How can I insert non-duplicate data in a list? I mean, is there a
> > particular option in the creation of a list that permit me not to use
> > something like:
> > def append_unique(l, val):
> > * * if val not in l:
> > * * * * l.append(val)

>
> > Thanks,
> > Mattia

>
> Ok, so you all suggest to use a set. Now the second question, more
> interesting. Why can't I insert a list into a set? I mean, I have a
> function that returns a list. I call this function several times and
> maybe the list returned is the same as another one already returned. I
> usually put all this lists into another list. How can I assure that my
> list contains only unique lists? Using set does'n work (i.e. the python
> interpreter tells me: TypeError: unhashable type: 'list')...


this makes the set type hashable.

class Set(set):
__hash__ = lambda self: id(self)

s = Set()
s.add(1)
s.add(2)
s.add(1)

print s
set([1, 2])

d = {}
d[s] = 'voila'

print d
{Set([1,2]):'voila'}

print d[s]
'voila'
 
Reply With Quote
 
 
 
 
knifenomad
Guest
Posts: n/a
 
      12-14-2009
On 12월14일, 오*10시19분, knifenomad <(E-Mail Removed)> wrote:
> On 12월14일, 오*2시57분, mattia <(E-Mail Removed)> wrote:
>
>
>
>
>
> > Il Sun, 13 Dec 2009 16:37:20 +0000, mattia ha scritto:

>
> > > How can I insert non-duplicate data in a list? I mean, is there a
> > > particular option in the creation of a list that permit me not to use
> > > something like:
> > > def append_unique(l, val):
> > > * * if val not in l:
> > > * * * * l.append(val)

>
> > > Thanks,
> > > Mattia

>
> > Ok, so you all suggest to use a set. Now the second question, more
> > interesting. Why can't I insert a list into a set? I mean, I have a
> > function that returns a list. I call this function several times and
> > maybe the list returned is the same as another one already returned. I
> > usually put all this lists into another list. How can I assure that my
> > list contains only unique lists? Using set does'n work (i.e. the python
> > interpreter tells me: TypeError: unhashable type: 'list')...

>
> this makes the set type hashable.
>
> class Set(set):
> * * __hash__ = lambda self: id(self)
>
> s = Set()
> s.add(1)
> s.add(2)
> s.add(1)
>
> print s
> set([1, 2])
>
> d = {}
> d[s] = 'voila'
>
> print d
> {Set([1,2]):'voila'}
>
> print d[s]
> 'voila'- 원본 텍스트 숨기기 -
>
> - 원본 텍스트 보기 -


although it's not what you've asked about. it's intereting to make set
hashable using __hash__.
 
Reply With Quote
 
 
 
 
Chris Rebert
Guest
Posts: n/a
 
      12-14-2009
On Sun, Dec 13, 2009 at 5:19 PM, knifenomad <(E-Mail Removed)> wrote:
> On 12월14일, 오*2시57분, mattia <(E-Mail Removed)> wrote:
>> Il Sun, 13 Dec 2009 16:37:20 +0000, mattia ha scritto:
>> > How can I insert non-duplicate data in a list? I mean, is there a
>> > particular option in the creation of a list that permit me not to use
>> > something like:
>> > def append_unique(l, val):
>> > * * if val not in l:
>> > * * * * l.append(val)

>>
>> Ok, so you all suggest to use a set. Now the second question, more
>> interesting. Why can't I insert a list into a set? I mean, I have a
>> function that returns a list. I call this function several times and
>> maybe the list returned is the same as another one already returned. I
>> usually put all this lists into another list. How can I assure that my
>> list contains only unique lists? Using set does'n work (i.e. the python
>> interpreter tells me: TypeError: unhashable type: 'list')...

>
> this makes the set type hashable.
>
> class Set(set):
> * *__hash__ = lambda self: id(self)


Or you could just use frozenset and get the correct semantics:
http://docs.python.org/library/stdtypes.html#frozenset

Cheers,
Chris
--
http://blog.rebertia.com
 
Reply With Quote
 
Steven D'Aprano
Guest
Posts: n/a
 
      12-14-2009
On Sun, 13 Dec 2009 17:19:17 -0800, knifenomad wrote:


> this makes the set type hashable.
>
> class Set(set):
> __hash__ = lambda self: id(self)



That's a *seriously* broken hash function.

>>> key = "voila"
>>> d = { Set(key): 1 }
>>> d

{Set(['i', 'a', 'l', 'o', 'v']): 1}
>>> d[ Set(key) ]

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: Set(['i', 'a', 'l', 'o', 'v'])


--
Steven
 
Reply With Quote
 
knifenomad
Guest
Posts: n/a
 
      12-14-2009
On 12월14일, 오후12시42분, Steven D'Aprano
<(E-Mail Removed)> wrote:
> On Sun, 13 Dec 2009 17:19:17 -0800, knifenomad wrote:
> > this makes the set type hashable.

>
> > class Set(set):
> > * * __hash__ = lambda self: id(self)

>
> That's a *seriously* broken hash function.
>
> >>> key = "voila"
> >>> d = { Set(key): 1 }
> >>> d

>
> {Set(['i', 'a', 'l', 'o', 'v']): 1}>>> d[ Set(key) ]
>
> Traceback (most recent call last):
> * File "<stdin>", line 1, in <module>
> KeyError: Set(['i', 'a', 'l', 'o', 'v'])
>
> --
> Steven


of course it is broken as long as it uses it's instance id.
i added this to notify that unhashable can become hashable
implementing __hash__ inside the class. which probably set to None by
default.

 
Reply With Quote
 
Lie Ryan
Guest
Posts: n/a
 
      12-14-2009
On 12/15/2009 4:13 AM, mattia wrote:
>> >
>> > of course it is broken as long as it uses it's instance id. i added this
>> > to notify that unhashable can become hashable implementing __hash__
>> > inside the class. which probably set to None by default.

> Ok, nice example, but I believe that using id() as the hash function can
> lead to unexpected collisions.


For dict and set to work correctly, the hash function must conform to
the contract that:
- if A == B then hash(A) == hash(B)

If the id() of two objects differ but their content equal (i.e. they are
two equivalent, but distinct object), they should have the same hash. If
id() is used for the hash of an arbitrary object, the contract will be
broken unless you define A == B in terms of id(A) == id(B).
 
Reply With Quote
 
Steven D'Aprano
Guest
Posts: n/a
 
      12-14-2009
On Mon, 14 Dec 2009 17:13:24 +0000, mattia wrote:

> Il Sun, 13 Dec 2009 21:17:28 -0800, knifenomad ha scritto:
>
>> On 12월14일, 오후12시42분, Steven D'Aprano
>> <(E-Mail Removed)> wrote:
>>> On Sun, 13 Dec 2009 17:19:17 -0800, knifenomad wrote:
>>> > this makes the set type hashable.
>>>
>>> > class Set(set):
>>> > * * __hash__ = lambda self: id(self)
>>>
>>> That's a *seriously* broken hash function.
>>>
>>> >>> key = "voila"
>>> >>> d = { Set(key): 1 }
>>> >>> d
>>>
>>> {Set(['i', 'a', 'l', 'o', 'v']): 1}>>> d[ Set(key) ]
>>>
>>> Traceback (most recent call last):
>>> * File "<stdin>", line 1, in <module>
>>> KeyError: Set(['i', 'a', 'l', 'o', 'v'])
>>>
>>> --
>>> Steven

>>
>> of course it is broken as long as it uses it's instance id. i added
>> this to notify that unhashable can become hashable implementing
>> __hash__ inside the class. which probably set to None by default.

>
> Ok, nice example, but I believe that using id() as the hash function can
> lead to unexpected collisions.


No, you have that backwards. Using id() as the hash function means that
equal keys will hash unequal -- rather than unexpected collisions, it
leads to unexpected failure-to-collide-when-it-should-collide.

And it isn't a "nice example", it is a terrible example.

Firstly, the example fails to behave correctly. It simply doesn't work as
advertised.

Secondly, and much worse, it encourages people to do something dangerous
without thinking about the consequences. If it is so easy to hash mutable
objects, why don't built-in lists and sets don't have a __hash__ method?
Do you think that the Python developers merely forgot?

No, there is good reason why mutable items shouldn't be used as keys in
dicts or in sets, and this example simply papers over the reasons why and
gives the impression that using mutable objects as keys is easy and safe
when it is neither.

Using mutable objects as keys or set elements leads to surprising,
unexpected behaviour and hard-to-find bugs. Consider the following set
with lists as elements:

L = [1, 2]
s = Set() # set that allows mutable elements
s.add(L)
s.add([1, 2, 3])

So far so good. But what should happen now?

L.append(3)

The set now has two equal elements, which breaks the set invariant that
it has no duplicates.

Putting the problem of duplicates aside, there is another problem:

L = [1, 2]
s = Set([L])
L.append(3)

There are two possibilities: either the hash function of L changes when
the object changes, or it doesn't. Suppose it changes. Then the hash of L
after the append will be different from the hash of L before the append,
and so membership testing (L in s) will fail.

Okay, suppose we somehow arrange matters so that the hash of the object
doesn't change as the object mutates. This will solve the problem above:
we can mutate L as often as we like, and L in s will continue to work
correctly. But now the hash of an object doesn't just depend on it's
value, but on its history. That means that two objects which are
identical can hash differently, and we've already seen this is a problem.

There is one final approach which could work: we give the object a
constant hash function, so that all objects of that type hash
identically. This means that the performance of searches and lookups in
sets and dicts will fall to that of lists. There is no point in paying
all the extra overhead of a dict to get behaviour as slow, or slower,
than a list.

In other words, no matter what you do, using mutable objects as keys or
set elements leads to serious problems that need to be dealt with. It
simply isn't true that all you need to do to make mutable objects usable
in dicts or sets is to add a hash function. That part is trivial.



--
Steven
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Is there a unique method in python to unique a list? Token Type Python 9 09-09-2012 02:13 PM
Is there any advantage or disadvantage to using sets over list compsto ensure a list of unique entries? deathweaselx86 Python 5 06-25-2011 08:05 AM
Re: insert unique data in a list Fire Crow Python 4 12-14-2009 02:06 AM
list question... unique values in all possible unique spots ToshiBoy Python 6 08-12-2008 05:01 AM
INSERT statement contains fewer items than the insert list J. Muenchbourg ASP General 3 09-30-2003 09:24 PM



Advertisments