Velocity Reviews

Velocity Reviews (http://www.velocityreviews.com/forums/index.php)
-   Python (http://www.velocityreviews.com/forums/f43-python.html)
-   -   Doing set operation on non-hashable objects (http://www.velocityreviews.com/forums/t649964-doing-set-operation-on-non-hashable-objects.html)

5lvqbwl02@sneakemail.com 12-24-2008 07:16 PM

Doing set operation on non-hashable objects
 
Hi,

I'm writing an application which is structured roughly as follows:

"db" is a dict, where the values are also dicts.
A function searches through db and returns a list of values, each of
which is a dict as described above.
I need to perform set operations on these lists (intersection and
union)
However the objects themselves are not hashable, and therefore can't
be in a set, because they are dicts.
I'm not sure how large each set will be, but the point is I'm trying
to avoid anything looking like an O(n^2) algorithm, so I can't just
use naive double-looping to check for intersection/union on a pair of
lists.

The only way I can think of to do this right is to hash the dicts by
freezing them, turning them all into tuples, which can then be
hashed. But this is a problem because more dicts might be nested
inside. At any rate, I'd need to thaw them out when I'm done and turn
them back into dicts, which seems complicated and annoying, and all
this leads me to believe there is a better way.

What I really need is a set of pointers, so at the end of the
operation I have the actual objects pointed to. Can I somehow use the
object IDs as set elements, then recreate a list with the actual
objects they point to?

How do you get the object back from an ID in python?

thanks
Michael

Gabriel Genellina 12-24-2008 08:21 PM

Re: Doing set operation on non-hashable objects
 
En Wed, 24 Dec 2008 17:16:59 -0200, <5lvqbwl02@sneakemail.com> escribió:

> I'm writing an application which is structured roughly as follows:
>
> "db" is a dict, where the values are also dicts.
> A function searches through db and returns a list of values, each of
> which is a dict as described above.
> I need to perform set operations on these lists (intersection and
> union)
> However the objects themselves are not hashable, and therefore can't
> be in a set, because they are dicts.


See this thread from last week:
http://groups.google.com/group/comp....818ff713bd4421

> What I really need is a set of pointers, so at the end of the
> operation I have the actual objects pointed to. Can I somehow use the
> object IDs as set elements, then recreate a list with the actual
> objects they point to?


If you *only* care about object identity, you might use a dictionary that
only compares by identity to anyone else:

class nc_dict(dict):
"A hashable dictionary that isn't equal to anyone else"

def __eq__(self, other):
return cmp(id(self),id(other))==0

def __hash__(self):
return hash(id(self))

d1 = nc_dict(a=1, b=2, c=3)
d2 = nc_dict(b=2, c=0, d=4)
d3 = nc_dict(a=1, c=3, e=5)
dd1 = nc_dict(x=d1, y=d2)
dd2 = nc_dict(x=d1, y=d3)
dd3 = nc_dict(y=d2, z=d3, w=d1)
l1 = [dd1, dd2]
l2 = [dd2, dd3]
s1 = set(l1)
s2 = set(l2)
print s1-s2
print s2-s1
print s1&s2

# instances of nc_dict with the same contents are NOT equal
d4 = nc_dict(a=1, b=2, c=3)
print d1, d4, d1==d4 # output: False

# but we can use this function to compare contents
def cmp_contents(d1, d2):
try: return cmp(sorted(d1.items()), sorted(d2.items()))
except Exception: return 1 # assume they're unequal

print cmp_contents(d1,d4)==0 # output: True

> How do you get the object back from an ID in python?


You don't :)

--
Gabriel Genellina


Carl Banks 12-24-2008 08:52 PM

Re: Doing set operation on non-hashable objects
 
On Dec 24, 1:16*pm, 5lvqbw...@sneakemail.com wrote:
> Hi,
>
> I'm writing an application which is structured roughly as follows:
>
> "db" is a dict, where the values are also dicts.
> A function searches through db and returns a list of values, each of
> which is a dict as described above.
> I need to perform set operations on these lists (intersection and
> union)
> However the objects themselves are not hashable, and therefore can't
> be in a set, because they are dicts.
> I'm not sure how large each set will be, but the point is I'm trying
> to avoid anything looking like an O(n^2) algorithm, so I can't just
> use naive double-looping to check for intersection/union on a pair of
> lists.
>
> The only way I can think of to do this right is to hash the dicts by
> freezing them, turning them all into tuples, which can then be
> hashed. *But this is a problem because more dicts might be nested
> inside. *At any rate, I'd need to thaw them out when I'm done and turn
> them back into dicts, which seems complicated and annoying, and all
> this leads me to believe there is a better way.
>
> What I really need is a set of pointers, so at the end of the
> operation I have the actual objects pointed to. *Can I somehow use the
> object IDs as set elements, then recreate a list with the actual
> objects they point to?
>
> How do you get the object back from an ID in python?


Quick and dirty way is to reference the dicts with a lightweight
hashable object that uses the dict's identity. For instance:


class Identity(object):
def __init__(self,obj):
self.obj = obj
def __hash__(self):
return hash(id(self.obj))
def __eq__(self,other):
return self.obj is other.obj


Then, instead of "s.add(d)" use "s.add(Identity(d))".

Instead of "d = s.pop()" use "d = s.pop().obj".

Instead of "d in s" use "Identity(d) in s".

And so on.


To do it a bit better, you could create an IdentitySet class that
subclasses set and wraps the methods so that they automatically apply
wrap and unwrap the arguments on their way in and out (I'd bet there's
already a cookbook recipe to do that).


Carl Banks

5lvqbwl02@sneakemail.com 12-27-2008 01:28 AM

Re: Doing set operation on non-hashable objects
 
On Dec 24, 12:52*pm, Carl Banks <pavlovevide...@gmail.com> wrote:
> On Dec 24, 1:16*pm, 5lvqbw...@sneakemail.com wrote:
>
>
>
>
>
> > Hi,

>
> > I'm writing an application which is structured roughly as follows:

>
> > "db" is a dict, where the values are also dicts.
> > A function searches through db and returns a list of values, each of
> > which is a dict as described above.
> > I need to perform set operations on these lists (intersection and
> > union)
> > However the objects themselves are not hashable, and therefore can't
> > be in a set, because they are dicts.
> > I'm not sure how large each set will be, but the point is I'm trying
> > to avoid anything looking like an O(n^2) algorithm, so I can't just
> > use naive double-looping to check for intersection/union on a pair of
> > lists.

>
> > The only way I can think of to do this right is to hash the dicts by
> > freezing them, turning them all into tuples, which can then be
> > hashed. *But this is a problem because more dicts might be nested
> > inside. *At any rate, I'd need to thaw them out when I'm done and turn
> > them back into dicts, which seems complicated and annoying, and all
> > this leads me to believe there is a better way.

>
> > What I really need is a set of pointers, so at the end of the
> > operation I have the actual objects pointed to. *Can I somehow use the
> > object IDs as set elements, then recreate a list with the actual
> > objects they point to?

>
> > How do you get the object back from an ID in python?

>
> Quick and dirty way is to reference the dicts with a lightweight
> hashable object that uses the dict's identity. *For instance:
>
> class Identity(object):
> * * def __init__(self,obj):
> * * * * self.obj = obj
> * * def __hash__(self):
> * * * * return hash(id(self.obj))
> * * def __eq__(self,other):
> * * * * return self.obj is other.obj
>
> Then, instead of "s.add(d)" use "s.add(Identity(d))".
>
> Instead of "d = s.pop()" use "d = s.pop().obj".
>
> Instead of "d in s" use "Identity(d) in s".
>
> And so on.
>
> To do it a bit better, you could create an IdentitySet class that
> subclasses set and wraps the methods so that they automatically apply
> wrap and unwrap the arguments on their way in and out (I'd bet there's
> already a cookbook recipe to do that).
>
> Carl Banks


Thank you, I like this idea a lot. It's close to what I've been
thinking but a bit cleaner.

Michael

5lvqbwl02@sneakemail.com 12-28-2008 08:44 AM

Re: Doing set operation on non-hashable objects
 
> > ... "db" is a dict, where the values are also dicts.
> > A function searches through db and returns a list of values, each of
> > which is a dict as described above.
> > I need to perform set operations on these lists (intersection and
> > union)
> > However the objects themselves are not hashable, and therefore can't
> > be in a set, because they are dicts.
> > I'm not sure how large each set will be, but the point is I'm trying
> > to avoid anything looking like an O(n^2) algorithm, so I can't just
> > use naive double-looping to check for intersection/union on a pair of
> > lists.

>
> Well, if the lists are ordered, you can do intersection and union
> in O(n) time. *If they are orderable, you can sort each first, giving
> O(n log n). *Note you can do lst.sort(key=id) which will sort on ids.


The lists are not ordered, since the elements of the list could be
arbitrarily complex things consisting of resistors, capacitors, sub-
circuits, etc. I'm trying do a little EDA program, taking the best
parts from the different EDA/cad tools I've used. I finally came up
with an idea using dictionaries in a certain way, so I'd like to be
able to do set operators on arbitrary things that may themselves not
be hashable.

> > How do you get the object back from an ID in python?

>
> You don't; you remember the mapping in a dictionary and look it up.


Exactly. A mapping between the ID and the thing itself.

> If there were a way to go from id to object, the whole idea of garbage
> collection and reference counts would fly out the window, leading to
> nasty crashes (or you might get to an object that is the re-used id of
> an older object).


Yup, this makes perfect sense. It was rattling around in my head for
a bit afer I wrote the original post, then this makes sense.

>


5lvqbwl02@sneakemail.com 12-28-2008 08:54 AM

Re: Doing set operation on non-hashable objects
 
On Dec 24, 12:21*pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
> En Wed, 24 Dec 2008 17:16:59 -0200, <5lvqbw...@sneakemail.com> escribió:
>
> > I'm writing an application which is structured roughly as follows:

>
> > "db" is a dict, where the values are also dicts.
> > A function searches through db and returns a list of values, each of
> > which is a dict as described above.
> > I need to perform set operations on these lists (intersection and
> > union)
> > However the objects themselves are not hashable, and therefore can't
> > be in a set, because they are dicts.

>
>
> If you *only* care about object identity, you might use a dictionary that *
> only compares by identity to anyone else:
>
> class nc_dict(dict):
> * *"A hashable dictionary that isn't equal to anyone else"
>
> * *def __eq__(self, other):
> * * *return cmp(id(self),id(other))==0
>
> * *def __hash__(self):
> * * *return hash(id(self))
>
> d1 = nc_dict(a=1, b=2, c=3)
> d2 = nc_dict(b=2, c=0, d=4)
> d3 = nc_dict(a=1, c=3, e=5)
> dd1 = nc_dict(x=d1, y=d2)
> dd2 = nc_dict(x=d1, y=d3)
> dd3 = nc_dict(y=d2, z=d3, w=d1)
> l1 = [dd1, dd2]
> l2 = [dd2, dd3]
> s1 = set(l1)
> s2 = set(l2)
> print s1-s2
> print s2-s1
> print s1&s2
>
> # instances of nc_dict with the same contents are NOT equal
> d4 = nc_dict(a=1, b=2, c=3)
> print d1, d4, d1==d4 *# output: False
>
> # but we can use this function to compare contents
> def cmp_contents(d1, d2):
> * * *try: return cmp(sorted(d1.items()), sorted(d2.items()))
> * * *except Exception: return 1 # assume they're unequal
>
> print cmp_contents(d1,d4)==0 # output: True


Good idea. I'll try this out. I don't think it's likely that I'll
have a case where the dicts have identical values, but different
identities. And if they do I probably don't care too much.

Thanks






All times are GMT. The time now is 08:12 AM.

Powered by vBulletin®. Copyright ©2000 - 2014, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.