Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > Re: set using alternative hash function?

Reply
Thread Tools

Re: set using alternative hash function?

 
 
Austin Bingham
Guest
Posts: n/a
 
      10-15-2009
That's definitely a workable solution, but it still rubs me the wrong
way. The uniqueness criteria of a set seems, to me, like a property of
the set, whereas the python model forces it onto each set element.

Another issue I have with the HashWrapper approach is its space
requirements. Logically, what I'm asking to do is switch out a single
function reference (i.e. to point at get_name() rather than hash()),
but in practice I'm forced to instantiate a new object for each of my
set members. On a large set, this could be disastrous.

Don't get me wrong...your solution is a good one, but it's just not
what I am looking for.

Austin

On Thu, Oct 15, 2009 at 1:36 PM, Chris Rebert <(E-Mail Removed)> wrote:
> On Thu, Oct 15, 2009 at 4:24 AM, Austin Bingham
> <(E-Mail Removed)> wrote:
>> If I understand things correctly, the set class uses hash()
>> universally to calculate hash values for its elements. Is there a
>> standard way to have set use a different function? Say I've got a
>> collection of objects with names. I'd like to create a set of these
>> objects where the hashing is done on these names. Using the __hash__
>> function seems inelegant because it means I have to settle on one type
>> of hashing for these objects all of the time, i.e. I can't create a
>> set of them based on a different uniqueness criteria later. I'd like
>> to create a set instance that uses, say, 'hash(x.name)' rather than
>> 'hash(x)'.
>>
>> Is this possible? Am I just thinking about this problem the wrong way?
>> Admittedly, I'm coming at this from a C++/STL perspective, so perhaps
>> I'm just missing the obvious. Thanks for any help on this.

>
> You could use wrapper objects that define an appropriate __hash__():
>
> #*completely untested*
> class HashWrapper(object):
> * *def __init__(self, obj, criteria):
> * * * *self._wrapee = obj
> * * * *self._criteria = criteria
>
> * *#override __hash__() / hash()
> * *def __hash__(self):
> * * * *return hash(self._criteria(self._wrapee))
>
> * *#proxying code
> * *def __getattr__(self, name):
> * * * *return getattr(self._wrapee, name)
>
> * *def __setattr__(self, name, val):
> * * * *setattr(self._wrapee, name, val)
>
> #example
> def name_of(obj):
> * *return obj.name
>
> def name_and_serial_num(obj):
> * *return obj.name, obj.serial_number
>
> no_same_names = set(HashWrapper(obj, name_of) for obj in some_collection)
> no_same_name_and_serial = set(HashWrapper(obj, name_and_serial_num)
> for obj in some_collection)
>
> Cheers,
> Chris
> --
> http://blog.rebertia.com
>

 
Reply With Quote
 
 
 
 
Diez B. Roggisch
Guest
Posts: n/a
 
      10-15-2009
Austin Bingham wrote:

> That's definitely a workable solution, but it still rubs me the wrong
> way. The uniqueness criteria of a set seems, to me, like a property of
> the set, whereas the python model forces it onto each set element.


This is a POV, but to to me, the set just deals with a very minimal
protocol - hash-value & equality. Whatever you feed it, it has to cope with
that. It strikes *me* as odd to ask for something else.

The ideal solution would be to have several hash/eq-methods on your object,
and somehow make the surroundings decide which to use. But there is no
OO-pragma for that.

> Another issue I have with the HashWrapper approach is its space
> requirements. Logically, what I'm asking to do is switch out a single
> function reference (i.e. to point at get_name() rather than hash()),
> but in practice I'm forced to instantiate a new object for each of my
> set members. On a large set, this could be disastrous.
>
> Don't get me wrong...your solution is a good one, but it's just not
> what I am looking for.


It is the only one you have, short of implementing set your own. And then
the terms of implementation would be that you pass a hash- and eq-function
& create wrapper-objects internally.

Diez
 
Reply With Quote
 
 
 
 
Austin Bingham
Guest
Posts: n/a
 
      10-15-2009
On Thu, Oct 15, 2009 at 2:23 PM, Diez B. Roggisch <(E-Mail Removed)> wrote:
> Austin Bingham wrote:
> This is a POV, but to to me, the set just deals with a very minimal
> protocol - hash-value & equality. Whatever you feed it, it has to cope with
> that. It strikes *me* as odd to ask for something else.


But I'm not really asking for anything that changes the paradigm of
how things currently work. All I need is the ability to say something
like this:

*s = set(hash_func = lambda x: hash(x.name))

If set could be de-hardcoded from using hash(), nothing else about how
it works would have to change. Or am I wrong on that? I see that you
mention equality in the protocol, but I don't know why a set would
need that if a) hash(x) == hash(y) --> x == y and b) hash function
must return a 32 bit value (which I think is the case)...so maybe I
misunderstand something.

I wonder...does the requirement of using hash() have to do with the
low-level implementation of set? That might explain the state of
things.

> The ideal solution would be to have several hash/eq-methods on your object,
> and somehow make the surroundings decide which to use. But there is no
> OO-pragma for that.


But why force objects to know about all of the wacky contexts in which
they might be used? Let objects be objects and hash-functions be
hash-functions. If someone wants a set where uniqueness is defined by
the number of occurrences of 'a' in an object's name, the object
should never have to know that...it should just have a name.

In any event, it sounds like set doesn't have any notion of switching
out its hash function, and that more or less answers my question.

Austin
 
Reply With Quote
 
Diez B. Roggisch
Guest
Posts: n/a
 
      10-15-2009
Austin Bingham wrote:

> On Thu, Oct 15, 2009 at 2:23 PM, Diez B. Roggisch <(E-Mail Removed)>
> wrote:
>> Austin Bingham wrote:
>> This is a POV, but to to me, the set just deals with a very minimal
>> protocol - hash-value & equality. Whatever you feed it, it has to cope
>> with that. It strikes *me* as odd to ask for something else.

>
> But I'm not really asking for anything that changes the paradigm of
> how things currently work. All I need is the ability to say something
> like this:
>
> s = set(hash_func = lambda x: hash(x.name))
>
> If set could be de-hardcoded from using hash(), nothing else about how
> it works would have to change. Or am I wrong on that? I see that you
> mention equality in the protocol, but I don't know why a set would
> need that if a) hash(x) == hash(y) --> x == y and b) hash function
> must return a 32 bit value (which I think is the case)...so maybe I
> misunderstand something.


You do. Hashes can collide, and then you need equality. Sets are *based* on
equality actually, the hash is just one optimization. Other optimizations
build on order (usually, the STL requires you to define < on objects for
that. Which is enough because a == b <=> a < b && b < a). But eventually,
it all boils down to equality. You could implement a set without hash, by
imposing linear behavior on insert and lookup - but none based purely on a
hash.

>
> I wonder...does the requirement of using hash() have to do with the
> low-level implementation of set? That might explain the state of
> things.
>
>> The ideal solution would be to have several hash/eq-methods on your
>> object, and somehow make the surroundings decide which to use. But there
>> is no OO-pragma for that.

>
> But why force objects to know about all of the wacky contexts in which
> they might be used? Let objects be objects and hash-functions be
> hash-functions. If someone wants a set where uniqueness is defined by
> the number of occurrences of 'a' in an object's name, the object
> should never have to know that...it should just have a name.


Because these functions need intimate knowledge of the objects to actually
work. Sure, in python it's easy to define them outside, and just reach into
the object as you wish. But aside this implementation detail, a hash (and
equality of course) always are based upon the object in question. So I
think it's natural to have it defined right there (which __hash__ and
__eq__ show that this seems to be accepted in general).

And if there were something that would decide on context which of several
implementations to use, you'd have less to worry. As things are, there
isn't such thing (I don't even have the slightest idea what could work),
you are as well off with defining two functions.


Diez
 
Reply With Quote
 
Austin Bingham
Guest
Posts: n/a
 
      10-15-2009
On Thu, Oct 15, 2009 at 3:02 PM, Diez B. Roggisch <(E-Mail Removed)> wrote:
> Austin Bingham wrote:
> You do. Hashes can collide, and then you need equality. Sets are *based* on
> equality actually, the hash is just one optimization. ...


Right, thanks for clearing that up. Not reading closely enough ==
public shaming

> Because these functions need intimate knowledge of the objects to actually
> work. Sure, in python it's easy to define them outside, and just reach into
> the object as you wish. But aside this implementation detail, a hash (and
> equality of course) always are based upon the object in question. So I
> think it's natural to have it defined right there (which __hash__ and
> __eq__ show that this seems to be accepted in general).
>
> And if there were something that would decide on context which of several
> implementations to use, you'd have less to worry. As things are, there
> isn't such thing (I don't even have the slightest idea what could work),
> you are as well off with defining two functions.


But this "context decider" you're talking about sounds exactly like
what I'm talking about. If I had some class with __hash1__ and
__hash2__ (and associated equality operators), you're saying it would
be nice to be able to select which to use based on the context (e.g.
what type of set I'm using.) It might look like this:

s = set(hash_func = lambda x: x.__hash2__, eq_func = lambda x, y:
x.__eq2__(y))

And if *that* works for you, do you still have a problem with:

s = set(hash_func = lambda x: hash(x.name), eq_func = lambda x,y:
x.name == y.name)

?

If you don't like those, what would work for you? As far as I can
tell, your "context decider" and my "alternative hash functions" are
the same thing, and we just need to agree on a syntax.
 
Reply With Quote
 
Diez B. Roggisch
Guest
Posts: n/a
 
      10-15-2009
>> And if there were something that would decide on context which of several
>> implementations to use, you'd have less to worry. As things are, there
>> isn't such thing (I don't even have the slightest idea what could work),
>> you are as well off with defining two functions.

>
> But this "context decider" you're talking about sounds exactly like
> what I'm talking about. If I had some class with __hash1__ and
> __hash2__ (and associated equality operators), you're saying it would
> be nice to be able to select which to use based on the context (e.g.
> what type of set I'm using.) It might look like this:
>
> s = set(hash_func = lambda x: x.__hash2__, eq_func = lambda x, y:
> x.__eq2__(y))
>
> And if *that* works for you, do you still have a problem with:
>
> s = set(hash_func = lambda x: hash(x.name), eq_func = lambda x,y:
> x.name == y.name)
>
> ?
>
> If you don't like those, what would work for you? As far as I can
> tell, your "context decider" and my "alternative hash functions" are
> the same thing, and we just need to agree on a syntax.


The context-decider isn't the same thing because it isn't designed yet
And most probably won't ever be. It's just the abstract idea that
hashing/equality change for one object depending on the circumstances they
are used in, and that one wishes that the decision what to use is as simple
& transparent as possible.

Your approach is certainly viable, but I guess the current
set-implementation is optimized on working with __hash__ and __eq__ on
objects because for these exist slots in the python object structure in C.
So while you can implement your idea, you will end up passing wrappers as
Chris & me proposed into the current set implementation.

However, it might be worth thinking of proposing this change to the set-type
in general. But then for orthogonality, dicts should have it as well I
guess. Which opens a whole new can of worms.

I'm undecided on this. I think the cure is straight forward & simple to
implement, and once you have the level to understand & need different
hashing/equality, you should be comfortable writing this simple wrapper
yourself. So for me personally, it's not a needed addition to the language.
YMMV, and thus - go for it if you like.

Diez
 
Reply With Quote
 
Austin Bingham
Guest
Posts: n/a
 
      10-15-2009
On Thu, Oct 15, 2009 at 3:43 PM, Diez B. Roggisch <(E-Mail Removed)> wrote:
> The context-decider isn't the same thing because it isn't designed yet
> And most probably won't ever be. It's just the abstract idea that
> hashing/equality change for one object depending on the circumstances they
> are used in, and that one wishes that the decision what to use is as simple
> & transparent as possible.


Fair enough

> Your approach is certainly viable, but I guess the current
> set-implementation is optimized on working with __hash__ and __eq__ on
> objects because for these exist slots in the python object structure in C.
> So while you can implement your idea, you will end up passing wrappers as
> Chris & me proposed into the current set implementation.


Yes, I figured that the guts of set relied on particulars to which we
are not privy at the python level. If the syntax let me describe sets
the way I've been laying out here, I could probably tolerate the
underlying implementation relying on wrappers.

> However, it might be worth thinking of proposing this change to the set-type
> in general. But then for orthogonality, dicts should have it as well I
> guess. Which opens a whole new can of worms.


dicts would certainly have to be looked at as well, but I don't think
the can would have that many worms in it if we solve the set issue to
everyone's satisfaction.

In any event, thanks for helping me work through this issue.

Austin
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
hash of hash of hash of hash in c++ rp C++ 1 11-10-2011 04:45 PM
set using alternative hash function? Austin Bingham Python 20 10-17-2009 12:55 AM
Re: set using alternative hash function? Chris Rebert Python 2 10-15-2009 02:57 PM
Quicker finding strings? Alternative for array, hash, set? Patrick Put Ruby 9 02-03-2009 03:50 PM
Hash#select returns an array but Hash#reject returns a hash... Srijayanth Sridhar Ruby 19 07-02-2008 12:49 PM



Advertisments