Velocity Reviews - Computer Hardware Reviews

hashCode

 
 
Arne Vajhøj
Guest
Posts: n/a
 
      08-10-2012
On 8/10/2012 6:22 PM, bob smith wrote:
> On Friday, August 10, 2012 11:34:28 AM UTC-5, Eric Sosman wrote:
>> On 8/10/2012 11:47 AM, bob smith wrote:
>>> Is it always technically correct to override the hashCode function like so:
>>> @Override
>>> public int hashCode() {
>>> return 1;
>>> }
>>> Would it be potentially better if that was Object's implementation?

>>
>> Define "better."

>
> Better in the sense that you would never HAVE to override hashCode.
>
> Now, there are cases where you HAVE to override it, or your code is very broken.


It is not broken.

It will perform poorly in many cases.

Arne

 
 
 
 
 
Arne Vajhøj
Guest
Posts: n/a
 
      08-10-2012
On 8/10/2012 6:32 PM, Lew wrote:
> bob smith wrote:
>> Now, there are cases where you HAVE to override it, or your code is very broken.

>
> No.


> As long as 'hashCode()' fulfills the contract, your code will work - functionally. But a bad
> 'hashCode()' could and likely will noticeably affect performance. There is more to correctness
> than mere functional conformance.


If the code per specs is guaranteed to work then it is correct.

Good (or just decent) performance is not necessary for code to
be correct.

At least not in the traditional programming terminology.

In plain English maybe.

Arne


 
 
 
 
 
Roedy Green
Guest
Posts: n/a
 
      08-11-2012
On Fri, 10 Aug 2012 12:45:07 -0700 (PDT), Lew <(E-Mail Removed)>
wrote, quoted or indirectly quoted someone who said :

> h = 31 * h + attribute.hashCode();
> }

In my essay I recommend XOR which is an inherently faster operation
than multiply. I wonder which actually works out better. If you had a
large number of fields, the multiply effect could fall off the left
hand end. It is the algorithm used for String which could have very
long strings, so Sun must have thought of that.
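One way to compare the two combining schemes is to look at which inputs collide. The sketch below is only an illustration: the class and method names (CombineDemo, xorCombine, mulCombine) are made up for this example, and it works on plain int field hashes rather than on any real class. It shows that a pure XOR combine ignores field order and cancels identical fields, while the multiply-by-31 scheme does not.

```java
class CombineDemo {
    // Hypothetical helper: XOR all field hashes together.
    // XOR is commutative, so field order is lost, and x ^ x == 0.
    static int xorCombine(int... fieldHashes) {
        int h = 0;
        for (int f : fieldHashes) {
            h ^= f;
        }
        return h;
    }

    // Hypothetical helper: the multiply-and-add scheme quoted above,
    // which is also how String.hashCode() is documented to combine chars.
    static int mulCombine(int... fieldHashes) {
        int h = 0;
        for (int f : fieldHashes) {
            h = 31 * h + f;
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(xorCombine(3, 7) == xorCombine(7, 3)); // true: swapped fields collide
        System.out.println(xorCombine(5, 5));                     // 0: identical fields cancel
        System.out.println(mulCombine(3, 7));                     // 100
        System.out.println(mulCombine(7, 3));                     // 220
    }
}
```

Whether those extra collisions matter depends on the data; for classes whose fields often hold the same or swapped values, the order-sensitive scheme spreads keys better.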
--
Roedy Green Canadian Mind Products http://mindprod.com
A new scientific truth does not triumph by convincing its opponents and making them see the light,
but rather because its opponents eventually die, and a new generation grows up that is familiar with it.
~ Max Planck 1858-04-23 1947-10-04


 
 
Eric Sosman
Guest
Posts: n/a
 
      08-11-2012
On 8/10/2012 6:22 PM, bob smith wrote:
[... many blank lines removed for legibility's sake ...]
> On Friday, August 10, 2012 11:34:28 AM UTC-5, Eric Sosman wrote:
>> On 8/10/2012 11:47 AM, bob smith wrote:
>>
>>> Is it always technically correct to override the hashCode function like so:
>>>
>>> @Override
>>> public int hashCode() {
>>> return 1;
>>> }
>>>
>>> Would it be potentially better if that was Object's implementation?

>>
>> Define "better."

>
> Better in the sense that you would never HAVE to override hashCode.
>
> Now, there are cases where you HAVE to override it, or your code is very broken.


I cannot think of a case where you HAVE to override hashCode(),
except as a consequence of other choices that you didn't HAVE to
make. You don't HAVE to invent classes where distinct instances
are considered equal, and even if you do you don't HAVE to put those
instances in HashMaps or HashSets or whatever.

But that's a bit specious: All it says is that you don't HAVE
to override hashCode() because you don't HAVE to use things that
call it. It's like "You don't HAVE to pay taxes, because you don't
HAVE to live outside prison." So, let's take it as a given that
you will often need to write classes that override equals() and
hashCode() -- I imagine you understand that they go together.

Okay: Then returning a constant 1 (or 42 or 0 or whatever)
would in fact satisfy the letter of the law regarding hashCode():
Whenever x.equals(y) is true, x.hashCode() == y.hashCode(). In
your example this would be trivially true because x,y,z,... all
have the same hashCode() value, whether they're equal or not --
You have lived up to the letter of the law.

Of course, such a hashCode() would make all those hash-based
containers pretty much useless: They would work in the sense that
they would get the Right Answer, but they'd be abominably slow,
with expected performance of O(N) instead of O(1). See
<http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec2003/>
for a survey of some denial-of-service attacks that work by driving
hash tables from O(1) to O(N), resulting in catastrophic failure
of the attacked system.

In other words, the letter of the law on hashCode() is a bare
minimum that guarantees correct functioning, but it is not enough
to guarantee usability. Why isn't the law more specific? Because
nobody knows how to write "hashCode() must be correct *and* usable"
in terms that would cover all the classes all the Java programmers
have dreamed up and will dream up. Your hashCode() meets the bare
minimum requirement, but is not "usable." The actual hashCode()
provided by Object also meets the bare minimum requirement, and *is*
usable as it stands, until (and unless; you don't HAVE to) you
choose to implement other equals() semantics, and a hashCode() to
match them.
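The "correct but not usable" point can be checked directly. The following is a minimal sketch, assuming a made-up Key class whose hashCode() always returns 1; it only demonstrates that a HashSet still gives the right answers, and does not attempt to measure how slow it gets.

```java
import java.util.HashSet;
import java.util.Set;

class ConstantHashDemo {
    // Hypothetical key class: equals() compares ids, hashCode() is constant.
    static final class Key {
        final int id;

        Key(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }

        @Override
        public int hashCode() {
            return 1; // legal: equal objects trivially share this value
        }
    }

    // Builds a set of n distinct keys; every one lands in the same bucket.
    static Set<Key> fill(int n) {
        Set<Key> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            set.add(new Key(i));
        }
        return set;
    }

    public static void main(String[] args) {
        Set<Key> set = fill(1000);
        set.add(new Key(7));                             // duplicate, must be rejected
        System.out.println(set.size());                  // 1000
        System.out.println(set.contains(new Key(999)));  // true
        System.out.println(set.contains(new Key(1000))); // false
    }
}
```

Every lookup has to work through the single shared bucket using equals(), which is where the loss of O(1) behaviour comes from.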


--
Eric Sosman
http://www.velocityreviews.com/forums/(E-Mail Removed)d
 
 
Eric Sosman
Guest
Posts: n/a
 
      08-11-2012
On 8/10/2012 7:25 PM, Arne Vajhøj wrote:
> On 8/10/2012 3:17 PM, Roedy Green wrote:
>> On Fri, 10 Aug 2012 08:47:12 -0700 (PDT), bob smith
>> <(E-Mail Removed)> wrote, quoted or indirectly quoted someone
>> who said :
>>
>>> @Override
>>> public int hashCode() {
>>> return 1;
>>> }

>>
>> that's about the worst possible hashCode function.

>
> Yes, but the poster asked "Is it always technically correct to ..."
> not whether it was "best possible".


He also asked whether it would "be potentially better."

--
Eric Sosman
(E-Mail Removed)d
 
 
Jan Burse
Guest
Posts: n/a
 
      08-11-2012
bob smith wrote:
> Is it always technically correct to override the hashCode function like so:
>
> @Override
> public int hashCode() {
> return 1;
> }
>
> Would it be potentially better if that was Object's implementation?
>


Maybe it would make sense to spell out what the contract
for hashCode() is. The part that matters here is simply that
the following invariant should hold:

/* invariant that should hold */
if a.equals(b) then a.hashCode()==b.hashCode()

It should be noted that this does not imply:

/* not implied and thus not required by the invariant */
if a.hashCode()==b.hashCode() then a.equals(b)

It is also quite unlikely that a hashCode() would satisfy
the latter, although the closer it comes to the latter, the
better it works for HashMap, etc.

Object's default implementation of hashCode() matches
Object's default implementation of equals(). The default
implementation of equals() is ==, and the default
implementation of hashCode() is
System.identityHashCode().

The System identity hash code is stored in the object
and generated by the system. It does not change during
GC although the internal object address might change
during GC. It is only 32 bits, although internal object
addresses might be 64 bits with a corresponding JVM.
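The relationship between the default hashCode() and System.identityHashCode() can be seen in a few lines. This is a small sketch with a made-up class name and helper (IdentityHashDemo, usesIdentityHash); it relies only on the documented behaviour that identityHashCode() returns what the default hashCode() would return.

```java
class IdentityHashDemo {
    // Hypothetical helper: true when the object's hashCode() is its identity hash.
    static boolean usesIdentityHash(Object o) {
        return o.hashCode() == System.identityHashCode(o);
    }

    public static void main(String[] args) {
        // Plain Object does not override hashCode(), so the two always agree.
        System.out.println(usesIdentityHash(new Object())); // true

        // String overrides equals() and hashCode() together.
        String a = new String("abc");
        String b = new String("abc");
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
        // The identity hashes of two distinct instances usually differ,
        // but that is not guaranteed, so no expected output is given here.
        System.out.println(System.identityHashCode(a) == System.identityHashCode(b));
    }
}
```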

Returning a constant, any constant c and not only 1, would be
technically correct for the default implementation
in class Object, since it trivially satisfies the
invariant:

if a.equals(b) then c==c

is trivially true, since c==c is true. But it is not
better, since you would get very degenerate HashMaps,
etc.

You need to override hashCode() when there is a danger
that the invariant no longer holds. This is
not the case when equals() is not overridden. So overriding
hashCode() just for fun when equals() is not overridden
usually doesn't make sense. It will probably only slow
down the hashCode() calculation. So the following:

hashCode() = sum attr_i * c^i

is not necessary. But it would be a possible way to go
if equals() were overridden in the following way:

equals(other) = and_i attr_i.equals(other.attr_i)
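In Java the matching pair might look roughly like this. It is a sketch under assumptions: the Person class and its two attributes are invented for the example, and c = 31 is just the conventional choice of constant.

```java
import java.util.Objects;

class FieldHashDemo {
    // Hypothetical value class: two instances are equal when all attributes are equal.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = Objects.requireNonNull(name);
            this.age = age;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof Person)) {
                return false;
            }
            Person p = (Person) other;
            // "and" over all attributes
            return name.equals(p.name) && age == p.age;
        }

        @Override
        public int hashCode() {
            // Polynomial in c = 31 over the same attributes equals() compares.
            int h = name.hashCode();
            h = 31 * h + Integer.hashCode(age);
            return h;
        }
    }

    public static void main(String[] args) {
        Person a = new Person("Ann", 30);
        Person b = new Person("Ann", 30);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true, as the invariant requires
        System.out.println(a.equals(new Person("Ann", 31))); // false
    }
}
```

The key design point is that hashCode() uses exactly the attributes that equals() compares, so equal objects cannot end up with different hash codes.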

The above happens when you turn your object into a container
of other objects irrespective of the own object identity.
But beware if the container contains itself somewhere. This
is why we find in the code for Hashtable the following
complication:

public synchronized int hashCode() {
/*
* This code detects the recursion caused by computing the hash code
* of a self-referential hash table and prevents the stack overflow
* that would otherwise result. This allows certain 1.1-era
* applets with self-referential hash tables to work. This code
* abuses the loadFactor field to do double-duty as a hashCode
* in progress flag, so as not to worsen the space performance.
* A negative load factor indicates that hash code computation is
* in progress.
*/

Interestingly it will return a constant for the object when
it detects a loop. Maybe one could do better... Dunno
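The effect of that guard can be tried out with a table that holds itself as a value. This is a minimal sketch; the class and method names are made up, and it only checks that the call returns instead of overflowing the stack, without relying on the particular value returned.

```java
import java.util.Hashtable;

class SelfReferentialDemo {
    // Hypothetical helper: builds a Hashtable that contains itself as a value
    // and asks for its hash code.
    static int hashOfSelfContainingTable() {
        Hashtable<String, Object> table = new Hashtable<>();
        table.put("self", table); // the table is now one of its own values
        // The "hash code computation in progress" flag described in the
        // comment above stops the nested hashCode() call from recursing.
        return table.hashCode();
    }

    public static void main(String[] args) {
        int h = hashOfSelfContainingTable();
        System.out.println("hashCode() returned normally: " + h);
    }
}
```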

Bye


 
 
Jan Burse
Guest
Posts: n/a
 
      08-11-2012
Jan Burse wrote:
> during GC. It is only 32bit although internal object
> addresses might be 64bit with a corresponding JVM.


Typically even fewer bits, since the same space
is used for some object flags.
 
 
Arne Vajhøj
Guest
Posts: n/a
 
      08-11-2012
On 8/11/2012 8:00 AM, Eric Sosman wrote:
> On 8/10/2012 7:25 PM, Arne Vajhøj wrote:
>> On 8/10/2012 3:17 PM, Roedy Green wrote:
>>> On Fri, 10 Aug 2012 08:47:12 -0700 (PDT), bob smith
>>> <(E-Mail Removed)> wrote, quoted or indirectly quoted someone
>>> who said :
>>>
>>>> @Override
>>>> public int hashCode() {
>>>> return 1;
>>>> }
>>>
>>> that's about the worst possible hashCode function.

>>
>> Yes, but the poster asked "Is it always technically correct to ..."
>> not whether it was "best possible".

>
> He also asked whether it would "be potentially better."


"better to use Object hashCode" which again should bring the
correctness question before the performance question.

Arne


 
 
Joerg Meier
Guest
Posts: n/a
 
      08-11-2012
On Sat, 11 Aug 2012 04:54:09 -0700, Roedy Green wrote:

> On Fri, 10 Aug 2012 12:45:07 -0700 (PDT), Lew <(E-Mail Removed)>
> wrote, quoted or indirectly quoted someone who said :
>> h = 31 * h + attribute.hashCode();
>> }

> In my essay I recommend XOR which is an inherently faster operation
> than multiply.


Hasn't that been wrong since about the invention of the 80386 processor
family ? Pretty sure by now MUL and XOR both take one cycle and that's it.


Kind regards,
Joerg

--
I don't read my email, so replies by email will unfortunately go
unread.
 
 
Jan Burse
Guest
Posts: n/a
 
      08-11-2012
Roedy Green wrote:
> If you had a
> large number of fields, the multiply effect could fall off the left
> hand end.


Actually this does not happen, since you multiply with 31,
which is 1+2+4+8+16. So that:

a*31+b = a*16+a*8+a*4+a*2+a+b

So for a HashMap that uses an index = hash & (2^n - 1) (which is
the same as hash mod 2^n), the impact of a will be still seen,
even when it occurs at the very left hand side.

There is some Microsoft C# hash table implementation which does
not use mod 2^n, but instead reduces modulo some prime. In case the
implementation chooses 31 as the designated prime, all
information but for the first field will be lost. But since
mod 2^32 is also applied, this might not be completely true.

For 2^n I don't know exactly how the impact could be
described. I guess in a HashMap with index = hash mod 2^1 the
hash amounts to a parity bit, since the sum in a+b acts like
an xor on the first right hand bit. But for 2^n with n>1 the
effect of the multiplication by 31 is harder to describe.
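The claim that the first field does not "fall off the left hand end" can be checked with int arithmetic. Below is a small sketch with made-up names (MultiplierDemo, combine); it compares the odd multiplier 31 against the even multiplier 32, chosen here only as a contrast, and looks at the low 4 bits, i.e. the bucket index a 16-slot table using hash & (2^n - 1) would see.

```java
class MultiplierDemo {
    // Hypothetical helper: h = multiplier * h + field, wrapping mod 2^32 like Java ints do.
    static int combine(int multiplier, int[] fields) {
        int h = 0;
        for (int f : fields) {
            h = multiplier * h + f;
        }
        return h;
    }

    public static void main(String[] args) {
        // Two 40-field "objects" that differ only in the very first field.
        int[] a = new int[40];
        int[] b = new int[40];
        a[0] = 1;
        b[0] = 2;

        int mask = 15; // index = hash & (2^4 - 1)

        // With 31 (odd), the first field still changes the low bits.
        System.out.println((combine(31, a) & mask) != (combine(31, b) & mask)); // true

        // With 32 (a power of two), the first field is shifted out entirely.
        System.out.println(combine(32, a) == combine(32, b)); // true: both are 0
    }
}
```

Because 31 is odd, multiplying by it is invertible mod 2^32, so no earlier contribution is ever shifted out completely; an even multiplier loses at least one bit of the earlier fields per step.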
 