# hashCode

Jan Burse

 08-11-2012
To: Roedy Green
From: Jan Burse <(E-Mail Removed)>

Roedy Green wrote:
> large number of fields, the multiply effect could fall off the left
> hand end.

Actually this does not happen, since you multiply by 31, which is 1+2+4+8+16.
So that:

a*31+b = a*16+a*8+a*4+a*2+a+b

So for a HashMap that uses an index = hash & (2^n - 1) (which is the same as
hash mod 2^n), the impact of a will still be seen, even when it occurs at the
very left hand side.
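
A minimal sketch of the arithmetic above, assuming two int field hashes a and b combined as 31*a + b and a table of 2^n buckets. The class and method names are illustrative only and do not come from any library.

```java
// Illustrative sketch: LowBitsDemo, combine and index are hypothetical names.
final class LowBitsDemo {
    // Combine two field hashes as in the thread: a*31 + b.
    static int combine(int a, int b) {
        return 31 * a + b;
    }

    // Bucket index for a table with 2^n slots: keep the low n bits.
    static int index(int hash, int n) {
        return hash & ((1 << n) - 1);
    }

    public static void main(String[] args) {
        int n = 4; // 16 buckets
        int b = 5; // second field stays fixed
        // Only the first field changes, yet the bucket index changes too,
        // because 31*a = 16a + 8a + 4a + 2a + a reaches the low bits.
        for (int a = 0; a < 4; a++) {
            int h = combine(a, b);
            System.out.println("a=" + a + " hash=" + h
                    + " index=" + index(h, n)
                    + " floorMod=" + Math.floorMod(h, 1 << n));
        }
    }
}
```

The floorMod column is printed next to the masked index to show that both reductions agree for a power-of-two table size, including for negative hash values.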

There is some Microsoft C# HashMap implementation which does not use mod 2^n,
but instead some primes. In case the implementation chooses 31 as the designated
prime, all information except that of the last field added (b above) will be
lost, because a*31 is 0 mod 31. But since mod 2^32 is also applied, this might
not be completely true.
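
A small sketch of that case, assuming a hypothetical table with exactly 31 buckets and the same 31*a + b combination. It is not modelled on any particular C# or Java collection; the names are made up for illustration.

```java
// Illustrative sketch: PrimeModDemo and its methods are hypothetical names.
final class PrimeModDemo {
    static int combine(int a, int b) {
        return 31 * a + b;
    }

    // Bucket index for a table whose size is a prime (here 31).
    static int index(int hash, int buckets) {
        return Math.floorMod(hash, buckets);
    }

    public static void main(String[] args) {
        int b = 7;
        // Small values: no int overflow, so 31*a vanishes mod 31 and only b is left.
        for (int a : new int[] {1, 2, 1000}) {
            System.out.println("a=" + a + " index=" + index(combine(a, b), 31));
        }
        // Large value: 31*a first wraps modulo 2^32, and 2^32 is not a
        // multiple of 31, so a influences the index again.
        int big = 100_000_000;
        System.out.println("a=" + big + " index=" + index(combine(big, b), 31));
    }
}
```

So the loss is complete only while the intermediate product stays inside the int range; once it wraps, some dependence on a comes back.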

For 2^n I don't know exactly how the impact could be described. I guess in a
HashMap with index = hash mod 2^1 the hash amounts to a parity bit, since the
sum in a+b acts like an xor on the first right hand bit. But for 2^n with n>1
the multiplication by 31 is a little more crude.
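
The parity observation can be checked directly. This is a sketch under the same assumption (hash = 31*a + b, two buckets); the names are illustrative.

```java
// Illustrative sketch: ParityDemo and its methods are hypothetical names.
final class ParityDemo {
    static int combine(int a, int b) {
        return 31 * a + b;
    }

    // With two buckets (2^1), only the lowest bit of the hash is used.
    static int lowBit(int hash) {
        return hash & 1;
    }

    public static void main(String[] args) {
        // 31 is odd, so the low bit of 31*a equals the low bit of a, and the
        // low bit of a sum is the XOR of the low bits of its operands.
        for (int a = 0; a < 2; a++) {
            for (int b = 0; b < 2; b++) {
                System.out.println("a=" + a + " b=" + b
                        + " lowBit=" + lowBit(combine(a, b))
                        + " xor=" + ((a ^ b) & 1));
            }
        }
    }
}
```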

* Origin: Prism bbs (1:261/3
Time Warp of the Future BBS - telnet://time.synchro.net:24

Mike Winter

 08-11-2012
To: Joerg Meier
From: Mike Winter <(E-Mail Removed)>

On 11/08/2012 17:25, Joerg Meier wrote:
> On Sat, 11 Aug 2012 04:54:09 -0700, Roedy Green wrote:
>[...]
>> In my essay I recommend XOR which is an inherently faster operation
>> than multiply.

>
> Hasn't that been wrong since about the invention of the 80386 processor
> family ?

Not that far back: the Pentium required 9-11 cycles to complete a MUL
instruction compared to 1-3 for XOR (and the like), depending on operand
locations and widths.

> Pretty sure by now MUL and XOR both take one cycle and that's it.

More-or-less, but the former is still slower for wider operands. However, your
point is well-taken: it needn't be as much a concern in most cases.
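
As a side note on the cost of the multiply: for the specific constant 31, the product can also be written as a shift and a subtraction. The sketch below only demonstrates that the two expressions agree in Java's wrapping int arithmetic; whether a particular compiler or JIT performs this rewrite is an assumption and is not claimed here. The names are illustrative.

```java
// Illustrative sketch: ShiftSubtractDemo and timesThirtyOne are hypothetical names.
final class ShiftSubtractDemo {
    // 31*h rewritten as 32*h - h, using a left shift by 5 for the 32*h part.
    static int timesThirtyOne(int h) {
        return (h << 5) - h;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 17, 123456789, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int h : samples) {
            // Both sides wrap modulo 2^32, so they agree even on overflow.
            System.out.println(h + ": " + (31 * h) + " vs " + timesThirtyOne(h)
                    + " equal=" + (31 * h == timesThirtyOne(h)));
        }
    }
}
```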

--
Mike Winter
Replace ".invalid" with ".uk" to reply by e-mail.


rossum

 08-11-2012
To: Lew
From: rossum <(E-Mail Removed)>

On Fri, 10 Aug 2012 12:45:07 -0700 (PDT), Lew <(E-Mail Removed)> wrote:

>public static int calculateHash(Foo arg) {
> int h = 0;
>
> for ( each attribute of Foo that contributes to 'equals()' )
> {
> h = 31 * h + attribute.hashCode();
> }
> return h;
>}

Bloch starts with:

int h = 17;

He says that works better in cases where the first one or more
attribute.hashCode() values are zero, and hence would not register.

He suggests any constant non-zero value.
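
A minimal sketch of the effect of the starting value, assuming the same h = 31 * h + fieldHash loop quoted above. The helper name combine and its varargs signature are made up for this illustration.

```java
// Illustrative sketch: SeedDemo and combine are hypothetical names.
final class SeedDemo {
    // Fold the given field hashes into one value, starting from `seed`.
    static int combine(int seed, int... fieldHashes) {
        int h = seed;
        for (int f : fieldHashes) {
            h = 31 * h + f;
        }
        return h;
    }

    public static void main(String[] args) {
        // Seed 0: leading zero field hashes leave no trace in the result.
        System.out.println(combine(0, 7));
        System.out.println(combine(0, 0, 7));
        System.out.println(combine(0, 0, 0, 7));
        // Seed 17: each leading zero still multiplies the running value by 31.
        System.out.println(combine(17, 7));
        System.out.println(combine(17, 0, 7));
        System.out.println(combine(17, 0, 0, 7));
    }
}
```

The difference shows up when the number of combined values varies, as with sequences of different lengths. For a class with a fixed set of fields, a non-zero seed shifts every hash by the same constant.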

rossum


Lew

 08-12-2012
To: Arne Vajhøj
From: Lew <(E-Mail Removed)>

On 08/10/2012 04:30 PM, Arne Vajhøj wrote:
> On 8/10/2012 6:32 PM, Lew wrote:
>> bob smith wrote:
>>> Now, there are cases where you HAVE to override it, or your code is very
>>> broken.

>>
>> No.

>
>> As long as 'hashCode()' fulfills the contract, your code will work -
>> 'hashCode()' could and likely will noticeably affect performance. There is
>> more to correctness
>> than mere functional conformance.

>
> If the code per specs is guaranteed to work then it is correct.
>
> Good (or just decent) performance is not necessary for code to
> be correct.
>
> At least not in the traditional programming terminology.
>
> In plain English maybe.

I see your point, but that is not to say that the specs exclude performance
considerations.

In the case of 'hashCode()', the Javadocs do say, "This method is supported for
the benefit of hash tables such as those provided by HashMap."
<http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode()>

The key question here is how you define "benefit". I argue that a hash code
that is constant does not benefit, say, a 'HashMap' because one of our desired
uses is constant-order retrieval.

"This implementation provides constant-time performance for the basic
operations (get and put), assuming the hash function disperses the elements
properly among the buckets."
<http://docs.oracle.com/javase/7/docs/api/java/util/HashMap.html>

Each specification refers to the other. Ergo they are meant to be considered
together. Taken together, the documentation clearly specifies that "correct" or
"proper" includes performance considerations. Therefore, by what you say, the
simple "return 1;" is not correct.

It certainly would not be correct for the 'Object' implementation. "As much as
is reasonably practical, the hashCode method defined by class Object does
return distinct integers for distinct objects." [op. cit.]

As you say, Arne, "correct" means it follows the spec. The OP's suggested
implementation violates the spec on two fronts.

--
Lew
Honi soit qui mal y pense.


Lew

 08-12-2012
To: Eric Sosman
From: Lew <(E-Mail Removed)>

Eric Sosman wrote:
> Okay: Then returning a constant 1 (or 42 or 0 or whatever)
> would in fact satisfy the letter of the law regarding hashCode():

Not if you consider all aspects of what the Javadocs promise.

> Whenever x.equals(y) is true, x.hashCode() == y.hashCode(). In
> your example this would be trivially true because x,y,z,... all
> have the same hashCode() value, whether they're equal or not --
> You have lived up to the letter of the law.

No, because the law requires that the method support 'HashMap', which in turn
calls for "properly" hashed objects.

> Of course, such a hashCode() would make all those hash-based
> containers pretty much useless: They would work in the sense that
> they would get the Right Answer, but they'd be abominably slow,

Indeed.

> with expected performance of O(N) instead of O(1). See
> <http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec2003/>
> for a survey of some denial-of-service attacks that work by driving
> hash tables from O(1) to O(N), resulting in catastrophic failure
> of the attacked system.
>
> In other words, the letter of the law on hashCode() is a bare
> minimum that guarantees correct functioning, but it is not enough
> to guarantee usability. Why isn't the law more specific? Because

Actually, if you consider all that the Javadocs tell you, this "letter of the
law" to which you refer is like saying the sequence "ABC" constitutes all of
"the ABCs".

> nobody knows how to write "hashCode() must be correct *and* usable"
> in terms that would cover all the classes all the Java programmers
> have dreamed up and will dream up. Your hashCode() meets the bare
> minimum requirement, but is not "usable." The actual hashCode()
> provided by Object also meets the bare minimum requirement, and *is*
> usable as it stands, until (and unless; you don't HAVE to) you
> choose to implement other equals() semantics, and a hashCode() to
> match them.

As Arne states, "correct" means "fulfills the specification". The specification
for Java API methods is the standard Javadocs, which do impose performance
considerations on 'hashCode()'.

One understands that the spec isn't always fully enforceable by the compiler.
[1] It is correct that the compiler will allow 'return 1;'. It is not correct
that that fulfills the specification.

[1] Doesn't one?

--
Lew
Honi soit qui mal y pense.


Lew

 08-12-2012
To: Jan Burse
From: Lew <(E-Mail Removed)>

Jan Burse wrote:
> Maybe it would make sense to spell out what the contract
> for hashCode() is. Well the contract is simply, the
> following invariant should hold:
>
> /* invariant that should hold */
> if a.equals(b) then a.hashCode()==b.hashCode()

True, but if you read the specification for 'hashCode()' fully, that is not the
entire contract, only the compiler-enforceable part of it.

The entire specification requires that as much as feasible, the 'Object'
implementation distinguish distinct instances, and that the method generally
support 'HashMap', which promises O(1) 'get()' and 'put()' with a "proper"
(i.e., compliant) 'hashCode()'.

--
Lew
Honi soit qui mal y pense.


Lew

 08-12-2012
To: Lew
From: Lew <(E-Mail Removed)>

Lew wrote:
> Jan Burse wrote:
>> Maybe it would make sense to spell out what the contract
>> for hashCode() is. Well the contract is simply, the
>> following invariant should hold:
>>
>> /* invariant that should hold */
>> if a.equals(b) then a.hashCode()==b.hashCode()

>
> True, but if you read the specification for 'hashCode()' fully, that is not
> the entire contract, only the compiler-enforceable part of it.

Oooops!

Not even that is compiler-enforceable.
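
To illustrate: the class below breaks the equals/hashCode invariant and still compiles, so the invariant can only be checked at run time. This is a sketch; Broken and honoursContract are hypothetical names, and the check covers just one pair of objects rather than proving anything about the class in general.

```java
// Illustrative sketch: all names here are hypothetical.
final class ContractCheckDemo {
    // equals() compares only `name`, but hashCode() uses `id`,
    // a field that equals() ignores. The compiler accepts this.
    static final class Broken {
        final String name;
        final int id;

        Broken(String name, int id) {
            this.name = name;
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Broken && ((Broken) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return id;
        }
    }

    // Runtime check of the invariant for one pair:
    // if a.equals(b) then a.hashCode() == b.hashCode().
    static boolean honoursContract(Object a, Object b) {
        return !a.equals(b) || a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(honoursContract("abc", new String("abc")));
        System.out.println(honoursContract(new Broken("x", 1), new Broken("x", 2)));
    }
}
```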

--
Lew
Honi soit qui mal y pense.


Arne Vajhøj

 08-12-2012
To: Lew
From: Arne Vajhøj <(E-Mail Removed)>

On 8/11/2012 7:24 PM, Lew wrote:
> On 08/10/2012 04:30 PM, Arne Vajhøj wrote:
>> On 8/10/2012 6:32 PM, Lew wrote:
>>> bob smith wrote:
>>>> Now, there are cases where you HAVE to override it, or your code is
>>>> very
>>>> broken.
>>>
>>> No.

>>
>>> As long as 'hashCode()' fulfills the contract, your code will work -
>>> 'hashCode()' could and likely will noticeably affect performance.
>>> There is
>>> more to correctness
>>> than mere functional conformance.

>>
>> If the code per specs is guaranteed to work then it is correct.
>>
>> Good (or just decent) performance is not necessary for code to
>> be correct.
>>
>> At least not in the traditional programming terminology.
>>
>> In plain English maybe.

>
> I see your point, but that is not to say that the specs exclude
> performance considerations.
>
> In the case of 'hashCode()', the Javadocs do say, "This method is
> supported for the benefit of hash tables such as those provided by
> HashMap."
> <http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode()>
>
> The key question here is how you define "benefit". I argue that a hash
> code that is constant does not benefit, say, a 'HashMap' because one of
> our desired uses is constant-order retrieval.

Object having the method defined to support effective hashing does not imply
that it has to; it just means that the potential is there.

> "This implementation provides constant-time performance for the basic
> operations (get and put), assuming the hash function disperses the
> elements properly among the buckets."

Yes. And here it makes an assumption. Not that hashCode is implemented
correctly, but that it is implemented in a certain way.

> Each specification refers to the other. Ergo they are meant to be
> considered together. Taken together, the documentation clearly specifies
> that "correct" or "proper" includes performance considerations.
> Therefore, by what you say, the simple "return 1;" is not correct.

> As you say, Arne, "correct" means it follows the spec. The OP's
> suggested implementation violates the spec on two fronts.

No it does not.

It follows exactly the explicitly stated contract in the Javadoc:

<quote>
The general contract of hashCode is:

- Whenever it is invoked on the same object more than once during an
  execution of a Java application, the hashCode method must consistently
  return the same integer, provided no information used in equals comparisons
  on the object is modified. This integer need not remain consistent from one
  execution of an application to another execution of the same application.
- If two objects are equal according to the equals(Object) method, then
  calling the hashCode method on each of the two objects must produce the same
  integer result.
- It is not required that if two objects are unequal according to the
  equals(java.lang.Object) method, then calling the hashCode method on each of
  the two objects must produce distinct integer results. However, the
  programmer should be aware that producing distinct integer results for
  unequal objects may improve the performance of hashtables.
</quote>

The ability to support something does not make it part of the contract.

This is a classic test question in basic Java SE, and the fact that returning a
constant is correct but not smart should be in most Java SE textbooks.
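
A minimal sketch of the "correct but not smart" case: a key class whose hashCode() always returns 1, used in a java.util.HashMap. The Key class and the build helper are hypothetical names for this illustration. The example only shows that lookups return the right answers; it does not measure how slow they become.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: ConstantHashDemo, Key and build are hypothetical names.
final class ConstantHashDemo {
    static final class Key {
        final String name;

        Key(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return 1; // equal objects trivially share this value; so do unequal ones
        }
    }

    // Fill a map with n distinct keys "k0".."k(n-1)", each mapped to its number.
    static Map<Key, Integer> build(int n) {
        Map<Key, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new Key("k" + i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Key, Integer> map = build(1000);
        System.out.println("size=" + map.size());
        System.out.println("k42 -> " + map.get(new Key("k42")));
        System.out.println("missing -> " + map.get(new Key("missing")));
    }
}
```

Every key collides, so the map has to fall back on equals() to tell them apart, which is where the performance cost discussed in the thread comes from.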

Arne


Arne Vajhøj

 08-12-2012
To: Lew
From: Arne Vajhøj <(E-Mail Removed)>

On 8/11/2012 7:29 PM, Lew wrote:
> Eric Sosman wrote:
>> Okay: Then returning a constant 1 (or 42 or 0 or whatever)
>> would in fact satisfy the letter of the law regarding hashCode():

>
> Not if you consider all aspects of what the Javadocs promise.
>
>
>> Whenever x.equals(y) is true, x.hashCode() == y.hashCode(). In
>> your example this would be trivially true because x,y,z,... all
>> have the same hashCode() value, whether they're equal or not --
>> You have lived up to the letter of the law.

>
> No, because the law requires that the method support 'HashMap', which in
> turn calls for "properly" hashed objects.
>
>> Of course, such a hashCode() would make all those hash-based
>> containers pretty much useless: They would work in the sense that
>> they would get the Right Answer, but they'd be abominably slow,

>
> Indeed.
>
>> with expected performance of O(N) instead of O(1). See
>> <http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec2003/>
>> for a survey of some denial-of-service attacks that work by driving
>> hash tables from O(1) to O(N), resulting in catastrophic failure
>> of the attacked system.
>>
>> In other words, the letter of the law on hashCode() is a bare
>> minimum that guarantees correct functioning, but it is not enough
>> to guarantee usability. Why isn't the law more specific? Because

>
> Actually, if you consider all that the Javadocs tell you, this "letter
> of the law" to which you refer is like saying the sequence "ABC"
> constitutes all of "the ABCs".
>
>> nobody knows how to write "hashCode() must be correct *and* usable"
>> in terms that would cover all the classes all the Java programmers
>> have dreamed up and will dream up. Your hashCode() meets the bare
>> minimum requirement, but is not "usable." The actual hashCode()
>> provided by Object also meets the bare minimum requirement, and *is*
>> usable as it stands, until (and unless; you don't HAVE to) you
>> choose to implement other equals() semantics, and a hashCode() to
>> match them.

>
> As Arne states, "correct" means "fulfills the specification". The
> specification for Java API methods is the standard Javadocs, which do
> impose performance considerations on 'hashCode()'.
>
> One understands that the spec isn't always fully enforceable by the
> compiler. [1] It is correct that the compiler will allow 'return 1;'. It
> is not correct that that fulfills the specification.

It fulfills the spec.

It does not fulfill your bizarre interpretation of "support".

Arne


Arne Vajhøj

 08-12-2012
To: Lew
From: Arne Vajhøj <(E-Mail Removed)>

On 8/11/2012 7:34 PM, Lew wrote:
> Jan Burse wrote:
>> Maybe it would make sense to spell out what the contract
>> for hashCode() is. Well the contract is simply, the
>> following invariant should hold:
>>
>> /* invariant that should hold */
>> if a.equals(b) then a.hashCode()==b.hashCode()

>
> True, but if you read the specification for 'hashCode()' fully, that is
> not the entire contract, only the compiler-enforceable part of it.
>
> The entire specification requires that as much as feasible, the 'Object'
> implementation distinguish distinct instances, and that the method
> generally support 'HashMap', which promises O(1) 'get()' and 'put()'
> with a "proper" (i.e., compliant) 'hashCode()'.

Two wrong statements.

It says that the method is defined to support HashMap.

And HashMap does not guarantee O(1) with a correct hashCode - it guarantees that
for one that returns well-distributed values.
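
A rough way to picture "well-distributed" is to count how many buckets a set of hash values actually occupies. The sketch below uses a deliberately simplified model, index = hash & (buckets - 1), and is not a description of how java.util.HashMap is implemented internally; the names are made up for illustration.

```java
// Illustrative sketch: SpreadDemo and usedBuckets are hypothetical names.
final class SpreadDemo {
    // Count how many of `buckets` slots receive at least one hash value.
    // `buckets` must be a power of two for the mask to act as a modulus.
    static int usedBuckets(int[] hashes, int buckets) {
        boolean[] used = new boolean[buckets];
        int count = 0;
        for (int h : hashes) {
            int i = h & (buckets - 1);
            if (!used[i]) {
                used[i] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] constant = new int[n];
        int[] sequential = new int[n];
        for (int i = 0; i < n; i++) {
            constant[i] = 1;   // a hashCode() that always returns 1
            sequential[i] = i; // distinct values, e.g. Integer keys 0..999
        }
        System.out.println("constant:   " + usedBuckets(constant, 16) + " of 16 buckets");
        System.out.println("sequential: " + usedBuckets(sequential, 16) + " of 16 buckets");
    }
}
```

With a constant hash everything lands in one bucket, so a lookup has to search through all entries there. Both inputs satisfy the hashCode contract; only the second gives the table a chance at constant-time access.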

Arne
