Object.hashCode implementation

how-to
Jul 16, 200814 mins

After the previous post, I just had to look. The implementation of Object.equals is, as was previously noted, just “return this == obj”, but the implementation of Object.hashCode is far more complicated.

Taken straight from the latest hg-pulled OpenJDK sources, Object.hashCode is a native method registered from Object.c that calls into a Hotspot-exported function, JVM_IHashCode(), from hotspotsrcsharevmprimsjvm.cpp:

JVM_ENTRY(jint, JVM_IHashCode(JNIEnv* env, jobject handle))<br>
JVMWrapper(<span style="color: #006080">"JVM_IHashCode"</span>);<br>
<span style="color: #008000">// as implemented in the classic virtual machine; return
0 if object is NULL</span>
<br>
<span style="color: #0000ff">return</span> handle == NULL ? 0 : ObjectSynchronizer::FastHashCode
(THREAD, JNIHandles::resolve_non_null(handle)) ;<br>
JVM_END<br>

which in turn calls ObjectSynchronizer::FastHashCode, defined in hotspotsrcsharevmruntimesynchronizer.cpp as:

intptr_t ObjectSynchronizer::FastHashCode (Thread * Self, oop obj) {<br>
<span style="color: #0000ff">if</span> (UseBiasedLocking) {<br>
<span style="color: #008000">// NOTE: many places throughout the JVM do not expect
a safepoint</span>
<br>
<span style="color: #008000">// to be taken here, in particular most operations on
perm gen</span>
<br>
<span style="color: #008000">// objects. However, we only ever bias Java instances
and all of</span>
<br>
<span style="color: #008000">// the call sites of identity_hash that might revoke
biases have</span>
<br>
<span style="color: #008000">// been checked to make sure they can handle a safepoint.
The</span>
<br>
<span style="color: #008000">// added check of the bias pattern is to avoid useless
calls to</span>
<br>
<span style="color: #008000">// thread-local storage.</span>
<br>
<span style="color: #0000ff">if</span> (obj->mark()->has_bias_pattern()) {<br>
<span style="color: #008000">// Box and unbox the raw reference just in case we cause
a STW safepoint.</span>
<br>
Handle hobj (Self, obj) ;<br>
<span style="color: #008000">// Relaxing assertion for bug 6320749.</span>
<br>
assert (Universe::verify_in_progress() ||<br>
!SafepointSynchronize::is_at_safepoint(),<br>
<span style="color: #006080">"biases should not be seen by VM thread here"</span>);<br>
BiasedLocking::revoke_and_rebias(hobj, false, JavaThread::current());<br>
obj = hobj() ;<br>
assert(!obj->mark()->has_bias_pattern(), <span style="color: #006080">"biases
should be revoked by now"</span>);<br>
}<br>
}<br>
<br>
<span style="color: #008000">// hashCode() is a heap mutator ...</span>
<br>
<span style="color: #008000">// Relaxing assertion for bug 6320749.</span>
<br>
assert (Universe::verify_in_progress() ||<br>
!SafepointSynchronize::is_at_safepoint(), <span style="color: #006080">"invariant"</span>)
;<br>
assert (Universe::verify_in_progress() ||<br>
Self->is_Java_thread() , <span style="color: #006080">"invariant"</span>) ;<br>
assert (Universe::verify_in_progress() ||<br>
((JavaThread *)Self)->thread_state() != _thread_blocked, <span style="color: #006080">"invariant"</span>)
;<br>
<br>
ObjectMonitor* monitor = NULL;<br>
markOop temp, test;<br>
intptr_t hash;<br>
markOop mark = ReadStableMark (obj);<br>
<br>
<span style="color: #008000">// object should remain ineligible for biased locking</span>
<br>
assert (!mark->has_bias_pattern(), <span style="color: #006080">"invariant"</span>)
;<br>
<br>
<span style="color: #0000ff">if</span> (mark->is_neutral()) {<br>
hash = mark->hash(); <span style="color: #008000">// this is a normal header</span>
<br>
<span style="color: #0000ff">if</span> (hash) { <span style="color: #008000">// if
it has hash, just return it</span>
<br>
<span style="color: #0000ff">return</span> hash;<br>
}<br>
hash = get_next_hash(Self, obj); <span style="color: #008000">// allocate a new hash
code</span>
<br>
temp = mark->copy_set_hash(hash); <span style="color: #008000">// merge the hash
code into header</span>
<br>
<span style="color: #008000">// use (machine word version) atomic operation to install
the hash</span>
<br>
test = (markOop) Atomic::cmpxchg_ptr(temp, obj->mark_addr(), mark);<br>
<span style="color: #0000ff">if</span> (test == mark) {<br>
<span style="color: #0000ff">return</span> hash;<br>
}<br>
<span style="color: #008000">// If atomic operation failed, we must inflate the header</span>
<br>
<span style="color: #008000">// into heavy weight monitor. We could add more code
here</span>
<br>
<span style="color: #008000">// for fast path, but it does not worth the complexity.</span>
<br>
} <span style="color: #0000ff">else</span> <span style="color: #0000ff">if</span> (mark->has_monitor())
{<br>
monitor = mark->monitor();<br>
temp = monitor->header();<br>
assert (temp->is_neutral(), <span style="color: #006080">"invariant"</span>) ;<br>
hash = temp->hash();<br>
<span style="color: #0000ff">if</span> (hash) {<br>
<span style="color: #0000ff">return</span> hash;<br>
}<br>
<span style="color: #008000">// Skip to the following code to reduce code size</span>
<br>
} <span style="color: #0000ff">else</span> <span style="color: #0000ff">if</span> (Self->is_lock_owned((address)mark->locker()))
{<br>
temp = mark->displaced_mark_helper(); <span style="color: #008000">// this is a
lightweight monitor owned</span>
<br>
assert (temp->is_neutral(), <span style="color: #006080">"invariant"</span>) ;<br>
hash = temp->hash(); <span style="color: #008000">// by current thread, check if
the displaced</span>
<br>
<span style="color: #0000ff">if</span> (hash) { <span style="color: #008000">// header
contains hash code</span>
<br>
<span style="color: #0000ff">return</span> hash;<br>
}<br>
<span style="color: #008000">// WARNING:</span>
<br>
<span style="color: #008000">// The displaced header is strictly immutable.</span>
<br>
<span style="color: #008000">// It can NOT be changed in ANY cases. So we have</span>
<br>
<span style="color: #008000">// to inflate the header into heavyweight monitor</span>
<br>
<span style="color: #008000">// even the current thread owns the lock. The reason</span>
<br>
<span style="color: #008000">// is the BasicLock (stack slot) will be asynchronously</span>
<br>
<span style="color: #008000">// read by other threads during the inflate() function.</span>
<br>
<span style="color: #008000">// Any change to stack may not propagate to other threads</span>
<br>
<span style="color: #008000">// correctly.</span>
<br>
}<br>
<br>
<span style="color: #008000">// Inflate the monitor to set hash code</span>
<br>
monitor = ObjectSynchronizer::inflate(Self, obj);<br>
<span style="color: #008000">// Load displaced header and check it has hash code</span>
<br>
mark = monitor->header();<br>
assert (mark->is_neutral(), <span style="color: #006080">"invariant"</span>) ;<br>
hash = mark->hash();<br>
<span style="color: #0000ff">if</span> (hash == 0) {<br>
hash = get_next_hash(Self, obj);<br>
temp = mark->copy_set_hash(hash); <span style="color: #008000">// merge hash code
into header</span>
<br>
assert (temp->is_neutral(), <span style="color: #006080">"invariant"</span>) ;<br>
test = (markOop) Atomic::cmpxchg_ptr(temp, monitor, mark);<br>
<span style="color: #0000ff">if</span> (test != mark) {<br>
<span style="color: #008000">// The only update to the header in the monitor (outside
GC)</span>
<br>
<span style="color: #008000">// is install the hash code. If someone add new usage
of</span>
<br>
<span style="color: #008000">// displaced header, please update this code</span>
<br>
hash = test->hash();<br>
assert (test->is_neutral(), <span style="color: #006080">"invariant"</span>) ;<br>
assert (hash != 0, <span style="color: #006080">"Trivial unexpected object/monitor
header usage."</span>);<br>
}<br>
}<br>
<span style="color: #008000">// We finally get the hash</span>
<br>
<span style="color: #0000ff">return</span> hash;<br>
}<br>

Hope this answers all the debates. 🙂

Editor’s note: Yes, I know it’s a long quotation of code completely out of context; my goal here is simply to suggest that the hashCode() implementation is not just a integerification of the object’s address in memory, as was suggested in other discussions. For whatever it’s worth, the get_next_hash() implementation that’s referenced in the FastHashCode() method looks like:

<span style="color: #008000">//
hashCode() generation :</span>
<br>
<span style="color: #008000">//</span>
<br>
<span style="color: #008000">// Possibilities:</span>
<br>
<span style="color: #008000">// * MD5Digest of {obj,stwRandom}</span>
<br>
<span style="color: #008000">// * CRC32 of {obj,stwRandom} or any linear-feedback
shift register function.</span>
<br>
<span style="color: #008000">// * A DES- or AES-style SBox[] mechanism</span>
<br>
<span style="color: #008000">// * One of the Phi-based schemes, such as:</span>
<br>
<span style="color: #008000">// 2654435761 = 2^32 * Phi (golden ratio)</span>
<br>
<span style="color: #008000">// HashCodeValue = ((uintptr_t(obj) >> 3) * 2654435761)
^ GVars.stwRandom ;</span>
<br>
<span style="color: #008000">// * A variation of Marsaglia's shift-xor RNG scheme.</span>
<br>
<span style="color: #008000">// * (obj ^ stwRandom) is appealing, but can result</span>
<br>
<span style="color: #008000">// in undesirable regularity in the hashCode values of
adjacent objects</span>
<br>
<span style="color: #008000">// (objects allocated back-to-back, in particular). This
could potentially</span>
<br>
<span style="color: #008000">// result in hashtable collisions and reduced hashtable
efficiency.</span>
<br>
<span style="color: #008000">// There are simple ways to "diffuse" the middle address
bits over the</span>
<br>
<span style="color: #008000">// generated hashCode values:</span>
<br>
<span style="color: #008000">//</span>
<br>
<br>
<span style="color: #0000ff">static</span> <span style="color: #0000ff">inline</span> intptr_t
get_next_hash(Thread * Self, oop obj) {<br>
intptr_t value = 0 ;<br>
<span style="color: #0000ff">if</span> (hashCode == 0) {<br>
<span style="color: #008000">// This form uses an unguarded global Park-Miller RNG,</span>
<br>
<span style="color: #008000">// so it's possible for two threads to race and generate
the same RNG.</span>
<br>
<span style="color: #008000">// On MP system we'll have lots of RW access to a global,
so the</span>
<br>
<span style="color: #008000">// mechanism induces lots of coherency traffic.</span>
<br>
value = os::random() ;<br>
} <span style="color: #0000ff">else</span>
<br>
<span style="color: #0000ff">if</span> (hashCode == 1) {<br>
<span style="color: #008000">// This variation has the property of being stable (idempotent)</span>
<br>
<span style="color: #008000">// between STW operations. This can be useful in some
of the 1-0</span>
<br>
<span style="color: #008000">// synchronization schemes.</span>
<br>
intptr_t addrBits = intptr_t(obj) >> 3 ;<br>
value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;<br>
} <span style="color: #0000ff">else</span>
<br>
<span style="color: #0000ff">if</span> (hashCode == 2) {<br>
value = 1 ; <span style="color: #008000">// for sensitivity testing</span>
<br>
} <span style="color: #0000ff">else</span>
<br>
<span style="color: #0000ff">if</span> (hashCode == 3) {<br>
value = ++GVars.hcSequence ;<br>
} <span style="color: #0000ff">else</span>
<br>
<span style="color: #0000ff">if</span> (hashCode == 4) {<br>
value = intptr_t(obj) ;<br>
} <span style="color: #0000ff">else</span> {<br>
<span style="color: #008000">// Marsaglia's xor-shift scheme with thread-specific
state</span>
<br>
<span style="color: #008000">// This is probably the best overall implementation --
we'll</span>
<br>
<span style="color: #008000">// likely make this the default in future releases.</span>
<br>
<span style="color: #0000ff">unsigned</span> t = Self->_hashStateX ;<br>
t ^= (t << 11) ;<br>
Self->_hashStateX = Self->_hashStateY ;<br>
Self->_hashStateY = Self->_hashStateZ ;<br>
Self->_hashStateZ = Self->_hashStateW ;<br>
<span style="color: #0000ff">unsigned</span> v = Self->_hashStateW ;<br>
v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;<br>
Self->_hashStateW = v ;<br>
value = v ;<br>
}<br>
<br>
value &= markOopDesc::hash_mask;<br>
<span style="color: #0000ff">if</span> (value == 0) value = 0xBAD ;<br>
assert (value != markOopDesc::no_hash, <span style="color: #006080">"invariant"</span>)
;<br>
TEVENT (hashCode: GENERATE) ;<br>
<span style="color: #0000ff">return</span> value;<br>
}<br>

Thus (hopefully) putting the idea that it might be allocating a hash based on the object’s identity completely to rest.

For the record, this is all from the OpenJDK source base–naturally, it’s possible that earlier VM implementations did something entirely different.


Enterprise consulting, mentoring or instruction. Java, C++, .NET or XML services. 1-day or multi-day workshops available. Contact me for details.

ted_neward

Ted Neward is an independent consultant specializing in high-scale enterprise systems, working with clients ranging in size from Fortune 500 corporations to small 10-person shops. He is an authority in Java and .NET technologies, particularly in the areas of Java/.NET integration (both in-process and via integration tools like Web services), back-end enterprise software systems, and virtual machine/execution engine plumbing. He lives in the Pacific Northwest with his wife, two sons, and eight PCs.

More from this author