随机整数作为哈希码

huangapple go评论84阅读模式
英文:

Random integer as hashCode

问题

这是否是一个好主意,在构造函数中生成一个随机数,并从 hashCode 方法中返回此值?存在碰撞的可能性,但这适用于在编写自己的 hashCode 方法时。那么有什么缺点呢?将该对象用于 HashMap 中时,它将以随机数作为哈希值进行存储,然后通过相同的随机数进行检索。如果存在碰撞,将通过 equals 方法来解决。

英文:

Is it a good idea to generate a random number in the constructor and return this value from hashCode method?
There is a chance for collisions, but this applies when writing your own hashCode method. So what are the drawbacks?
When using this object in a HashMap it will be stored with the random number as a hash and then retrieved by the same. If there are collisions they will be resolved by equals.

答案1

得分: 7

hashCode 合约 指明,除其他内容外,

> 如果两个对象根据 equals(Object) 方法判定为相等,则在这两个对象上调用 hashCode 方法必须产生相同的整数结果。

因此,不,将其设为随机值是一个不好的想法。

英文:

The hashCode contract specifies, among other things, that

> If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

So no, making it random is a bad idea.

答案2

得分: 1

基本回答

在典型的用例中,您希望检索与用于将其插入哈希集合的相同(从标识角度看)键对象不同,而是使用在逻辑上等效的对象(从相等角度看)。

因此,您希望以下代码在映射中查找元素

Key k1 = new Key("param1", "param2");
Key k2 = new Key("param1", "param2");
Value val1 = new Value();

map.put(k1, value);

Value val2 = map.get(k2);

如果 hashCode 仅基于随机数,那么 k1 和 k2 具有不同的哈希值,因此可能指向 HashMap 的不同存储桶。

我们展示了对于实现了 equals 的对象,随机的 hashCode 是一个不好的想法。
这是 哈希码契约 的一部分的推理:
> 如果两个对象根据 equals(Object) 方法是相等的,则对这两个对象中的每个对象调用 hashCode 方法必须产生相同的整数结果。

稍微深入一点的部分:

让我们看一下 Object.hashCode 的实现
注意,身份 hashCode() 的实现取决于 JVM。

查看 JDK 源代码,get_next_hash 提供了六种计算 hashCode 的方法:

0. 随机生成的数字。
1. 对象的内存地址的函数。
2. 硬编码的 1(用于敏感性测试)。
3. 一个序列。
4. 对象的内存地址,强制转换为 int。
5. 线程状态与 xorshift 结合(https://en.wikipedia.org/wiki/Xorshift)

参考代码:

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;
  if (hashCode == 0) {
     // 此形式使用了一个不受保护的全局 Park-Miller 随机数生成器,
     // 因此两个线程可能会竞争并生成相同的随机数。
     // 在多处理器系统中,我们将频繁访问全局变量,因此机制会引发大量的一致性流量。
     value = os::random() ;
  } else
  if (hashCode == 1) {
     // 此变体的特点是在 STW 操作之间是稳定的(幂等的)。
     // 这在某些 1-0 同步方案中可能是有用的。
     intptr_t addrBits = cast_from_oop<intptr_t>(obj) >> 3 ;
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
  } else
  if (hashCode == 2) {
     value = 1 ;            // 用于敏感性测试
  } else
  if (hashCode == 3) {
     value = ++GVars.hcSequence ;
  } else
  if (hashCode == 4) {
     value = cast_from_oop<intptr_t>(obj) ;
  } else {
     // Marsaglia 的 xor-shift 方案与线程特定状态
     // 这可能是总体上最好的实现 —— 在未来的版本中,我们可能会将其作为默认值。
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
  }

  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ;
  assert (value != markOopDesc::no_hash, "invariant") ;
  TEVENT (hashCode: GENERATE) ;
  return value;
}

OpenJDK 7 和 OpenJDK 6 都使用第一种方法,即随机数生成器。

OpenJDK 8 将默认值更改为 5 - 线程状态与 xorshift 结合。

请注意,所有这些 Object.hashCode 的实现都符合 Object.equals(就哈希码契约而言)。

总之,您将实现 Object.hashCode 背后的策略之一,但如果实现 equals,有可能违反契约。

英文:

Basic answer

In the typical use case, you want to retrieve not with the same (in terms of identity) key object that was used to insert it in the hashing collection, but with an object which is logically equivalent (in terms of equal).

Thus, you want the following code to find the element in the map

Key k1 = new Key(&quot;param1&quot;, &quot;param2&quot;);
Key k2 = new Key(&quot;param1&quot;, &quot;param2&quot;);
Value val1 = new Value();

map.put(k1, value);

Value val2 = map.get(k2);

If hashCode is based solely on random number, than k1 and k2 have different hashes, and thus may point you to different buckets of the HashMap.

We showed that for objects implementing equals, random hashCode is a poor idea.
This is the reasoning behind the part of hash code contract:
> If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

A bit more advanced part:

Let's take a look at Object.hashCode implementation.
Note that the identity hashCode() implementation is dependant on the JVM.

Looking at JDK sources, get_next_hash offers six methods to calculate hashCode:

0. A randomly generated number.
1. A function of memory address of the object.
2. A hardcoded 1 (used for sensitivity testing.)
3. A sequence.
4. The memory address of the object, cast to int.
5. Thread state combined with xorshift (https://en.wikipedia.org/wiki/Xorshift)

Code for reference:

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;
  if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it&#39;s possible for two threads to race and generate the same RNG.
     // On MP system we&#39;ll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random() ;
  } else
  if (hashCode == 1) {
     // This variation has the property of being stable (idempotent)
     // between STW operations.  This can be useful in some of the 1-0
     // synchronization schemes.
     intptr_t addrBits = cast_from_oop&lt;intptr_t&gt;(obj) &gt;&gt; 3 ;
     value = addrBits ^ (addrBits &gt;&gt; 5) ^ GVars.stwRandom ;
  } else
  if (hashCode == 2) {
     value = 1 ;            // for sensitivity testing
  } else
  if (hashCode == 3) {
     value = ++GVars.hcSequence ;
  } else
  if (hashCode == 4) {
     value = cast_from_oop&lt;intptr_t&gt;(obj) ;
  } else {
     // Marsaglia&#39;s xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we&#39;ll
     // likely make this the default in future releases.
     unsigned t = Self-&gt;_hashStateX ;
     t ^= (t &lt;&lt; 11) ;
     Self-&gt;_hashStateX = Self-&gt;_hashStateY ;
     Self-&gt;_hashStateY = Self-&gt;_hashStateZ ;
     Self-&gt;_hashStateZ = Self-&gt;_hashStateW ;
     unsigned v = Self-&gt;_hashStateW ;
     v = (v ^ (v &gt;&gt; 19)) ^ (t ^ (t &gt;&gt; 8)) ;
     Self-&gt;_hashStateW = v ;
     value = v ;
  }

  value &amp;= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ;
  assert (value != markOopDesc::no_hash, &quot;invariant&quot;) ;
  TEVENT (hashCode: GENERATE) ;
  return value;
}

Both OpenJDK 7 and OpenJDK 6 use the first method, a random number generator.

OpenJDK 8 changed the default to 5 - Thread state combined with xorshift.

Note that all of these implementations of Object.hashCode are consistent with Object.equals (in terms of hash code contract)

To sum up, you would be implementing one of the strategies behind Object.hashCode, but taking a risk of breaking the contract if you implement equals.

答案3

得分: 1

如果什么都不做,并使用 Object.hashCode()(对象的“内存地址”),那么基本上就可以实现您想要的效果。因此,您可以拥有任何类的 HashMap/HashSet 键。

仍然要保证安全的 equals

英文:

If you do nothing and use Object.hashCode() (the "memory address" of the object), then you more or less have what you want to achieve. Hence you can have a HashMap/HashSet key of any class.

Still have a safe equals.

huangapple
  • 本文由 发表于 2020年10月21日 22:38:10
  • 转载请务必保留本文链接:https://go.coder-hub.com/64465953.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定