2020年10月21日 22:38:10go评论84阅读模式

英文:

Random integer as hashCode

问题

这是否是一个好主意，在构造函数中生成一个随机数，并从 hashCode 方法中返回此值？存在碰撞的可能性，但这适用于在编写自己的 hashCode 方法时。那么有什么缺点呢？将该对象用于 HashMap 中时，它将以随机数作为哈希值进行存储，然后通过相同的随机数进行检索。如果存在碰撞，将通过 equals 方法来解决。

英文:

Is it a good idea to generate a random number in the constructor and return this value from hashCode method?
There is a chance for collisions, but this applies when writing your own hashCode method. So what are the drawbacks?
When using this object in a HashMap it will be stored with the random number as a hash and then retrieved by the same. If there are collisions they will be resolved by equals.

答案1

得分: 7

hashCode 合约指明，除其他内容外，

> 如果两个对象根据 equals(Object) 方法判定为相等，则在这两个对象上调用 hashCode 方法必须产生相同的整数结果。

因此，不，将其设为随机值是一个不好的想法。

英文:

The hashCode contract specifies, among other things, that

> If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

So no, making it random is a bad idea.

答案2

得分: 1

基本回答

在典型的用例中，您希望检索与用于将其插入哈希集合的相同（从标识角度看）键对象不同，而是使用在逻辑上等效的对象（从相等角度看）。

因此，您希望以下代码在映射中查找元素

Key k1 = new Key("param1", "param2");
Key k2 = new Key("param1", "param2");
Value val1 = new Value();

map.put(k1, value);

Value val2 = map.get(k2);

如果 hashCode 仅基于随机数，那么 k1 和 k2 具有不同的哈希值，因此可能指向 HashMap 的不同存储桶。

我们展示了对于实现了 equals 的对象，随机的 hashCode 是一个不好的想法。
这是哈希码契约的一部分的推理：
> 如果两个对象根据 equals(Object) 方法是相等的，则对这两个对象中的每个对象调用 hashCode 方法必须产生相同的整数结果。

稍微深入一点的部分：

让我们看一下 Object.hashCode 的实现。
注意，身份 hashCode() 的实现取决于 JVM。

查看 JDK 源代码，get_next_hash 提供了六种计算 hashCode 的方法：

0. 随机生成的数字。
1. 对象的内存地址的函数。
2. 硬编码的 1（用于敏感性测试）。
3. 一个序列。
4. 对象的内存地址，强制转换为 int。
5. 线程状态与 xorshift 结合（https://en.wikipedia.org/wiki/Xorshift）

参考代码：

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;
  if (hashCode == 0) {
     // 此形式使用了一个不受保护的全局 Park-Miller 随机数生成器，
     // 因此两个线程可能会竞争并生成相同的随机数。
     // 在多处理器系统中，我们将频繁访问全局变量，因此机制会引发大量的一致性流量。
     value = os::random() ;
  } else
  if (hashCode == 1) {
     // 此变体的特点是在 STW 操作之间是稳定的（幂等的）。
     // 这在某些 1-0 同步方案中可能是有用的。
     intptr_t addrBits = cast_from_oop<intptr_t>(obj) >> 3 ;
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
  } else
  if (hashCode == 2) {
     value = 1 ;            // 用于敏感性测试
  } else
  if (hashCode == 3) {
     value = ++GVars.hcSequence ;
  } else
  if (hashCode == 4) {
     value = cast_from_oop<intptr_t>(obj) ;
  } else {
     // Marsaglia 的 xor-shift 方案与线程特定状态
     // 这可能是总体上最好的实现 —— 在未来的版本中，我们可能会将其作为默认值。
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
  }

  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ;
  assert (value != markOopDesc::no_hash, "invariant") ;
  TEVENT (hashCode: GENERATE) ;
  return value;
}

OpenJDK 7 和 OpenJDK 6 都使用第一种方法，即随机数生成器。

OpenJDK 8 将默认值更改为 5 - 线程状态与 xorshift 结合。

请注意，所有这些 Object.hashCode 的实现都符合 Object.equals（就哈希码契约而言）。

总之，您将实现 Object.hashCode 背后的策略之一，但如果实现 equals，有可能违反契约。

英文:

Basic answer

In the typical use case, you want to retrieve not with the same (in terms of identity) key object that was used to insert it in the hashing collection, but with an object which is logically equivalent (in terms of equal).

Thus, you want the following code to find the element in the map

Key k1 = new Key(&quot;param1&quot;, &quot;param2&quot;);
Key k2 = new Key(&quot;param1&quot;, &quot;param2&quot;);
Value val1 = new Value();

map.put(k1, value);

Value val2 = map.get(k2);

If hashCode is based solely on random number, than k1 and k2 have different hashes, and thus may point you to different buckets of the HashMap.

We showed that for objects implementing equals, random hashCode is a poor idea.
This is the reasoning behind the part of hash code contract:
> If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

A bit more advanced part:

Let's take a look at Object.hashCode implementation.
Note that the identity hashCode() implementation is dependant on the JVM.

Looking at JDK sources, get_next_hash offers six methods to calculate hashCode:

0. A randomly generated number.
1. A function of memory address of the object.
2. A hardcoded 1 (used for sensitivity testing.)
3. A sequence.
4. The memory address of the object, cast to int.
5. Thread state combined with xorshift (https://en.wikipedia.org/wiki/Xorshift)

Code for reference:

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;
  if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it&#39;s possible for two threads to race and generate the same RNG.
     // On MP system we&#39;ll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random() ;
  } else
  if (hashCode == 1) {
     // This variation has the property of being stable (idempotent)
     // between STW operations.  This can be useful in some of the 1-0
     // synchronization schemes.
     intptr_t addrBits = cast_from_oop&lt;intptr_t&gt;(obj) &gt;&gt; 3 ;
     value = addrBits ^ (addrBits &gt;&gt; 5) ^ GVars.stwRandom ;
  } else
  if (hashCode == 2) {
     value = 1 ;            // for sensitivity testing
  } else
  if (hashCode == 3) {
     value = ++GVars.hcSequence ;
  } else
  if (hashCode == 4) {
     value = cast_from_oop&lt;intptr_t&gt;(obj) ;
  } else {
     // Marsaglia&#39;s xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we&#39;ll
     // likely make this the default in future releases.
     unsigned t = Self-&gt;_hashStateX ;
     t ^= (t &lt;&lt; 11) ;
     Self-&gt;_hashStateX = Self-&gt;_hashStateY ;
     Self-&gt;_hashStateY = Self-&gt;_hashStateZ ;
     Self-&gt;_hashStateZ = Self-&gt;_hashStateW ;
     unsigned v = Self-&gt;_hashStateW ;
     v = (v ^ (v &gt;&gt; 19)) ^ (t ^ (t &gt;&gt; 8)) ;
     Self-&gt;_hashStateW = v ;
     value = v ;
  }

  value &amp;= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ;
  assert (value != markOopDesc::no_hash, &quot;invariant&quot;) ;
  TEVENT (hashCode: GENERATE) ;
  return value;
}

Both OpenJDK 7 and OpenJDK 6 use the first method, a random number generator.

OpenJDK 8 changed the default to 5 - Thread state combined with xorshift.

Note that all of these implementations of Object.hashCode are consistent with Object.equals (in terms of hash code contract)

To sum up, you would be implementing one of the strategies behind Object.hashCode, but taking a risk of breaking the contract if you implement equals.

答案3

得分: 1

如果什么都不做，并使用 Object.hashCode()（对象的“内存地址”），那么基本上就可以实现您想要的效果。因此，您可以拥有任何类的 HashMap/HashSet 键。

仍然要保证安全的 equals。

英文:

If you do nothing and use Object.hashCode() (the "memory address" of the object), then you more or less have what you want to achieve. Hence you can have a HashMap/HashSet key of any class.

Still have a safe equals.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

随机整数作为哈希码

问题

答案1

答案2

答案3

质疑代码执行顺序

`micronaut @RequestScope` – 不会为每个传入的 HTTP 请求创建 bean。

Java项目用于识别回文数字，返回的是与匹配输出不符合。

使用Spring Starter Web和WebClient与Spring Security。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论