Why is a local variable used in this String class method?

Question

In Java 11, String's hashCode method uses a local int variable "h" to store the value of the hash field.

Is this done for performance reasons, or is it merely a stylistic choice? An answer to a similar question says it's for thread safety, but Strings are immutable, so that doesn't seem to be the case.

Answer 1

Score: 3

The git blame for that method points to JDK-8166842.

According to that issue, the reasoning was:

> Latest change to JDK 9 String code introduced a non-benign data race:
>
> ```java
> public int hashCode() {
>     if (hash == 0 && value.length > 0) {
>         hash = isLatin1() ? StringLatin1.hashCode(value)
>                           : StringUTF16.hashCode(value);
>     }
>     return hash;
> }
> ```
>
> The 'hash' field should only be read once into a local variable. The second racy read (at return) can read 0 while the 1st (at if) can read non-zero.
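
The single-read pattern the issue asks for can be sketched roughly as follows. This is a minimal illustration, not the JDK source: `CachedHashText` and `computeHash` are hypothetical names, and the class is only a stand-in for String. The point is that the non-volatile `hash` field is read exactly once into the local `h`, so the value that was tested is the same value that gets returned.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for String; illustrates the single-read idiom only.
final class CachedHashText {
    private final byte[] value;
    private int hash; // cached hash; 0 means "not computed yet"

    CachedHashText(String s) {
        this.value = s.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public int hashCode() {
        int h = hash;                    // the only read of the shared field
        if (h == 0 && value.length > 0) {
            h = computeHash(value);      // work on the local copy
            hash = h;                    // publish the result to the cache
        }
        return h;                        // return the local, never re-read the field
    }

    // Hypothetical helper: polynomial hash with multiplier 31 over the bytes.
    private static int computeHash(byte[] bytes) {
        int h = 0;
        for (byte b : bytes) {
            h = 31 * h + (b & 0xff);
        }
        return h;
    }
}
```

With this shape, two threads may still both compute the hash, but they compute the same value, so the remaining race is benign: no caller can observe a 0 that was not actually computed.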

Answer 2

Score: 0

While Holloway has answered the question itself, I'd like to clear up a minor misconception.

String is only immutable in the sense that it doesn't expose functionality for mutating it. However, it still does have an internal mutable state, specifically the hash field.

Calculating the hash of a String is a relatively simple O(n) operation. Not disastrous, but also not completely free, especially not for longer strings.

Since strings are a fundamental part of almost all aspects of Java and used basically everywhere, including in Java-internal functionality, ensuring good performance for the implementation of String is critical. Even a minor inefficiency may be enormously multiplied simply from how incredibly often it gets run. As such, while the hash calculation isn't too bad, it's still something we'd like to not have to do unless it's absolutely necessary.

Now, there are three ways we could go about calculating the hash:

  1. We calculate it every time hashCode() is called. In other words, we only calculate it when we need it. However, that also means we have to recalculate it every single time it's needed. This would be disastrous for HashMap and similar.
  2. We calculate it once when we create the string, then return the same value every time hashCode() is called. This makes hashCode() free, but makes the creation of a string more expensive in situations where the hash isn't needed. Since a lot of strings are constantly created in Java, that's a non-negligible cost.
  3. We do it lazily, i.e. we calculate it the first time hashCode() is called and then remember it for subsequent calls to hashCode().

The last option was chosen since it hits the sweet spot between "don't calculate it unless you need it" and "don't calculate it more than once". This does mean that String has mutable state: it updates its internal cache of the hash code. However, to an outside observer this effect is invisible (aside from the fact that the first call to hashCode() is slightly slower), meaning that the string is effectively immutable.
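
To make the lazy option concrete, here is a small sketch of a class that caches its hash on first use. `LazyHashKey` and its `computations` counter are hypothetical and exist only for illustration; the counter is not thread-safe and is there just to show how often the O(n) pass actually runs.

```java
// Hypothetical illustration of lazy hash caching; not the JDK implementation.
final class LazyHashKey {
    private final char[] chars;
    private int hash;          // cached hash; 0 means "not computed yet"
    private int computations;  // illustration only: counts full O(n) passes

    LazyHashKey(String s) {
        this.chars = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0 && chars.length > 0) {
            computations++;
            for (char c : chars) {
                h = 31 * h + c;   // same polynomial that String.hashCode() documents
            }
            hash = h;             // remember the result for later calls
        }
        return h;
    }

    int computations() {
        return computations;
    }

    public static void main(String[] args) {
        LazyHashKey key = new LazyHashKey("hello");
        key.hashCode();
        key.hashCode();
        key.hashCode();
        // Three calls, but the characters were only walked once.
        System.out.println("computations = " + key.computations());
    }
}
```

Creating a `LazyHashKey` costs nothing hash-related, and only the first `hashCode()` call pays for the pass over the characters. One limitation of using 0 as the "not computed" marker is that a non-empty value whose hash really is 0 gets recomputed on every call.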

Posted by huangapple on July 10, 2023, 17:16:40
Please keep this link when reposting: https://go.coder-hub.com/76652336.html