Could primitive wrapper classes (Integer, Double, etc.) and String be approximated as value-based classes?
Question
Java 8 introduced value-based classes, but only a few java.util classes (such as Optional) and java.time classes are marked as such.
The criteria for value-based classes are defined here: https://docs.oracle.com/javase/10/docs/api/java/lang/doc-files/ValueBased.html
Most of these criteria (except the strict guidance against using them in identity-sensitive operations and the requirement that they have no accessible constructors) seem to fit primitive wrapper classes like Integer and Double (and similar immutable classes like String). I understand that developers might technically have used these classes in identity-based operations (although they shouldn't have), so marking them value-based now could cause backward-compatibility issues. But apart from this, is there any other reason why, conceptually, these wrapper classes should not be value-based?
Here is why I am inclined to think of them as value-based.
When you define `Integer myInt = 5`, you are really just interested in the value 5 and not in the reference holding this value (at least for most of the use cases I can think of). Likewise, when you say `String myStr = "hello world"`, you are really interested in the literal value "hello world" and not in its reference.
Answer 1
Score: 4
[Integer.valueOf documentation]: https://docs.oracle.com/en/java/javase/15/docs/api/java.base/java/lang/Integer.html#valueOf(int) (API documentation of java.lang.Integer.valueOf(int))
[JLS §5.1.7]: https://docs.oracle.com/javase/specs/jls/se15/html/jls-5.html#jls-5.1.7-300 (Java® Language Specification, §5.1.7 Boxing Conversion)
[this answer]: https://stackoverflow.com/a/59637974/2711488
[§15.29]: https://docs.oracle.com/javase/specs/jls/se15/html/jls-15.html#jls-15.29 (Java® Language Specification, §15.29 Constant Expressions)
[JLS §15.18.1]: https://docs.oracle.com/javase/specs/jls/se15/html/jls-15.html#jls-15.18.1 (Java® Language Specification, §15.18.1 String Concatenation Operator +)
[§12.5]: https://docs.oracle.com/javase/specs/jls/se15/html/jls-12.html#jls-12.5 (Java® Language Specification, §12.5 Creation of New Class Instances)
Basically, you have already named the most important reason why retrofitting these classes as value-based is not feasible: backward compatibility.
You might have noticed that the constructors of the primitive wrapper types were deprecated in Java 9, which would be a step in that direction. Still, using identity-sensitive operations is only discouraged, not forbidden, so a compatibility-breaking change cannot be made on that basis. Yet being allowed to break identity-sensitive operations is the only thing that could yield practical advantages from being a value-based class.
For classes like `String`, `BigInteger`, and `BigDecimal`, the JDK developers did not even dare to take the step of deprecating the constructors, most likely because that would be too disruptive. For some constructors, there’s not even an equivalent factory method.
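To illustrate the last point with a minimal sketch: as far as I know, `BigInteger` offers the static factory `valueOf(long)` but no static factory that parses a decimal `String`, so values beyond the `long` range still have to go through the public constructor. The class and method names below are made up for this example.

```java
import java.math.BigInteger;

class ConstructorOnlyDemo {
    // Values that fit into a long can come from the static factory.
    static BigInteger viaFactory(long value) {
        return BigInteger.valueOf(value);
    }

    // Values given as decimal text are parsed by the public constructor.
    static BigInteger viaConstructor(String decimal) {
        return new BigInteger(decimal);
    }

    public static void main(String[] args) {
        BigInteger small = viaFactory(42L);
        BigInteger parsed = viaConstructor("42");
        // Compare by value; whether both references are identical is not the point.
        System.out.println(small.equals(parsed));
        // Too large for a long, so the factory method is not an option here.
        System.out.println(viaConstructor("123456789012345678901234567890"));
    }
}
```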
But there’s more than just the public constructors.
See the documentation of the `valueOf` methods, for example the one of `Integer`:
> This method will always cache values in the range -128 to 127, inclusive, …
So when the factory method is used, you still get a specified identity behavior for some cases.
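A small sketch of what that documented caching means for identity comparisons. The class and method names are hypothetical; only `Integer.valueOf(int)` comes from the JDK.

```java
class ValueOfCacheDemo {
    // Returns true when two separate valueOf calls yield the very same object.
    static boolean sameInstance(int value) {
        Integer first = Integer.valueOf(value);
        Integer second = Integer.valueOf(value);
        return first == second; // deliberate identity comparison
    }

    public static void main(String[] args) {
        System.out.println(sameInstance(127));  // inside the documented cache range
        System.out.println(sameInstance(-128)); // inside the documented cache range
        // Outside the range the result is unspecified: it depends on the JVM
        // and its cache configuration, so code must not rely on either outcome.
        System.out.println(sameInstance(1000));
    }
}
```

For values inside the documented range the comparison is specified to be true, which is exactly the kind of identity promise a value-based class is not supposed to make.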
Which brings us to JLS §5.1.7:
> If the value `p` being boxed is the result of evaluating a constant expression (§15.29) of type `boolean`, `byte`, `char`, `short`, `int`, or `long`, and the result is `true`, `false`, a character in the range `'\u0000'` to `'\u007f'` inclusive, or an integer in the range `-128` to `127` inclusive, then let `a` and `b` be the results of any two boxing conversions of `p`. It is always the case that `a == b`.
So even the language specification defines the behavior of certain identity-sensitive operations.
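The quoted rule can be observed without calling any factory method explicitly, since it applies to autoboxing of constant expressions. A minimal sketch with hypothetical class and method names:

```java
class BoxingIdentityDemo {
    static boolean boxedIntsShareIdentity() {
        Integer a = 100; // constant expression in the range -128..127
        Integer b = 100;
        return a == b;   // reference comparison of the two boxes
    }

    static boolean boxedCharsShareIdentity() {
        Character a = 'x'; // constant in the range '\u0000'..'\u007f'
        Character b = 'x';
        return a == b;
    }

    static boolean boxedBooleansShareIdentity() {
        Boolean a = true;
        Boolean b = true;
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(boxedIntsShareIdentity());
        System.out.println(boxedCharsShareIdentity());
        System.out.println(boxedBooleansShareIdentity());
    }
}
```

All three comparisons are specified to yield true, so a program is entitled to depend on them.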
Note that the specification avoids naming the `valueOf` method that compiled code uses in practice and instead states its own rules (which formally apply only to compile-time constants); that did not really pay off. As this answer documents, that part of the specification underwent several rewrites, so for anyone who took the wording literally, the guarantees changed over time…
Other guarantees are burned much deeper into developers’ minds:
JLS §15.29, Constant Expressions:
> Constant expressions of type `String` are always "interned" so as to share unique instances, using the method `String.intern`.
This is a guarantee about object identity that is impossible to retract.
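A short sketch of that guarantee, using hypothetical names. `PREFIX` is a constant variable (a `static final String` initialized with a literal), so `PREFIX + "world"` is itself a constant expression.

```java
class InternedConstantsDemo {
    static final String PREFIX = "hello "; // constant variable

    static boolean literalsShareIdentity() {
        String a = "hello world";
        String b = "hello world";
        return a == b; // both literals refer to the same interned instance
    }

    static boolean constantConcatenationSharesIdentity() {
        String c = PREFIX + "world"; // folded at compile time, then interned
        return c == "hello world";
    }

    public static void main(String[] args) {
        System.out.println(literalsShareIdentity());
        System.out.println(constantConcatenationSharesIdentity());
    }
}
```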
Interestingly, JLS §15.18.1 states:
> The String object is newly created (§12.5) unless the expression is a constant expression (§15.29).
It’s not clear whether this strict wording is intentional, but as written, it requires a non-constant string concatenation to produce a new object with a distinct identity. This is yet another specified behavior that developers should not rely on.
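A hedged sketch of the non-constant case, again with made-up names. It only checks the parts that are safe to depend on (value equality and the effect of `String.intern`) and prints the identity comparison without assuming its result.

```java
class ConcatIdentityDemo {
    // The operands are parameters, so left + right is not a constant expression.
    static String concat(String left, String right) {
        return left + right;
    }

    public static void main(String[] args) {
        String built = concat("hello ", "world");
        // Value equality holds regardless of how the object was produced.
        System.out.println(built.equals("hello world"));
        // Per the quoted wording this is a newly created object, so the
        // comparison should be false; do not build logic on it either way.
        System.out.println(built == "hello world");
        // Interning maps the new object back to the shared literal instance.
        System.out.println(built.intern() == "hello world");
    }
}
```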
So, if someone were to design a new language without legacy constraints, there would be nothing wrong with designing these types as value types in the first place. The designer would just have to avoid putting into the specification all those guarantees that were thought to be a good idea in the past.
Answer 2
Score: 2
The goal of value-based classes is to lay the foundation for true value types, i.e. types similar to C's `struct`.
Value types are not objects on the heap, so they are not created using `new`, and they are not derived from `Object`, so they don't have object monitors (locks).
That is why the following two rules are part of the definition of value-based:
- make no use of identity-sensitive operations such as reference equality (==) between instances, identity hash code of instances, or synchronization on an instance's intrinsic lock;
- do not have accessible constructors, but are instead instantiated through factory methods which make no commitment as to the identity of returned instances;
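Seen from the caller's side, the two rules could look like the following sketch. The class and method names are hypothetical; `Optional.of`, `LocalDate.of`, and `equals` are the only JDK calls used.

```java
import java.time.LocalDate;
import java.util.Optional;

class ValueBasedUsageDemo {
    // Instances come from factory methods, which promise nothing about identity.
    static LocalDate newYear(int year) {
        return LocalDate.of(year, 1, 1);
    }

    // Compare value-based instances with equals(), never with ==.
    static boolean sameDate(LocalDate a, LocalDate b) {
        return a.equals(b);
    }

    static boolean sameOptional(Optional<String> a, Optional<String> b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(sameDate(newYear(2020), newYear(2020)));
        System.out.println(sameOptional(Optional.of("x"), Optional.of("x")));
        // Avoid on such instances: ==, System.identityHashCode(...),
        // and synchronized (instance) { ... }.
    }
}
```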
As commented by Brian Goetz on Jan 6, 2015:
> `Optional` is new, and the disclaimers arrived on day 1. `Integer`, on the other hand, is probably hopelessly polluted, and I am sure that it would break gobs of important code if `Integer` ceased to be lockable (despite what we may think of such a practice.)
Reference: Value-Based Classes // nipafx