Java String intern function and ==

Question

I have recently been learning about the HotSpot JVM. While studying the string constant pool and the String intern function, I ran into a very strange situation. After reading many answers I still cannot explain it, so I am posting it here for discussion.

    public static void main(String[] args) {
        String s1 = new String("12") + new String("21");
        s1.intern();
        String s2 = "1221";
        System.out.println(s1 == s2); // true
    }

    public static void main(String[] args) {
        String s1 = new String("12") + new String("21");
        // s1.intern();
        String s2 = "1221";
        System.out.println(s1 == s2); // false
    }

The results are from Java 8.

So the only difference between the two snippets is whether s1.intern() is called.

Here is the documentation of the intern function:
> When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
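The contract quoted above can be checked with a small sketch. The class and method names are hypothetical, and the strings are built at run time from a random UUID on the assumption that such content is not in the pool yet. It is meant for Java 7 or later, where the pool lives in the heap.

```java
import java.util.UUID;

final class InternContractDemo {

    /** Builds a string at run time so that it is very unlikely to be pooled already. */
    static String freshString() {
        return new StringBuilder("intern-contract-").append(UUID.randomUUID()).toString();
    }

    /** First intern() for some content: this very instance is added to the pool and returned. */
    static boolean firstInternReturnsSameInstance() {
        String s = freshString();
        return s.intern() == s;
    }

    /** Later intern() on an equal copy: the previously pooled instance is returned, not the copy. */
    static boolean laterInternReturnsPooledInstance() {
        String first = freshString();
        String pooled = first.intern();
        String copy = new String(first); // equal content, different object
        return copy.intern() == pooled && copy.intern() != copy;
    }

    public static void main(String[] args) {
        System.out.println(firstInternReturnsSameInstance());
        System.out.println(laterInternReturnsPooledInstance());
    }
}
```

Both methods should return true when the contract holds. The second one also shows why ignoring the return value of intern() matters only when the content was pooled earlier by a different instance.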

Here is my understanding:

  1. Looking at the bytecode file, we can find "12", "21", and "1221" in the constant pool.
  2. When the class is loaded, the constant pool in the bytecode file is loaded into the run-time constant pool. So the string pool contains "12", "21", and "1221".
  3. new String("12") creates a String instance on the heap, which is different from the "12" in the string pool. The same goes for new String("21").
  4. The "+" operator is translated into a StringBuilder with calls to its append and toString methods, which can be seen in the bytecode.
  5. toString calls new String, so s1 is a String instance "1221" on the heap.
  6. s1.intern() looks in the string pool, finds that "1221" is already there, and does nothing. Besides, we don't use the return value, so the call has no effect on s1.
  7. String s2 = "1221" just loads the "1221" instance from the string pool. In the bytecode this is ldc #11, where #11 is the index of "1221" in the constant pool.
  8. The "==" operator compares the addresses of reference types. s1 points to the instance on the heap and s2 points to the instance in the string pool. How can these two be equal?
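Steps 4 and 5 can be illustrated with a minimal sketch. The class and method names are hypothetical, and the explicit StringBuilder form is only an approximation of what the Java 8 compiler emits for "+" (newer compilers use a different translation, but the result is still a newly created String).

```java
final class ConcatDemo {

    /** Run-time concatenation of non-constant operands yields a newly created String. */
    static String concat(String a, String b) {
        return a + b;
    }

    /** Roughly the Java 8 translation of a + b (an assumption about the compiler output). */
    static String concatExplicit(String a, String b) {
        return new StringBuilder().append(a).append(b).toString();
    }

    public static void main(String[] args) {
        String x = concat("12", "21");
        String y = concatExplicit("12", "21");
        System.out.println(x.equals(y)); // true: same content
        System.out.println(x == y);      // false: two distinct objects
    }
}
```

Neither result is added to the string pool by the concatenation itself, which is the point the rest of the discussion depends on.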

My questions:

  1. What exactly do s1 and s2 point to?
  2. Why does calling the intern() method change the behavior, even though the return value is not used?

Here are my hypotheses:

  1. The string pool is not initialized when the class is loaded. Some answers say that s1.intern() is the first time "1221" is put into the string pool. But then how do we explain that "1221" is in the constant pool of the bytecode file? Is there any specification of when the string pool is populated?

  2. Another explanation is that the intern function only saves a reference to the instance on the heap, but the references s1 and s2 are still different: s1 points to the heap, s2 points to the string pool, and the string pool points to the heap. A reference is different from a reference to a reference.

Answer 1

Score: 2

    String one = new String("abc");
    String two = new String("abc");

    boolean res1 = one == two;      // false -> two different objects
    boolean res2 = one.equals(two); // true -> identical content

    // intern() adds the string to the string pool if it is not there yet
    // and returns the pooled instance
    one = one.intern();
    two = two.intern();
    boolean res3 = one == two;      // true -> same object from the string pool
    boolean res4 = one.equals(two); // true -> identical content

    // The literals "12" and "21" refer to pooled instances;
    // each new String(...) creates a separate object on the heap
    one = new String("12");
    two = new String("21");

    // Concatenate the two strings at run time:
    // the result is a new heap object and is NOT put into the string pool
    String joined = one + two;

    // Concatenate again and copy the result into yet another heap object
    String three = new String(one + two);

    // The literal refers to the pooled "1221" instance
    String four = "1221";

    boolean res5 = joined == four; // false -> `joined` is a heap object,
                                   // `four` is the pooled instance
    boolean res6 = three == four;  // false -> `three` is a heap object too

    // Look up the pooled instance for this content
    three = three.intern();

    boolean res7 = three == four;  // true -> both refer to the pooled instance

Answer 2

Score: 1

I am the asker.

Thanks to the discussion with @Sweeper and @user16320675, I have a new understanding of this problem, which I share here.

The mistake was in points 2 and 6 of my understanding: the string pool is not populated when the class is loaded. s1.intern() is what first adds "1221" to the string pool, and String s2 = "1221" then behaves differently depending on whether "1221" already exists in the string pool.
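The effect can be reproduced in one class with a hedged sketch. The class name and the literals are hypothetical, and each full literal is chosen so that it appears only once in the program. The sketch relies on HotSpot resolving a string literal on first use rather than at class loading, so other JVMs or unusual startup options might behave differently.

```java
final class LazyLiteralDemo {

    /** intern() runs before the literal is resolved for the first time. */
    static boolean internBeforeLiteral() {
        String s1 = new String("lazy-12") + new String("21-a");
        s1.intern();               // the pool now holds s1 itself
        String s2 = "lazy-1221-a"; // first resolution finds s1 in the pool
        return s1 == s2;
    }

    /** The literal is resolved without a prior intern() call. */
    static boolean literalWithoutIntern() {
        String s1 = new String("lazy-12") + new String("21-b");
        String s2 = "lazy-1221-b"; // first resolution creates a new pooled instance
        return s1 == s2;
    }

    // Evaluated once at class initialization. Calling the methods a second time
    // would give false in both cases, because the literals are resolved by then.
    static final boolean WITH_INTERN = internBeforeLiteral();
    static final boolean WITHOUT_INTERN = literalWithoutIntern();

    public static void main(String[] args) {
        System.out.println(WITH_INTERN);
        System.out.println(WITHOUT_INTERN);
    }
}
```

On HotSpot this is expected to report true for the first field and false for the second, matching the two snippets in the question.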

To explain this better, I first define the key concepts involved.

Key concepts

  • Constant pool: a data structure in the class file that stores the constants, strings, classes, fields, methods, interfaces, parameter types, etc. used in the source code. It lives in the bytecode file on disk.
  • Run-time constant pool: the in-memory form of the constant pool while the program is running. When a class is loaded, its constant pool data is loaded into the JVM method area to form the run-time constant pool.
  • CONSTANT_String_info: a data structure in the constant pool that refers to the Unicode sequence of a string literal in the source code.
  • String pool: a memory area in the heap (as of JDK 8) that gives access to the String instances that have been interned.
  • ldc #5: pushes constant No. 5 of the run-time constant pool onto the operand stack. When the constant is a string literal, it first checks whether the string pool already holds a matching String instance. If so, the reference to that instance is pushed onto the stack; if not, a String instance is created in the string pool and its reference is pushed onto the stack.
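The interplay between intern() and ldc described in the list above can be modeled with a toy pool. This is only an illustrative sketch with hypothetical names (ToyStringPool, resolveLiteral, scenario); the real pool is implemented inside the JVM and looks up the content before creating an instance, but the observable result is the same.

```java
import java.util.HashMap;
import java.util.Map;

final class ToyStringPool {
    // Maps content to the one canonical instance for that content.
    private final Map<String, String> pool = new HashMap<>();

    /** Mirrors the documented contract of String.intern(). */
    String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing == null ? s : existing;
    }

    /** Models ldc on a string constant: build an instance from the stored characters, then intern it. */
    String resolveLiteral(char[] constantData) {
        return intern(new String(constantData));
    }

    /** Replays the question's code against the toy pool. */
    static boolean scenario(boolean callIntern) {
        ToyStringPool pool = new ToyStringPool();
        char[] constant1221 = {'1', '2', '2', '1'}; // stands in for the class-file constant
        String s1 = new StringBuilder().append("12").append("21").toString();
        if (callIntern) {
            pool.intern(s1); // the pool adopts s1 itself
        }
        String s2 = pool.resolveLiteral(constant1221);
        return s1 == s2;
    }

    public static void main(String[] args) {
        System.out.println(scenario(true));  // true: the literal resolves to s1
        System.out.println(scenario(false)); // false: the literal gets its own instance
    }
}
```

In this model the class-file constant is just character data. A String instance enters the pool only when intern() or the first literal resolution puts it there.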

Cause of the mistake

The mistake comes from misunderstanding the relationship between the string pool and the constant pool (from here on, "constant pool" and "run-time constant pool" are used interchangeably).

Although it is usually called the string constant pool, it is not part of the constant pool, so it is not populated when the class is loaded. In JDK 6, both the string pool and the constant pool were located in the permanent generation, which made them look related. But in JDK 8 the string pool is in the heap (the move happened in JDK 7). It is less a part of the constant pool than a part of the String class. It can be thought of as a private member variable of the String class, although it cannot be seen in the String source code.

Once a String instance in the string pool has been created, the character data in the instance cannot be changed. Any modifying operation on an existing String instance produces a new String instance, so the instances behave like constants, which is why the pool is usually called the string constant pool. To avoid confusing the string pool with the constant pool, I try to say "string pool" rather than "string constant pool".

Another concept that is easily confused with it is CONSTANT_String_info in the constant pool. String literals are stored there as Unicode sequences and are loaded into the run-time constant pool when the class is loaded. But this is fundamentally different from the string pool: CONSTANT_String_info only stores a Unicode sequence, while the string pool stores String instances. A String instance contains not only the Unicode sequence but also other member fields, such as hash. And the String class comes with many methods that cannot be invoked on a CONSTANT_String_info. The corresponding String instance can be created by invoking a String constructor with the Unicode sequence from CONSTANT_String_info as its argument.

Answer 3

Score: 1

Here is a short explanation. First, the == operator is true only if the two compared strings are actually the same instance of the String class. For two different String instances that hold the same content, the result is false. So if you really want to compare the content of two Strings, you MUST use the equals() method of the String class. Now if you write the following code:

    String s1 = "test";
    //s1.intern();
    String s2 = "test";
    System.out.println(s1 == s2); // output is true

Even if you don't invoke s1.intern(), string literals are interned behind the scenes by the JVM, so s2 is assigned the same instance, and that is why s1 == s2 is true. (The Java Language Specification guarantees this for literals with the same content, with or without the explicit s1.intern() call.) Now if you run the following code:

    String s1 = "test";
    s1.intern();
    String s2 = new String("test");
    System.out.println(s1 == s2); // output is false

That is because with new String("test") you force the creation of a new String instance regardless of what already exists in the internal pool.
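The three cases can be put side by side in a short sketch (the class and method names are hypothetical). It relies only on the language rule that equal string literals refer to the same interned instance.

```java
final class LiteralIdentityDemo {

    /** Two literals with the same content refer to the same instance. */
    static boolean sameLiteralSameInstance() {
        String s1 = "test";
        String s2 = "test";
        return s1 == s2;
    }

    /** new String(...) always creates a distinct object with equal content. */
    static boolean newStringIsDistinctButEqual() {
        String s1 = "test";
        String s2 = new String("test");
        return s1 != s2 && s1.equals(s2);
    }

    /** intern() on the copy leads back to the instance used by the literal. */
    static boolean internReturnsLiteralInstance() {
        String s1 = "test";
        String s2 = new String("test").intern();
        return s1 == s2;
    }

    public static void main(String[] args) {
        System.out.println(sameLiteralSameInstance());      // true
        System.out.println(newStringIsDistinctButEqual());  // true
        System.out.println(internReturnsLiteralInstance()); // true
    }
}
```

The comparison with == is shown here only to expose object identity. For comparing content, equals() remains the right tool.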

huangapple
  • Posted on June 29, 2023, 17:35:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76579840.html