Emoji symbol in Java code - Too many characters in character literal

Question

I have to count the characters in a given String. I save the counts to a `Map<Character, Long>`. The code does not work with some special symbols like "two hearts" (💕). When I put such a special symbol into a character literal, I get the compiler error "Too many characters in character literal" or similar. Why does this happen and how to fix it?

Here is some rough code to demonstrate the problem. This is not the full code.

```java
import java.util.HashMap;
import java.util.Map;

public class Demo {
    public static void main(String[] args) {
        String twoHeartsStr = "💕";
        Map<Character, Long> output = new HashMap<>();
        output.put(twoHeartsStr.charAt(0), 1L);

        // Compiler error:
        // IntelliJ IDE compiler: Too many characters in character literal.
        // java: unclosed character literal.
        Map<Character, Long> expectedOutput = Map.of('💕', 1L);
        System.out.println("Maps are equal : " + output.equals(expectedOutput));
    }
}
```

EDIT:
Updated solution after getting answers to this question.

```java
import java.util.HashMap;
import java.util.Map;

public class Demo {
    public static void main(String[] args) {
        String twoHeartsStr = "💕"; // Try #, a letter, a digit, etc.
        Map<String, Long> output = new HashMap<>();
        int codePoint = twoHeartsStr.codePointAt(0);
        // charValue.length() is 2 for the two-hearts symbol.
        String charValue = String.valueOf(Character.toChars(codePoint));
        output.put(charValue, 1L);

        Map<String, Long> expectedOutput = Map.of("💕", 1L);
        System.out.println("Maps are equal : " + output.equals(expectedOutput)); // true
    }
}
```

Answer 1

Score: 4

By Java's definition, "💕" is not one character; it is two:

    >>> "💕".length()
    2 (int)

So '💕' is a syntax error, because `char` is a 16-bit integer type, and the Unicode symbol 💕 is not represented by just one 16-bit integer value.

The solution to your problem is to use strings instead.
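As a rough sketch of what "use strings instead" could look like for a whole input rather than just its first symbol, the example below keys the map by a one-code-point `String`. The class name `StringKeyCount` and the method `countSymbols` are made up for illustration; the emoji is written as the escape `"\uD83D\uDC95"` only so the file compiles regardless of source encoding.

```java
import java.util.HashMap;
import java.util.Map;

public class StringKeyCount {
    // Hypothetical helper: counts every Unicode code point in the text,
    // using a String that holds exactly that code point as the map key.
    static Map<String, Long> countSymbols(String text) {
        Map<String, Long> counts = new HashMap<>();
        text.codePoints().forEach(cp ->
                counts.merge(new String(Character.toChars(cp)), 1L, Long::sum));
        return counts;
    }

    public static void main(String[] args) {
        String twoHearts = "\uD83D\uDC95"; // U+1F495 TWO HEARTS as a surrogate pair
        String text = "a" + twoHearts + "a";

        System.out.println(text.length());                          // 4 (char code units)
        System.out.println(text.codePointCount(0, text.length()));  // 3 (code points)
        System.out.println(countSymbols(text).get(twoHearts));      // 1
        System.out.println(countSymbols(text).get("a"));            // 2
    }
}
```

String keys keep the expected-value side readable, since a string literal can hold the symbol where a `char` literal cannot.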

Answer 2

Score: 1

> The code does not work with some special symbols like "two hearts"... Why does this happen

The Java `char` type is a 16-bit value. In the early days of Unicode, this was sufficient to store all the code-point values, but that quickly changed. The Unicode specification now allows for over a million code points, and those above U+FFFF (the "supplementary characters") have to be represented in UTF-16 with a "surrogate pair" of two `char` values.

From the documentation:

> A String represents a string in the UTF-16 format in which supplementary characters are represented by surrogate pairs (see the section Unicode Character Representations in the Character class for more information). Index values refer to char code units, so a supplementary character uses two positions in a String.

Moving on:

> twoHeartsStr.charAt(0)

This will give you the first half of the surrogate pair, which is not a valid character on its own despite being a valid char value (char is fundamentally an integer type rather than a textual type).
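To make the "first half of the surrogate pair" concrete, here is a small sketch that inspects the two `char` values behind the two-hearts symbol. The class name `SurrogateHalves` and the helper `firstCharIsHalfOfPair` are illustrative names, not part of any library; the `Character` methods used are standard.

```java
public class SurrogateHalves {
    // Hypothetical helper: true when charAt(0) is only the leading half of a pair.
    static boolean firstCharIsHalfOfPair(String text) {
        return Character.isHighSurrogate(text.charAt(0));
    }

    public static void main(String[] args) {
        String twoHearts = "\uD83D\uDC95"; // U+1F495, stored as two char values
        char first = twoHearts.charAt(0);
        char second = twoHearts.charAt(1);

        System.out.println(Character.isHighSurrogate(first));              // true
        System.out.println(Character.isLowSurrogate(second));              // true
        System.out.println(Integer.toHexString(first));                    // d83d
        System.out.println(Integer.toHexString(second));                   // dc95
        System.out.println(Integer.toHexString(twoHearts.codePointAt(0))); // 1f495
        System.out.println(firstCharIsHalfOfPair("a"));                    // false
    }
}
```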

> ...and how to fix it?

You can use 32-bit integers (i.e., int or Integer) to represent the values, and the codePointAt method to extract them from the string. Note, however, that when you iterate over the string, you'll still need to skip over the indices corresponding to the second halves of the pairs.
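One possible way to write that loop is sketched below, assuming the counts are kept in a `Map<Integer, Long>` keyed by code point. `CodePointHistogram` and `histogram` are names chosen for this example; `Character.charCount` returns 2 for a supplementary code point and 1 otherwise, which is what skips the second half of each pair.

```java
import java.util.HashMap;
import java.util.Map;

public class CodePointHistogram {
    // Hypothetical helper: counts code points, advancing the index by one or
    // two char positions depending on how many the current code point occupies.
    static Map<Integer, Long> histogram(String text) {
        Map<Integer, Long> counts = new HashMap<>();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            counts.merge(codePoint, 1L, Long::sum);
            i += Character.charCount(codePoint); // skips the low surrogate, if any
        }
        return counts;
    }

    public static void main(String[] args) {
        // One letter followed by the two-hearts symbol twice.
        String text = "a\uD83D\uDC95\uD83D\uDC95";
        Map<Integer, Long> counts = histogram(text);

        System.out.println(counts.size());        // 2
        System.out.println(counts.get(0x1F495));  // 2
        System.out.println(counts.get((int) 'a')); // 1
    }
}
```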

You still won't be able to store the "supplementary characters" in a char, so you won't be able to write them in char literals. So to look up the two-hearts character in the resulting histogram (or to populate your reference data for testing), you'll want to get the integer code-point value from a string with that symbol.
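For the lookup and the reference data, a minimal sketch could take the key from a string with `codePointAt(0)` instead of a `char` literal. Here `HistogramLookup` and its stream-based `histogram` method are stand-ins for whatever counting code is actually used.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class HistogramLookup {
    // Stand-in for the real counting code: groups the code points of the text.
    static Map<Integer, Long> histogram(String text) {
        return text.codePoints().boxed()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        String twoHearts = "\uD83D\uDC95";   // the symbol, as a String
        int key = twoHearts.codePointAt(0);  // 0x1F495, usable as a map key

        Map<Integer, Long> output = histogram(twoHearts);
        Map<Integer, Long> expectedOutput = Map.of(key, 1L);

        System.out.println(output.get(key));                                     // 1
        System.out.println("Maps are equal : " + output.equals(expectedOutput)); // true
    }
}
```

An `int` literal such as `0x1F495` would work as the key too, but deriving it from the string keeps the test data readable.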

huangapple
  • Published on July 30, 2020 at 08:48:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63164540.html