2020-08-05 13:41:39

Why does Java not recognize these white spaces?

Question

Unicode defines 25 white space characters. The code below shows that Character.isWhitespace(char) returns false for four of those 25. Why?

public class Main {
    public static void main(String... args) {
        char[] whiteSpaces = {'\u0085', '\u00A0', '\u2007', '\u202F'};
        for (char space : whiteSpaces) {
            // Character.isWhitespace returns false for every one of these characters.
            System.out.println("[" + space + "] is a white space in Java:" + Character.isWhitespace(space));
        }
    }
}

Reference: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Character.html#isWhitespace(char)

Answer 1

Score: 3

Why? Because that is how the method is specified. The javadoc for isWhitespace lists the characters that it matches. The four that you identified are not in the list.

We can't tell you why it was defined that way. However, one implication of what the javadoc says is that '\u00A0', '\u2007' and '\u202F' are excluded because they are non-breaking whitespace characters.

'\u0085' or NEL is an interesting case. According to the Unicode code tables, it is NOT a member of the general categories SPACE_SEPARATOR, LINE_SEPARATOR or PARAGRAPH_SEPARATOR. (It is in the CONTROL category.)
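The general category of a character can be checked with Character.getType. The following is a minimal sketch; the class name NelCategory and the helper categoryName are made up for illustration and are not part of the JDK.

```java
public class NelCategory {
    // Hypothetical helper: maps the general category reported by
    // Character.getType to a readable name for the categories discussed here.
    static String categoryName(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.SPACE_SEPARATOR:     return "SPACE_SEPARATOR";
            case Character.LINE_SEPARATOR:      return "LINE_SEPARATOR";
            case Character.PARAGRAPH_SEPARATOR: return "PARAGRAPH_SEPARATOR";
            case Character.CONTROL:             return "CONTROL";
            default:                            return "OTHER";
        }
    }

    public static void main(String... args) {
        int[] codePoints = {0x0085, 0x00A0, 0x2007, 0x202F};
        for (int cp : codePoints) {
            // Print the code point in U+XXXX form next to its category.
            System.out.printf("U+%04X -> %s%n", cp, categoryName(cp));
        }
    }
}
```

The three non-breaking spaces are reported as space separators, while NEL is reported as a control character.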

If you want a method that recognises all Unicode white space characters (i.e. characters in SPACE_SEPARATOR, LINE_SEPARATOR or PARAGRAPH_SEPARATOR), you should use isSpaceChar instead of isWhitespace.

Note that the Unicode spec is not a constant thing. The categorization of the codes, and indeed the definition of "white space", have evolved over time. Each Java version implements a specific version of the Unicode spec that was current at the time it was released. For example:

Java 8 implements Unicode 6.2
Java 11 implements Unicode 10.0.0
Java 13 implements Unicode 12.1

The details are in the javadoc for the Character class for each Java version. Note that a given Java release is NOT patched to track subsequent Unicode releases.

The bottom line is that "white space" is a rather slippery concept. If you want a method that implements a specific meaning, you may need to implement it yourself.
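As one possible way of "implementing it yourself", here is a minimal sketch that hard-codes the 25 code points carrying the Unicode White_Space property in recent Unicode versions. The class and method names are hypothetical, and the list is an assumption that would need to be rechecked against whichever Unicode version you care about.

```java
public class UnicodeWhiteSpace {
    // Hypothetical helper: true for the 25 code points that recent Unicode
    // versions list with the White_Space property. Not a JDK API.
    static boolean isUnicodeWhiteSpace(int cp) {
        return (cp >= 0x0009 && cp <= 0x000D)   // TAB, LF, VT, FF, CR
            || cp == 0x0020                     // SPACE
            || cp == 0x0085                     // NEXT LINE (NEL)
            || cp == 0x00A0                     // NO-BREAK SPACE
            || cp == 0x1680                     // OGHAM SPACE MARK
            || (cp >= 0x2000 && cp <= 0x200A)   // EN QUAD .. HAIR SPACE
            || cp == 0x2028                     // LINE SEPARATOR
            || cp == 0x2029                     // PARAGRAPH SEPARATOR
            || cp == 0x202F                     // NARROW NO-BREAK SPACE
            || cp == 0x205F                     // MEDIUM MATHEMATICAL SPACE
            || cp == 0x3000;                    // IDEOGRAPHIC SPACE
    }

    public static void main(String... args) {
        char[] chars = {'\u0085', '\u00A0', '\u2007', '\u202F'};
        for (char c : chars) {
            // Compare Java's definition with the hard-coded Unicode list.
            System.out.printf("U+%04X java=%b unicode=%b%n",
                    (int) c, Character.isWhitespace(c), isUnicodeWhiteSpace(c));
        }
    }
}
```

A hard-coded list makes the chosen meaning explicit and independent of the Unicode version bundled with the JDK, at the cost of having to maintain it by hand.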

Answer 2

Score: 3

If you read the documentation, i.e. the javadoc of Character.isWhitespace(char), it says:

> Determines if the specified character is white space according to Java. A character is a Java whitespace character if and only if it satisfies one of the following criteria:
>
> - It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').
> - It is '\t', U+0009 HORIZONTAL TABULATION.
> - It is '\n', U+000A LINE FEED.
> - It is '\u000B', U+000B VERTICAL TABULATION.
> - It is '\f', U+000C FORM FEED.
> - It is '\r', U+000D CARRIAGE RETURN.
> - It is '\u001C', U+001C FILE SEPARATOR.
> - It is '\u001D', U+001D GROUP SEPARATOR.
> - It is '\u001E', U+001E RECORD SEPARATOR.
> - It is '\u001F', U+001F UNIT SEPARATOR.

Three of the four you listed are explicitly excluded because they are non-breaking spaces.

As for U+0085 NEXT LINE (NEL), it is not a Unicode space character, and it is not considered a whitespace character by Java, as you can well see in that javadoc.
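The criteria quoted above can be restated as a small predicate and compared against Character.isWhitespace(int) for every code point. This is only a sketch: javadocRule and countMismatches are hypothetical names rather than JDK APIs, and the mismatch count depends on the JDK you run it on.

```java
public class JavadocRuleCheck {
    // Hypothetical restatement of the javadoc criteria for isWhitespace.
    static boolean javadocRule(int cp) {
        int type = Character.getType(cp);
        boolean unicodeSpace = type == Character.SPACE_SEPARATOR
                || type == Character.LINE_SEPARATOR
                || type == Character.PARAGRAPH_SEPARATOR;
        boolean nonBreaking = cp == 0x00A0 || cp == 0x2007 || cp == 0x202F;
        // U+0009..U+000D and U+001C..U+001F are listed individually in the javadoc.
        boolean listedControl = (cp >= 0x0009 && cp <= 0x000D)
                || (cp >= 0x001C && cp <= 0x001F);
        return (unicodeSpace && !nonBreaking) || listedControl;
    }

    // Counts the code points where the restated rule and the JDK disagree.
    static int countMismatches() {
        int mismatches = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (javadocRule(cp) != Character.isWhitespace(cp)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String... args) {
        System.out.println("Mismatches: " + countMismatches());
        System.out.println("NEL matches rule: " + javadocRule(0x0085));
    }
}
```

If the implementation follows its javadoc, the two should agree on every code point, and NEL should fail the rule because it is neither a Unicode space separator nor one of the listed control characters.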

Answer 3

Score: 1

The Character class doesn't seem to expose the Unicode white space list directly.

In Java, isWhitespace returns true for a character that meets one of these criteria:

It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').
It is '\t', U+0009 HORIZONTAL TABULATION.
It is '\n', U+000A LINE FEED.
It is '\u000B', U+000B VERTICAL TABULATION.
It is '\f', U+000C FORM FEED.
It is '\r', U+000D CARRIAGE RETURN.
It is '\u001C', U+001C FILE SEPARATOR.
It is '\u001D', U+001D GROUP SEPARATOR.
It is '\u001E', U+001E RECORD SEPARATOR.
It is '\u001F', U+001F UNIT SEPARATOR.

Java also exposes the Unicode space separators, but not the full Unicode white space list, via Character.isSpaceChar(). That is a slightly different set.

char[] whiteSpaces = {'\u0085', '\u00A0', '\u2007', '\u202F'};
for (char space : whiteSpaces) {
    // Character.isWhitespace returns false for every one of these characters.
    System.out.println("[" + space + "] is a white space in Java: " + Character.isWhitespace(space) + " Unicode: " + Character.isSpaceChar(space));
}

Output:

[] is a white space in Java: false Unicode: false
[ ] is a white space in Java: false Unicode: true
[ ] is a white space in Java: false Unicode: true
[ ] is a white space in Java: false Unicode: true

If it's important for your application to match the Unicode spec instead of the Java spec, just define the check yourself.
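One way to define it yourself without hard-coding a list is the White_Space binary property supported by java.util.regex.Pattern. This is a sketch rather than a recommendation: RegexWhiteSpace and isUnicodeWhiteSpace are hypothetical names, and the result follows the Unicode data and regex implementation of the JDK in use.

```java
import java.util.regex.Pattern;

public class RegexWhiteSpace {
    // \p{IsWhite_Space} is the binary-property syntax documented for Pattern.
    private static final Pattern WHITE_SPACE = Pattern.compile("\\p{IsWhite_Space}");

    // Hypothetical helper: true if the single code point matches White_Space.
    static boolean isUnicodeWhiteSpace(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return WHITE_SPACE.matcher(s).matches();
    }

    public static void main(String... args) {
        int[] codePoints = {0x0085, 0x00A0, 0x2007, 0x202F};
        for (int cp : codePoints) {
            // Show Java's definition, isSpaceChar, and the regex property side by side.
            System.out.printf("U+%04X isWhitespace=%b isSpaceChar=%b White_Space=%b%n",
                    cp, Character.isWhitespace(cp), Character.isSpaceChar(cp),
                    isUnicodeWhiteSpace(cp));
        }
    }
}
```

Compiling the pattern once and reusing it avoids repeated compilation; for hot paths a plain range check on the code point would likely be cheaper than a regex match.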
