Identify if a Unicode code point represents a character from a certain script, such as the Latin script?

Question

Unicode categorizes characters as belonging to a script, such as the Latin script.

How do I test whether a particular character (code point) is in a particular script?

Answer 1

Score: 4

Java represents the various Unicode scripts in the Character.UnicodeScript enum, including, for example, Character.UnicodeScript.LATIN. These constants correspond to the values of the Unicode Script property.
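To see how the enum maps characters to scripts, here is a minimal sketch that looks up the script of every code point in a sample string. It uses Character.UnicodeScript.of(int) and Character.UnicodeScript.forName(String). The class name ScriptSurvey and the helper scriptsOf are hypothetical names chosen for illustration. Note that digits such as '1' are assigned to COMMON, not to any particular script.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ScriptSurvey {
    // Hypothetical helper: the Unicode script of each code point, in order.
    static List<Character.UnicodeScript> scriptsOf(String text) {
        return text.codePoints()
                   .mapToObj(Character.UnicodeScript::of)
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 'a', CYRILLIC CAPITAL LETTER ZHE, GREEK SMALL LETTER ALPHA,
        // the Han ideograph U+4E2D, and the ASCII digit '1'.
        System.out.println(scriptsOf("a\u0416\u03B1\u4E2D1")); // [LATIN, CYRILLIC, GREEK, HAN, COMMON]

        // forName accepts a script name or its four-letter alias, ignoring case.
        System.out.println(Character.UnicodeScript.forName("Latn")); // LATIN
    }
}
```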

You can test a character by passing its code point, an int, to the static of method on that enum. It returns the script to which that code point is assigned.

int codePoint = "a".codePointAt( 0 ) ; 
Character.UnicodeScript script = Character.UnicodeScript.of( codePoint ) ;
if( Character.UnicodeScript.LATIN.equals( script ) ) { … }
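If you run this test in several places, you can wrap it in a small method. The sketch below uses the hypothetical names ScriptCheck and isInScript. It compares with == because enum constants are singletons, so == gives the same result as equals(). Be aware that Character.UnicodeScript.of throws IllegalArgumentException for an invalid code point, one outside 0 to 0x10FFFF.

```java
public class ScriptCheck {
    // Hypothetical helper: true if the code point is assigned to the given script.
    static boolean isInScript(int codePoint, Character.UnicodeScript script) {
        // Enum constants are singletons, so == is equivalent to equals() here.
        return Character.UnicodeScript.of(codePoint) == script;
    }

    public static void main(String[] args) {
        int codePoint = "a".codePointAt(0);
        System.out.println(isInScript(codePoint, Character.UnicodeScript.LATIN)); // true
        System.out.println(isInScript(codePoint, Character.UnicodeScript.GREEK)); // false
    }
}
```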

Alternatively:

boolean isLatinScript = 
        Character.UnicodeScript.LATIN
        .equals( 
            Character.UnicodeScript.of( codePoint ) 
        )
;
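The same one-code-point test extends to a whole string by applying it to every code point. The sketch below is one possible policy, not the only one: it treats COMMON (spaces, digits, punctuation) and INHERITED (combining marks) as compatible with Latin text. Without that allowance, a phrase like "Café au lait" would fail just because of its spaces. The class LatinText, the set ALLOWED, and the method isLatinText are hypothetical names. An empty string passes vacuously.

```java
import java.util.EnumSet;
import java.util.Set;

public class LatinText {
    // Hypothetical policy: LATIN plus the script-neutral COMMON and INHERITED values.
    private static final Set<Character.UnicodeScript> ALLOWED = EnumSet.of(
            Character.UnicodeScript.LATIN,
            Character.UnicodeScript.COMMON,
            Character.UnicodeScript.INHERITED);

    static boolean isLatinText(String text) {
        // codePoints() iterates by code point, so surrogate pairs are handled as one unit.
        return text.codePoints()
                   .allMatch(cp -> ALLOWED.contains(Character.UnicodeScript.of(cp)));
    }

    public static void main(String[] args) {
        // "Cafe" + U+0301 COMBINING ACUTE ACCENT, which is INHERITED.
        System.out.println(isLatinText("Cafe\u0301 au lait, 2 cups")); // true
        // "Москва" written in Cyrillic.
        System.out.println(isLatinText("\u041C\u043E\u0441\u043A\u0432\u0430")); // false
    }
}
```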

Example usage.

System.out.println(
        Character.UnicodeScript.LATIN      // Constant defined on the enum.
        .equals(                           // `java.lang.Enum.equals()` comparing two constants defined on the enum.
            Character.UnicodeScript.of(    // Determine which Unicode script for this character.
                "😷".codePointAt( 0 )      // Get the integer value of the first (and only) code point in this string (two `char`s, one code point).
            )                              // Returns a `Character.UnicodeScript` enum object. 
        )                                  // Returns `boolean`. 
);

See this code run at IdeOne.com.

>false
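The result is false because the emoji U+1F637 is assigned to the COMMON script rather than to LATIN. The sketch below makes this explicit. It also shows why codePointAt matters: the emoji occupies two UTF-16 chars (a surrogate pair) but is a single code point. The class EmojiScript and the method scriptOfFirst are hypothetical names.

```java
public class EmojiScript {
    // Hypothetical helper: script of the first code point (not the first char) in s.
    static Character.UnicodeScript scriptOfFirst(String s) {
        return Character.UnicodeScript.of(s.codePointAt(0));
    }

    public static void main(String[] args) {
        String mask = "\uD83D\uDE37"; // U+1F637 FACE WITH MEDICAL MASK as a surrogate pair
        System.out.println(mask.length());                             // 2 UTF-16 chars
        System.out.println(mask.codePointCount(0, mask.length()));     // 1 code point
        System.out.println(Integer.toHexString(mask.codePointAt(0)));  // 1f637
        System.out.println(scriptOfFirst(mask));                       // COMMON
    }
}
```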

FYI, the Character class also lets you ask whether a code point represents a character that isDigit, isLetter, isLetterOrDigit, isLowerCase, and more. Each of these static methods has an overload that accepts an int code point.
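These predicates answer a different question from the script lookup. As a minimal sketch, the code below prints the script alongside a few predicates for some sample code points. Note that U+1D44E MATHEMATICAL ITALIC SMALL A counts as a lowercase letter yet belongs to COMMON, not LATIN. The class CodePointTraits and the method describe are hypothetical names.

```java
public class CodePointTraits {
    // Hypothetical helper: the script plus a few Character predicates for one code point.
    static String describe(int cp) {
        return String.format("U+%04X script=%s letter=%b digit=%b lower=%b",
                cp, Character.UnicodeScript.of(cp),
                Character.isLetter(cp), Character.isDigit(cp), Character.isLowerCase(cp));
    }

    public static void main(String[] args) {
        // 'a', GREEK CAPITAL LETTER OMEGA, ARABIC-INDIC DIGIT THREE,
        // and MATHEMATICAL ITALIC SMALL A (outside the Basic Multilingual Plane).
        int[] samples = {'a', 0x03A9, 0x0663, 0x1D44E};
        for (int cp : samples) {
            System.out.println(describe(cp));
        }
    }
}
```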

huangapple
  • This article was published on May 31, 2020 at 07:18:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/62109781.html