Identify if a Unicode code point represents a character from a certain script such as the Latin script?
Question
Unicode categorizes characters as belonging to a script, such as the Latin script.
How do I test whether a particular character (code point) is in a particular script?
Answer 1
Score: 4
Java represents the various Unicode scripts in the Character.UnicodeScript enum, including for example Character.UnicodeScript.LATIN. These match the Unicode Script Properties.
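Because the constants mirror the Unicode Script property, you can also go the other way and look a constant up by its Unicode script name or its four-letter ISO 15924 alias with Character.UnicodeScript.forName, which ignores case and throws IllegalArgumentException for unknown names. A minimal sketch; the class and method names are illustrative only, and the number of constants printed at the end depends on the Unicode version your JDK supports:

```java
class ScriptNames {
    // Illustrative helper: resolve a script by Unicode name or ISO 15924 alias.
    static Character.UnicodeScript lookup(String name) {
        return Character.UnicodeScript.forName(name);
    }

    public static void main(String[] args) {
        System.out.println(lookup("Latin")); // full Unicode script name
        System.out.println(lookup("latn"));  // ISO 15924 alias, case-insensitive
        System.out.println(lookup("Cyrl"));  // alias for Cyrillic
        // Varies by JDK release, since new scripts are added over time.
        System.out.println(Character.UnicodeScript.values().length + " script constants");
    }
}
```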
You can test a character by passing its integer code point to the static of method on that enum.
int codePoint = "a".codePointAt( 0 ) ;
Character.UnicodeScript script = Character.UnicodeScript.of( codePoint ) ;
if( Character.UnicodeScript.LATIN.equals( script ) ) { … }
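To try the snippet on its own, it can be wrapped in a complete program. This is a sketch, not part of the original answer: the helper name isInScript is hypothetical, and the Character.isValidCodePoint guard is an added assumption, there because Character.UnicodeScript.of throws IllegalArgumentException for values outside the Unicode range. Enum constants are singletons, so == behaves the same as equals() here.

```java
class ScriptTest {
    // Hypothetical helper: true if the code point belongs to the given script.
    static boolean isInScript(int codePoint, Character.UnicodeScript script) {
        // UnicodeScript.of rejects negative values and values above U+10FFFF.
        if (!Character.isValidCodePoint(codePoint)) {
            return false;
        }
        return Character.UnicodeScript.of(codePoint) == script;
    }

    public static void main(String[] args) {
        int codePoint = "a".codePointAt(0);
        if (isInScript(codePoint, Character.UnicodeScript.LATIN)) {
            System.out.println("Latin");
        } else {
            System.out.println("not Latin");
        }
    }
}
```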
Alternatively:
boolean isLatinScript =
Character.UnicodeScript.LATIN
.equals(
Character.UnicodeScript.of( codePoint )
)
;
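The same test extends to a whole string by streaming its code points (String.codePoints, so emoji and other supplementary characters are handled as one unit each). A minimal sketch under one stated assumption: spaces, digits and punctuation belong to the Common script and combining accents to Inherited, so this hypothetical isLatinText treats those two as neutral. Adjust that policy to your needs; note that it also accepts the empty string and strings containing no letters at all.

```java
class LatinText {
    // Hypothetical helper: true if every code point is LATIN, COMMON or INHERITED.
    // Treating COMMON and INHERITED as acceptable is an assumption of this sketch.
    static boolean isLatinText(String text) {
        return text.codePoints()
                .mapToObj(Character.UnicodeScript::of)
                .allMatch(s -> s == Character.UnicodeScript.LATIN
                        || s == Character.UnicodeScript.COMMON
                        || s == Character.UnicodeScript.INHERITED);
    }

    public static void main(String[] args) {
        System.out.println(isLatinText("Hello, world 42!"));
        // "Privet" spelled in Cyrillic letters.
        System.out.println(isLatinText("\u041F\u0440\u0438\u0432\u0435\u0442"));
        // "cafe" followed by a combining acute accent (Inherited script).
        System.out.println(isLatinText("cafe\u0301"));
    }
}
```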
Example usage.
System.out.println(
Character.UnicodeScript.LATIN // Constant defined on the enum.
.equals( // `java.lang.Enum.equals()` comparing two constants defined on the enum.
Character.UnicodeScript.of( // Determine the Unicode script of this character.
"😷".codePointAt( 0 ) // Get the code point integer number of the first (and only) character in this string.
) // Returns a `Character.UnicodeScript` enum object.
) // Returns `boolean`.
);
See this code run at IdeOne.com.
Output: false
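The false result is expected: emoji such as U+1F637 FACE WITH MEDICAL MASK are assigned to the Common script, not to any particular writing system. The sketch below (class name hypothetical) prints that script and also shows why working with code points matters here: the emoji is a single code point but two UTF-16 char values, so it occupies two positions in a Java String.

```java
class EmojiScript {
    // FACE WITH MEDICAL MASK, written as a UTF-16 surrogate pair.
    static final String MASK = "\uD83D\uDE37";

    public static void main(String[] args) {
        System.out.println(MASK.length());                         // UTF-16 units
        System.out.println(MASK.codePointCount(0, MASK.length())); // code points
        int cp = MASK.codePointAt(0);
        System.out.printf("U+%X%n", cp);
        System.out.println(Character.UnicodeScript.of(cp));
    }
}
```

Run as written, it should print 2, 1, U+1F637 and COMMON on separate lines.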
FYI, the Character class lets you ask if a code point represents a character that isDigit, isLetter, isLetterOrDigit, isLowerCase, and more.
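These predicates answer a different question from UnicodeScript: they look at a code point's Unicode general category rather than its writing system. So a Thai digit is both a digit and in the THAI script, while an ASCII digit is a digit in the COMMON script. A sketch printing both kinds of information side by side; the class and method names are illustrative, and every call is an int-accepting overload on java.lang.Character:

```java
class CharProps {
    // Illustrative helper: show category-style predicates next to the script.
    static void describe(int cp) {
        System.out.printf("U+%04X letter=%b digit=%b letterOrDigit=%b lower=%b script=%s%n",
                cp,
                Character.isLetter(cp),
                Character.isDigit(cp),
                Character.isLetterOrDigit(cp),
                Character.isLowerCase(cp),
                Character.UnicodeScript.of(cp));
    }

    public static void main(String[] args) {
        describe('a');     // LATIN SMALL LETTER A
        describe(0x0416);  // CYRILLIC CAPITAL LETTER ZHE
        describe('7');     // ASCII digit seven
        describe(0x0E53);  // THAI DIGIT THREE
        describe(0x1F637); // FACE WITH MEDICAL MASK
    }
}
```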