How to detect if a string contains emoji in Java?

Question

I would like to detect if a string contains emoji in Java.

I tried https://github.com/vdurmont/emoji-java but it is not maintained anymore and fails on new emojis.

For example - the following test fails:

EmojiManager.containsEmoji("This string contains beans 🫘") shouldBe true

Answer 1

Score: 1

Java 21 adds Character#isEmoji(int), which you can use to help you out.
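
A minimal sketch of how that could look, assuming a Java 21 or later runtime. The class and method names (EmojiCheck, containsEmoji, containsEmojiPresentation) are made up for illustration; only Character.isEmoji(int) and Character.isEmojiPresentation(int) come from the JDK. The string is walked by code point so that a surrogate pair such as 🫘 is tested as one character.

```java
// Requires Java 21 or later: the Character emoji methods were added in that release.
final class EmojiCheck {

    /** Returns true if any code point in the text has the Unicode Emoji property. */
    static boolean containsEmoji(String text) {
        // codePoints() decodes surrogate pairs, so U+1FAD8 arrives as a single int.
        return text.codePoints().anyMatch(Character::isEmoji);
    }

    /** Narrower variant: only code points that are displayed as emoji by default. */
    static boolean containsEmojiPresentation(String text) {
        return text.codePoints().anyMatch(Character::isEmojiPresentation);
    }

    public static void main(String[] args) {
        String beans = "This string contains beans \uD83E\uDED8";
        System.out.println(containsEmoji(beans));                  // true
        System.out.println(containsEmoji("Plain text"));           // false
        // Digits carry the Emoji property (they are keycap bases), so this is true.
        System.out.println(containsEmoji("Room 101"));             // true
        System.out.println(containsEmojiPresentation("Room 101")); // false
    }
}
```

Note that isEmoji follows the Unicode Emoji property, which also covers the digits 0-9, # and *, so ordinary text with numbers is reported as containing emoji. isEmojiPresentation is narrower and may be closer to what you want, though it misses emoji that default to text style, such as ©.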

Answer 2

Score: 1

How To: Detect Emojis in Strings

Since the emoji-java library does not work in this case, you can fall back on java.lang.Character. In my code the containsEmoji(String str) method does the important work: it checks the Unicode general category of each char. Character.SURROGATE marks a UTF-16 surrogate code unit. Most emoji lie outside the Basic Multilingual Plane and are therefore stored as surrogate pairs, but so is every other supplementary character. Character.OTHER_SYMBOL covers a wide range of symbols, including many emoji but not only emoji. In your case this code should be fine, but keep in mind that other characters that are not emoji can also produce true. Sadly, I don't know a better way at the moment.

Fixed Code:

import java.lang.Character;

/**
 * Main class for checking if a String contains Emojis
 */
public class Main {
  /**
   * Main method to test the Emoji detector
   *
   * @param args are not used
   */
  public static void main(String[] args) {
    String beans = "This string contains beans 🫘";

    System.out.println("Does following String contain Emoji?");
    System.out.println(beans);
    System.out.println("Result: " + containsEmoji(beans));
    System.out.println("");

    String smiley = "Hello, this is a happy message 😀";

    System.out.println("Does following String contain Emoji?");
    System.out.println(smiley);
    System.out.println("Result: " + containsEmoji(smiley));
    System.out.println("");

    String penguin = "There is a penguin here 🐧";

    System.out.println("Does following String contain Emoji?");
    System.out.println(penguin);
    System.out.println("Result: " + containsEmoji(penguin));
    System.out.println("");
  }

  /**
   * Checks if a String contains Emojis
   *
   * @param str is the parameter String to check
   * @return true if the String contains a surrogate or an "other symbol" char, otherwise false
   */
  private static boolean containsEmoji(String str) {
    int length = str.length();

    for (int i = 0; i < length; i++) {
      // getType(char) sees one UTF-16 code unit at a time, so each half of a
      // surrogate pair is reported as SURROGATE.
      int type = Character.getType(str.charAt(i));
      if (type == Character.SURROGATE || type == Character.OTHER_SYMBOL) {
        return true;
      }
    }

    return false;
  }
}
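
One possible refinement of the approach above, offered as a sketch rather than a drop-in fix: test whole code points instead of individual chars and drop the SURROGATE check. That way a supplementary character that is not a symbol (for example a mathematical letter) no longer counts as an emoji just because it is stored as a surrogate pair. SymbolCategoryCheck and containsOtherSymbol are hypothetical names. The result is still a heuristic, because category So also contains non-emoji symbols.

```java
final class SymbolCategoryCheck {

    /** Heuristic: true if any code point has general category So (OTHER_SYMBOL). */
    static boolean containsOtherSymbol(String text) {
        // getType(int) classifies the decoded code point, not its surrogate halves.
        return text.codePoints()
                   .anyMatch(cp -> Character.getType(cp) == Character.OTHER_SYMBOL);
    }

    public static void main(String[] args) {
        // U+1F600 GRINNING FACE is category So.
        System.out.println(containsOtherSymbol("Hello, this is a happy message \uD83D\uDE00"));
        // U+1D49C MATHEMATICAL SCRIPT CAPITAL A is a letter stored as a surrogate pair.
        System.out.println(containsOtherSymbol("\uD835\uDC9C is a math letter"));
        // U+00B0 DEGREE SIGN is category So but not an emoji: a false positive.
        System.out.println(containsOtherSymbol("20 \u00B0C"));
    }
}
```

The three calls show one hit, one supplementary letter that the char-based loop would have flagged, and one remaining false positive.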

EDIT:
If you can use Java 21 or later (older versions of Java do not have it), you can use the Character.isEmoji(int) method.

Answer 3

Score: 1

Here is an updated answer that uses the resource you provided in the comments.

It should behave like the containsEmoji method, although it is not fully tested.

I used the following resource: Unicode, Annex A: Emoji Properties and Data Files,
specifically the emoji-data.txt data file.

I used the following regex to capture the values.

^([\dA-F]{4,5})(?:\.\.([\dA-F]{4,5}))?

And the following replacement string to generate the source lines.

list.add(new int[] { 0x$1, 0x$2 });\n

For single code points the $2 group is empty, which leaves a dangling 0x in the generated line, so you'll need a find-and-replace pass to clean those up.
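
As an alternative to generating source text, the same regex can be applied at runtime. The sketch below is only an illustration: EmojiDataParser and parse are hypothetical names, and the sample lines are a simplified excerpt in the layout of emoji-data.txt, not the real file contents. Single code points are stored as a range whose start and end are equal, which avoids the empty $2 problem.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmojiDataParser {

    // Same pattern as above: a 4-5 digit hex code point, optionally followed by "..END".
    private static final Pattern RANGE =
            Pattern.compile("^([\\dA-F]{4,5})(?:\\.\\.([\\dA-F]{4,5}))?");

    /** Turns data-file lines into { start, end } pairs; non-matching lines are skipped. */
    static List<int[]> parse(List<String> lines) {
        List<int[]> ranges = new ArrayList<>();
        for (String line : lines) {
            Matcher m = RANGE.matcher(line);
            if (!m.find()) {
                continue; // comment lines ("# ...") and blank lines do not match
            }
            int start = Integer.parseInt(m.group(1), 16);
            // group(2) is null for a single code point, so reuse the start value.
            int end = m.group(2) == null ? start : Integer.parseInt(m.group(2), 16);
            ranges.add(new int[] { start, end });
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Simplified sample lines (assumption), modeled on the data file layout.
        List<String> sample = Arrays.asList(
                "# sample excerpt",
                "0023          ; Emoji",
                "0030..0039    ; Emoji",
                "231A..231B    ; Emoji");
        for (int[] range : parse(sample)) {
            System.out.printf("0x%04X..0x%04X%n", range[0], range[1]);
        }
    }
}
```

The data file lists several properties (Emoji, Emoji_Presentation, and others), so a real import would probably also filter on the property column after the semicolon.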

I cannot paste all the code, as it is over 30,000 characters.

import java.util.ArrayList;
import java.util.List;

public class EmojiUtil {
    static List<int[]> list = new ArrayList<>();

    static {
        /* https://unicode.org/Public/15.0.0/ucd/emoji/emoji-data.txt */
        list.add(new int[] { 0x0023 });
        list.add(new int[] { 0x002a });
        list.add(new int[] { 0x0030, 0x0039 });
        list.add(new int[] { 0x00a9 });
        list.add(new int[] { 0x00ae });
        list.add(new int[] { 0x203c });
        list.add(new int[] { 0x2049 });
        list.add(new int[] { 0x2122 });
        list.add(new int[] { 0x2139 });
        list.add(new int[] { 0x2194, 0x2199 });
        list.add(new int[] { 0x21a9, 0x21aa });
        list.add(new int[] { 0x231a, 0x231b });
        list.add(new int[] { 0x2328 });
        list.add(new int[] { 0x23cf });
        list.add(new int[] { 0x23e9, 0x23ec });
        list.add(new int[] { 0x23ed, 0x23ee });
        /* ... */
    }

    static boolean contains(String string) {
        // Walk the string by code point so surrogate pairs are decoded first.
        // (Binary-searching the unsorted char array of the input is not reliable.)
        for (int i = 0; i < string.length(); ) {
            int codePoint = string.codePointAt(i);
            for (int[] values : list) {
                // A one-element entry is a single code point, otherwise { start, end }.
                int limit = values.length == 1 ? values[0] : values[1];
                if (codePoint >= values[0] && codePoint <= limit)
                    return true;
            }
            i += Character.charCount(codePoint);
        }
        return false;
    }
}

Example

EmojiUtil.contains("This string contains beans \uD83E\uDED8");
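
With the full data file the table holds many ranges, so the lookup for each code point could use binary search over the table instead of a scan. The following is a sketch under stated assumptions: the ranges are sorted by start and do not overlap, EmojiRangeTable and isListed are hypothetical names, and the table is a small subset of the list above plus one single-code-point entry for U+1FAD8 added only for this demo.

```java
import java.util.Arrays;

final class EmojiRangeTable {

    // Demo subset, sorted by start. STARTS[i]..ENDS[i] is one inclusive range.
    // The last entry (U+1FAD8, beans) is added for illustration only.
    private static final int[] STARTS = {
        0x0023, 0x002A, 0x0030, 0x00A9, 0x00AE, 0x2194, 0x21A9, 0x231A, 0x1FAD8 };
    private static final int[] ENDS = {
        0x0023, 0x002A, 0x0039, 0x00A9, 0x00AE, 0x2199, 0x21AA, 0x231B, 0x1FAD8 };

    /** Returns true if the code point falls inside one of the table's ranges. */
    static boolean isListed(int codePoint) {
        int pos = Arrays.binarySearch(STARTS, codePoint);
        if (pos >= 0) {
            return true; // exact hit on a range start
        }
        // binarySearch returns -(insertionPoint) - 1 when the key is absent, so the
        // last range that starts below codePoint sits at insertionPoint - 1.
        int candidate = -pos - 2;
        return candidate >= 0 && codePoint <= ENDS[candidate];
    }

    /** Returns true if any code point of the text is in the table. */
    static boolean contains(String text) {
        return text.codePoints().anyMatch(EmojiRangeTable::isListed);
    }

    public static void main(String[] args) {
        System.out.println(contains("This string contains beans \uD83E\uDED8")); // true
        System.out.println(contains("Plain text"));                              // false
    }
}
```

The binary search runs over the sorted table, never over the input string, so the order of characters in the text does not matter.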

huangapple
  • Posted on June 8, 2023 at 05:20:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76427196.html