Map String with Emojis to Array<String|Char>

Question

I want to convert my String to an Array or List of Strings or Chars, like Array<String> or Array<Char>.

Example:

val myText = "Ab😟+🤹2👨🏻#✅'👨ü👨🏿{" // Parse and print to Log

Should produce:

[ "A", "b", "😟", "+", "🤹", "2", "👨🏻", "#", "✅", "'", "👨", "ü", "👨🏿", "{" ] // Array contains Strings or Chars

Java/Kotlin methods that don't work because of emojis on Android:

myText.toList() // ❌ Fails because of Emojis
myText.toMutableList() // ❌ Fails because of Emojis

Answer 1

Score: 7

In Kotlin, if targeting JDK 8 or later you can use:

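import kotlin.streams.toList // needed for IntStream.toList()

// Split a String into one String per Unicode code point.
fun String.splitToCodePoints(): List<String> {
    return codePoints()
        .toList()
        .map { String(Character.toChars(it)) }
}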

If using JDK 7, it's more manual:

fun String.splitToCodePoints(): List<String> {
    val list = mutableListOf<String>()
    var index = 0
    while (index < length) {
        val codePoint = codePointAt(index)
        list.add(String(Character.toChars(codePoint)))
        // Advance by 1 or 2 chars, depending on whether this is a surrogate pair.
        index += Character.charCount(codePoint)
    }
    return list
}

The Kotlin standard library seems to be lacking in this area, since you have to rely on the JDK's boxed primitive class (Character) to convert the code point integers to Strings.

As mentioned in another answer here, this gets more involved if you need to handle the zero width joiner. You might need to remove any zero width joiners so the characters are shown separately, or you might want to display them together, in which case you need to post-process the list to combine elements separated by joiners. If the language uses ligatures, that would also affect this decision.

Answer 2

Score: 3

In Java, you can get a stream of the string's code points, and convert each of them back into a string:

var myText = "Ab😟+🤹2👨🏻#✅'👨ü👨🏿{";
String[] array = myText.codePoints()
    .mapToObj(i -> new String(Character.toChars(i))) // one String per code point
    .toArray(String[]::new);

Returns:

{ "A", "b", "😟", "+", "🤹", "2", "👨", "🏻", "#", "✅", "'", "👨", "ü", "👨", "🏿", "{" }

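Note that some emojis, like flags or skin color and gender variations, are composed of multiple Unicode code points joined together, so splitting by code point may or may not produce the result you want. In the output above, for example, each skin tone modifier ends up as its own element.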
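If you want the skin tone variants to stay in one element, as in the question's expected output, one option is to attach certain code points to the element before them. The following is only a rough sketch under a few assumptions: the class and method names (EmojiSplitter, split, isSkinToneModifier) are made up for illustration, and it only handles skin tone modifiers (U+1F3FB to U+1F3FF), the zero width joiner (U+200D) and variation selector 16 (U+FE0F). Flags, which are pairs of regional indicator code points, are not handled. The sample strings are written as UTF-16 escapes so the source file stays ASCII-only.

```java
import java.util.ArrayList;
import java.util.List;

public class EmojiSplitter {

    private static final int ZERO_WIDTH_JOINER = 0x200D;
    private static final int VARIATION_SELECTOR_16 = 0xFE0F;

    // Skin tone modifiers occupy the range U+1F3FB..U+1F3FF.
    private static boolean isSkinToneModifier(int cp) {
        return cp >= 0x1F3FB && cp <= 0x1F3FF;
    }

    // Splits text by code point, but keeps modifiers, joiners and
    // joiner-connected code points in the same element as their base.
    public static List<String> split(String text) {
        List<StringBuilder> parts = new ArrayList<>();
        boolean joinNext = false; // true right after a zero width joiner
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp); // 1 or 2 chars per code point
            boolean attach = !parts.isEmpty()
                    && (joinNext
                        || cp == ZERO_WIDTH_JOINER
                        || cp == VARIATION_SELECTOR_16
                        || isSkinToneModifier(cp));
            if (attach) {
                parts.get(parts.size() - 1).appendCodePoint(cp);
            } else {
                parts.add(new StringBuilder().appendCodePoint(cp));
            }
            joinNext = (cp == ZERO_WIDTH_JOINER);
        }
        List<String> result = new ArrayList<>();
        for (StringBuilder part : parts) {
            result.add(part.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // "Ab" + man (U+1F468) + light skin tone (U+1F3FB) + "#"
        List<String> parts = split("Ab\uD83D\uDC68\uD83C\uDFFB#");
        System.out.println(parts.size()); // 4 elements: A, b, man with skin tone, #
    }
}
```

A hand-maintained list like this will miss cases. java.text.BreakIterator.getCharacterInstance() is another thing to try, though how well it groups emoji sequences may depend on the platform and JDK version.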

huangapple
  • Posted on July 30, 2020 at 20:43:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63173425.html