Emojis corrupted on iOS when using negation with unicode character class escapes.

Question


I'm using regular expressions to remove non-Latin and non-emoji characters from strings (digits, whitespace and punctuation are kept as well).

As Unicode character class escapes are now widely supported, I used them to simplify my expressions.

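// Removes everything that is not a digit, whitespace, Latin letter,
// punctuation, or extended pictographic (emoji) character.
// (Parentheses inside [...] are literal, so the original "(...)" wrapper was redundant.)
const regex = new RegExp('[^\\d\\s\\p{Script=Latin}\\p{gc=Punctuation}\\p{Extended_Pictographic}]+', 'gui');

function removeUnsupportedChars(txt: string): string {
   return txt.replace(regex, '');
}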

This works on PC and on Android. However, on iOS, when using this regular expression, emojis get corrupted and shown as squares.
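The negated class also strips emoji *components* that are not themselves Extended_Pictographic, such as U+FE0F (variation selector-16) and U+200D (zero-width joiner). This happens on every engine, not only on iOS. It probably does not explain the iOS squares by itself. However, it means sequences like ❤️ or ZWJ family emoji are altered everywhere, so it is worth ruling out. The sketch below lists code points before and after the replace. The `codePoints` helper and the sample string are hypothetical and were not part of the question.

```typescript
// The question's negated class (without the redundant literal parentheses).
const regex = /[^\d\s\p{Script=Latin}\p{gc=Punctuation}\p{Extended_Pictographic}]+/gui;

// Hypothetical helper: lists each code point of a string as "U+XXXX".
function codePoints(s: string): string[] {
  return Array.from(s, (ch) =>
    "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0"));
}

const input = "I \u2764\uFE0F caf\u00E9"; // "I ❤️ café": heavy black heart + VS16
const output = input.replace(regex, "");

console.log(codePoints(input));  // includes U+FE0F after U+2764
console.log(codePoints(output)); // U+FE0F has been removed
```

Passing a ZWJ sequence such as `"\u{1F468}\u200D\u{1F469}"` through the same regex drops the U+200D as well, leaving two separate emoji.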


I created a minimal CodePen that reproduces the scenario with a simplified regex. It seems that on iOS, any negation of the Extended_Pictographic class (or of any of the other emoji classes) corrupts the emojis.
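One hypothesis worth testing in the CodePen is that the engine matches the two UTF-16 halves of an astral-plane emoji separately. This is an assumption, not a confirmed cause. If it were true, the output would contain unpaired surrogates, which typically render as boxes. The hypothetical check below avoids regexes, so it does not depend on the engine behavior under test.

```typescript
// Hypothetical diagnostic: true if the string contains a UTF-16 surrogate half
// that is not part of a valid high+low pair.
function hasLoneSurrogate(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      const next = s.charCodeAt(i + 1); // NaN past the end, so the check below fails
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // valid pair: skip the low half
        continue;
      }
      return true; // high surrogate without a following low surrogate
    }
    if (c >= 0xdc00 && c <= 0xdfff) return true; // low surrogate without a preceding high
  }
  return false;
}
```

On the iOS device, logging `hasLoneSurrogate(removeUnsupportedChars(text))` would show whether the result is still well-formed UTF-16. If it is, the cause probably lies elsewhere. Newer engines also provide `String.prototype.isWellFormed()` for the same check.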


Is this a known issue on iOS? Any known workarounds (other than using explicit emoji lists)?

Answer 1

Score: 0

I found a workaround, but I'm still curious as to why negating Unicode character classes doesn't work on iOS.

Instead of using a negated character class with `replace`, I switched to a positive character class and used `match` to collect and join only the pieces that do match:

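// Matches runs of supported characters; `+` (instead of `*`) avoids empty matches.
const regex = /[\d\s\p{Script=Latin}\p{gc=Punctuation}\p{Currency_Symbol}\p{Emoji_Presentation}\p{Extended_Pictographic}]+/gui;

function removeUnsupportedChars(txt: string): string {
    // match() returns null when nothing matches, hence the fallback.
    const matches = txt.match(regex) || [];
    return matches.join('');
}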
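The positive class above still drops emoji components such as U+FE0F and U+200D. One possible extension adds `\p{Emoji_Component}`, which covers those, plus skin-tone modifiers and regional indicators. This addition is my own assumption and not part of the answer, and it has not been verified on iOS. The names `supported` and `keepSupported` are hypothetical. The `i` flag is omitted because each class already covers both letter cases.

```typescript
// Sketch: the workaround's positive class plus \p{Emoji_Component}
// (an addition, not in the original answer) so ZWJ sequences and VS16 survive.
const supported =
  /[\d\s\p{Script=Latin}\p{gc=Punctuation}\p{Currency_Symbol}\p{Emoji_Presentation}\p{Extended_Pictographic}\p{Emoji_Component}]+/gu;

function keepSupported(txt: string): string {
  // With the g flag, match() returns every run of supported characters, or null.
  return (txt.match(supported) ?? []).join("");
}

console.log(keepSupported("I \u2764\uFE0F caf\u00E9 \u4E2D\u6587")); // → "I ❤️ café " (CJK dropped)
```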

huangapple
  • Published on 10 May 2023, 15:31:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76215943.html