Emojis corrupted on iOS when using negation with unicode character class escapes
Question
I'm using regular expressions to remove non-Latin, non-emoji characters from strings.
As Unicode character class escapes are now widely supported, I used them to simplify my expressions.
const regex = new RegExp('[^(\\d\\s\\p{Script=Latin}\\p{gc=Punctuation}\\p{Extended_Pictographic})]+', 'gui');
function removeUnsupportedChars(txt: string) {
    // Strip every run of characters that is not in the allowed classes.
    return txt.replace(regex, '');
}
This works on PC and on Android. However, on iOS, emojis passed through this regular expression get corrupted and are shown as squares.
I created a minimal CodePen that reproduces the scenario with a simplified regex, and it seems that on iOS any use of negation on the Extended_Pictographic class (or any of the other emoji classes) leads to their corruption.
Is this a known issue on iOS? Are there any known workarounds (other than using explicit emoji lists)?
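One way to narrow this down is to check whether the output string contains broken UTF-16 surrogate pairs. This is only a diagnostic sketch: it assumes the squares come from an emoji being split in half, which the question does not confirm, and hasLoneSurrogate is a hypothetical helper name.

```typescript
// Hypothetical diagnostic helper: returns true if `s` contains a UTF-16
// surrogate code unit that is not part of a valid high+low pair.
function hasLoneSurrogate(s: string): boolean {
    for (let i = 0; i < s.length; i++) {
        const c = s.charCodeAt(i);
        if (c >= 0xd800 && c <= 0xdbff) {
            // High surrogate: it must be followed by a low surrogate.
            const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
            if (next >= 0xdc00 && next <= 0xdfff) {
                i++; // valid pair, skip the low half
            } else {
                return true;
            }
        } else if (c >= 0xdc00 && c <= 0xdfff) {
            // Low surrogate with no preceding high surrogate.
            return true;
        }
    }
    return false;
}
```

Running hasLoneSurrogate on the result of removeUnsupportedChars on the affected device would show whether the replacement cut an emoji in half or whether the problem lies elsewhere, for example in rendering.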
Answer 1
Score: 0
I found a workaround, but I'm still curious as to why negation of Unicode character classes doesn't work on iOS.
I chose to use a positive regex with match and join the pieces that DO match the regex, instead of using a negative regex with replace:
const regex = /[\d\s\p{Script=Latin}\p{gc=Punctuation}\p{Currency_Symbol}\p{Emoji_Presentation}\p{Extended_Pictographic}]*/gui;
function removeUnsupportedChars(txt: string) {
    // Collect every run of allowed characters; match returns null when nothing matches.
    const matches = txt.match(regex) || [];
    return matches.join('');
}
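For reference, here is a self-contained sketch of the same allow-list approach. The names allowed and keepSupportedChars are placeholders chosen for this example. It departs from the snippet above in two ways: it builds the pattern with new RegExp, as in the question, and it uses + instead of * so that match does not return empty strings between the kept runs. Either quantifier gives the same joined result.

```typescript
// Allow-list of characters to keep; same classes and flags as the answer's regex.
const allowed = new RegExp(
    '[\\d\\s\\p{Script=Latin}\\p{gc=Punctuation}\\p{Currency_Symbol}' +
    '\\p{Emoji_Presentation}\\p{Extended_Pictographic}]+',
    'gui'
);

function keepSupportedChars(txt: string): string {
    // With the g flag, match returns all matching runs, or null if there are none.
    const matches = txt.match(allowed) || [];
    return matches.join('');
}

// The Cyrillic letters are dropped; Latin text, whitespace, punctuation and the emoji stay.
console.log(keepSupportedChars('Hi Жд😀!')); // prints "Hi 😀!"
```

Because this version never uses a negated class, it avoids the construct that the question identifies as the trigger on iOS.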