PHP - strlen() doesn't work correctly after removing emojis from string
Question
I have this string:
b🤵♀️🤵♀️b
After removing the emojis and special characters:
$str = preg_replace('/[^ -\x{2122}]\s+|\s*[^ -\x{2122}]/u','',$str);
$str = trim($str);
...
strlen($str);
gives me 8 instead of 2. Why, and how can I fix this?
Answer 1
Score: 2
The regular expression does not remove all special characters. Inspecting the string in a debugger shows which characters remain after preg_replace:
"b\u{200d}\u{200d}b"
or, as 8 bytes:
"b\xe2\x80\x8d\xe2\x80\x8db"
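The same inspection can be reproduced without a special debugger. The following is a minimal sketch, assuming PHP 7 or later (for the "\u{...}" escapes) and the mbstring extension (for mb_strlen); the input is written with explicit escapes on the assumption that the original string contains U+200D joiners between the emoji code points, as the output above suggests.

```php
<?php
// Assumed input: the question's string spelled out with explicit escapes
// so that the invisible joiners (U+200D) are visible in the source.
$str = "b\u{1F935}\u{200D}\u{2640}\u{FE0F}\u{1F935}\u{200D}\u{2640}\u{FE0F}b";

// The pattern from the question, unchanged.
$str = preg_replace('/[^ -\x{2122}]\s+|\s*[^ -\x{2122}]/u', '', $str);
$str = trim($str);

$bytes      = strlen($str);             // strlen() counts bytes, not characters
$codePoints = mb_strlen($str, 'UTF-8'); // mb_strlen() counts code points
$hex        = bin2hex($str);            // raw bytes as hexadecimal

echo $bytes, "\n";      // 8
echo $codePoints, "\n"; // 4
echo $hex, "\n";        // 62e2808de2808d62
```

The hex dump shows the two three-byte sequences e2 80 8d left between the two b bytes (62), which accounts for the length of 8.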
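The \u{200d} characters (U+200D, ZERO WIDTH JOINER) sit between the emoji code points in the original string. Because U+200D lies inside the range from space to \x{2122}, the negated character class does not match them, and strlen() counts the 3 bytes of each one. For this specific example, removing them is not difficult: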
$str = preg_replace('/[^ -\x{2122}]\s+|\s*[^ -\x{2122}]|\x{200d}/u','',$str);
However, this is not a general solution if other special characters can also occur.
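One possible way to widen the fix, offered only as a sketch and not as part of the original answer, is to strip every Unicode "format" character (general category Cf, which includes U+200D and U+200C) instead of listing U+200D by itself. The helper name stripEmojiAndFormatChars is hypothetical, and the choice of \p{Cf} is an assumption about which extra characters should go; it would also remove joiners that some scripts (for example Persian or Indic text) rely on for correct rendering.

```php
<?php
// Hypothetical helper: the name and the use of \p{Cf} are assumptions,
// not taken from the original answer.
function stripEmojiAndFormatChars(string $str): string
{
    // The question's pattern with \p{Cf} (Unicode format characters such as
    // U+200D ZERO WIDTH JOINER) added as a third alternative.
    $clean = preg_replace('/[^ -\x{2122}]\s+|\s*[^ -\x{2122}]|\p{Cf}/u', '', $str);

    // preg_replace() returns null on failure, e.g. for invalid UTF-8 input.
    return trim($clean ?? '');
}

// Assumed input: the question's string written with explicit escapes.
$input  = "b\u{1F935}\u{200D}\u{2640}\u{FE0F}\u{1F935}\u{200D}\u{2640}\u{FE0F}b";
$result = stripEmojiAndFormatChars($input);

echo $result, "\n";         // bb
echo strlen($result), "\n"; // 2
```

If the goal is only to count visible characters rather than to clean the string, calling mb_strlen() instead of strlen() may be enough, since strlen() always reports bytes.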