Matching multiple unicode characters in Golang Regexp
Question
As a simplified example, I want to match ^⬛+$ against ⬛⬛⬛ and get ⬛⬛⬛ back as the match.
r := regexp.MustCompile("^⬛+$")
matches := r.FindString("⬛️⬛️⬛️")
fmt.Println(matches)
But it doesn't match successfully even though this would work with regular ASCII characters.
I'm guessing there's something I don't know about Unicode matching, but I haven't found any decent explanation in documentation yet.
Can someone explain the problem?
Answer 1
Score: 4
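You need to account for all chars in the string. If you analyze the string, you will see that each visible square is actually two code points: U+2B1B (BLACK LARGE SQUARE) followed by U+FE0F (VARIATION SELECTOR-16), so the string contains three such pairs.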
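One quick way to do that analysis is to print every code point in the subject string. The following is a minimal sketch, not part of the original answer; the helper name describe is made up for illustration, and the subject is written with escapes so the invisible selector shows up in the source.

```go
package main

import "fmt"

// describe returns one "U+XXXX" entry per code point in s.
// (Hypothetical helper, introduced only for this illustration.)
func describe(s string) []string {
	var out []string
	for _, r := range s {
		out = append(out, fmt.Sprintf("%U", r))
	}
	return out
}

func main() {
	// Same content as the question's subject string: each square is
	// followed by an invisible U+FE0F variation selector.
	subject := "\u2b1b\ufe0f\u2b1b\ufe0f\u2b1b\ufe0f"
	fmt.Println(describe(subject))
	// prints [U+2B1B U+FE0F U+2B1B U+FE0F U+2B1B U+FE0F]
}
```

Ranging over a Go string yields runes rather than bytes, which is why the selector appears as its own entry.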
So you need a regex that matches one or more \x{2B1B}\x{FE0F} pairs from the start to the end of the string:
^(?:\x{2B1B}\x{FE0F})+$
See the regex demo.
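Note that you can use \p{M} to match any Unicode mark character (the M category covers variation selectors such as U+FE0F as well as combining diacritics):
^(?:\x{2B1B}\p{M})+$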
See the Go demo:
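package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Each square in the pattern is paired with the U+FE0F selector.
	r := regexp.MustCompile(`^(?:\x{2B1B}\x{FE0F})+$`)
	// The subject literal holds an invisible U+FE0F after every square.
	matches := r.FindString("⬛️⬛️⬛️")
	fmt.Println(matches)
}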
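The \p{M} variant mentioned above can be checked the same way. This is a sketch under the assumption that every square in the input is followed by exactly one mark character; the names markPattern and matchesSquares are illustrative only.

```go
package main

import (
	"fmt"
	"regexp"
)

// markPattern accepts one or more squares, each followed by a single
// Unicode mark character (category M, which includes U+FE0F).
var markPattern = regexp.MustCompile(`^(?:\x{2B1B}\p{M})+$`)

// matchesSquares reports whether s consists only of square+mark pairs.
// (Hypothetical helper, introduced only for this illustration.)
func matchesSquares(s string) bool {
	return markPattern.MatchString(s)
}

func main() {
	withSelectors := "\u2b1b\ufe0f\u2b1b\ufe0f\u2b1b\ufe0f"
	bare := "\u2b1b\u2b1b\u2b1b"

	fmt.Println(matchesSquares(withSelectors)) // true
	fmt.Println(matchesSquares(bare))          // false: no mark after each square
}
```

Because the mark is required here, a string of bare squares no longer matches; the original ^⬛+$ pattern covers that case.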
Answer 2
Score: 0
The regular expression matches a string containing one or more ⬛ (black square box).
The subject string is three pairs of black square box and variation selector-16. The variation selectors are invisible (on my terminal) and prevent a match.
Fix by removing the variation selectors from the subject string or adding the variation selector to the pattern.
Here's the first fix: https://go.dev/play/p/oKIVnkC7TZ1
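For readers who cannot open the playground link, the first fix (removing the variation selectors from the subject) could look roughly like the sketch below. This is an assumption about the approach, not a copy of the linked code; stripVS16 and squares are illustrative names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// squares is the question's original pattern: one or more U+2B1B.
var squares = regexp.MustCompile("^\u2b1b+$")

// stripVS16 removes every U+FE0F (variation selector-16) from s.
// (Hypothetical helper, introduced only for this illustration.)
func stripVS16(s string) string {
	return strings.ReplaceAll(s, "\ufe0f", "")
}

func main() {
	subject := "\u2b1b\ufe0f\u2b1b\ufe0f\u2b1b\ufe0f"

	fmt.Println(squares.MatchString(subject))            // false: selectors block the match
	fmt.Println(squares.MatchString(stripVS16(subject))) // true
}
```

Stripping the selectors changes the data rather than the pattern, so it only fits cases where the emoji presentation hint carried by U+FE0F does not need to be preserved.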