Using package `regexp` to find all matching substrings in Go, but getting an unexpected result
Question
I am using the regexp package to find all matching substrings in Go, but I get an unexpected result. Here is my code:
package main

import (
	"fmt"
	"regexp"
)

func main() {
	str := "build: xxxxxx Prefix:middle#6\nPrefix:middle#16026Prefix:middle#1111\n Prefix:middle#110 Prefix:middle.#2 Prefix:middl.e#111 Prefix:middle#112"
	regexpStr := "\\bPrefix:([a-zA-Z0-9]+[\\w-.]+[^.])#[0-9]+"
	re := regexp.MustCompile(regexpStr)
	matchs := re.FindAllString(str, -1)
	fmt.Println(matchs)
}
You can see it in https://go.dev/play/p/XFSMW09MKxV.
Expected:
[Prefix:middle#6 Prefix:middle#110 Prefix:middl.e#111 Prefix:middle#112]
But I got:
[Prefix:middle#6 Prefix:middle#16026 Prefix:middle#110 Prefix:middl.e#111 Prefix:middle#112]
Why is Prefix:middle#16026 matched? Could someone tell me the reason, and how to fix it? Thanks.
Here are the rules for what should match:
I want to extract Prefix:${middle}#${number} from a string.
- ${middle} rules:
  - Allowed characters: letters, digits, underscores, dots
  - Must begin with a letter or digit
  - Must not end with a dot
- ${number} rules:
  - Must be a number
- Prefix:${middle}#${number} can appear at the beginning, at the end, or in the middle of a string, but:
  - at the beginning of the string, it must be followed by a space or \n;
  - at the end of the string, it must be preceded by a space or \n;
  - in the middle of the string, it must be both preceded and followed by a space or \n.
Answer 1
Score: 2
You can use the following regex with regexp.FindAllStringSubmatch:
(?:\s|^)(Prefix:[a-zA-Z0-9][\w.-]*[^.]#\d+)(?:\s|$)
See the regex demo.
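As background on why the question's pattern picks up Prefix:middle#16026, here is a minimal sketch. It suggests two things: the pattern puts no constraint on what follows the digits, so the match simply stops at the last digit, and \b only requires a word/non-word transition, which a preceding newline satisfies. The helper name firstMatch is hypothetical and exists only for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// original is the pattern from the question, unchanged.
var original = regexp.MustCompile(`\bPrefix:([a-zA-Z0-9]+[\w-.]+[^.])#[0-9]+`)

// firstMatch returns the leftmost match of the question's pattern in s,
// or "" when there is none. (Hypothetical helper for this illustration.)
func firstMatch(s string) string {
	return original.FindString(s)
}

func main() {
	// "\n" is a non-word char and "P" is a word char, so \b matches before
	// the first "Prefix". Nothing constrains what follows the digits, so the
	// match ends at the last digit even though more text is glued on.
	fmt.Printf("%q\n", firstMatch("\nPrefix:middle#16026Prefix:middle#1111"))
	// prints "Prefix:middle#16026"

	// "6" and "P" are both word chars, so there is no \b between them and
	// the second "Prefix" cannot start a match.
	fmt.Printf("%q\n", firstMatch("16026Prefix:middle#1111"))
	// prints ""
}
```

This is why the fix replaces \b with explicit whitespace-or-edge checks on both sides of the match.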
Note that this whitespace-bounded pattern only works after doubling the whitespace in the string. Both whitespace boundaries, (?:\s|^) and (?:\s|$), are consuming patterns (Go's regexp package does not support lookarounds), so a whitespace character consumed at the end of one match is no longer available to start the next one, which prevents consecutive matches. Hence, regexp.MustCompile(`\s`).ReplaceAllString(str, "$0$0") or similar should be applied before running the above regex.
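The effect of the consuming boundaries can be seen in a small sketch. The input string and the helper name extract are made up for this illustration; the pattern and the "$0$0" replacement are the ones from the answer.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	bounded = regexp.MustCompile(`(?:\s|^)(Prefix:[a-zA-Z0-9][\w.-]*[^.]#\d+)(?:\s|$)`)
	space   = regexp.MustCompile(`\s`)
)

// extract returns the group-1 captures found in s. When double is true,
// every whitespace character is duplicated before matching.
// (Hypothetical helper for this illustration.)
func extract(s string, double bool) []string {
	if double {
		s = space.ReplaceAllString(s, "$0$0")
	}
	var out []string
	for _, m := range bounded.FindAllStringSubmatch(s, -1) {
		out = append(out, m[1])
	}
	return out
}

func main() {
	s := "Prefix:a1#1 Prefix:b2#2 Prefix:c3#3"

	// The space after the first item is consumed by the first match,
	// so the second item has no leading boundary left and is skipped.
	fmt.Println(extract(s, false)) // [Prefix:a1#1 Prefix:c3#3]

	// With doubled whitespace, each item gets its own leading and
	// trailing whitespace character.
	fmt.Println(extract(s, true)) // [Prefix:a1#1 Prefix:b2#2 Prefix:c3#3]
}
```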
Details:
- (?:\s|^) - either a whitespace or start of string
- (Prefix:[a-zA-Z0-9][\w.-]*[^.]#\d+) - Group 1:
  - Prefix: - a fixed string
  - [a-zA-Z0-9] - an alphanumeric
  - [\w.-]* - zero or more letters, digits, underscores, dots or hyphens
  - [^.] - a char other than .
  - # - a # char
  - \d+ - one or more digits
- (?:\s|$) - either a whitespace or end of string
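The breakdown above also explains why a Submatch function is needed: the whole match includes the boundary whitespace, while Group 1 holds only the wanted text. A minimal sketch, with a made-up input and a hypothetical helper named capture:

```go
package main

import (
	"fmt"
	"regexp"
)

var bounded = regexp.MustCompile(`(?:\s|^)(Prefix:[a-zA-Z0-9][\w.-]*[^.]#\d+)(?:\s|$)`)

// capture returns the whole match (index 0) and Group 1 (index 1) of the
// first match in s, or two empty strings when nothing matches.
// (Hypothetical helper for this illustration.)
func capture(s string) (whole, group string) {
	m := bounded.FindStringSubmatch(s)
	if m == nil {
		return "", ""
	}
	return m[0], m[1]
}

func main() {
	whole, group := capture("x Prefix:middle#6\ny")
	fmt.Printf("%q\n", whole) // " Prefix:middle#6\n" (boundaries included)
	fmt.Printf("%q\n", group) // "Prefix:middle#6"
}
```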
See the Go demo:
package main

import (
	"fmt"
	"regexp"
)

func main() {
	str := "Prefix:middle#113 build: xxxxxx Prefix:middle#6\nPrefix:middle#16026Prefix:middle#1111\n Prefix:middle#110 Prefix:middle.#2 Prefix:middl.e#111 Prefix:middle#112"
	re := regexp.MustCompile(`(?:\s|^)(Prefix:[a-zA-Z0-9][\w.-]*[^.]#\d+)(?:\s|$)`)
	// Double every whitespace char so adjacent items do not share a boundary.
	doubled := regexp.MustCompile(`\s`).ReplaceAllString(str, "$0$0")
	matches := re.FindAllStringSubmatch(doubled, -1)
	for _, m := range matches {
		fmt.Println(m[1]) // Group 1: the match without the boundary whitespace
	}
}
Output:
Prefix:middle#113
Prefix:middle#6
Prefix:middle#110
Prefix:middl.e#111
Prefix:middle#112
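As an alternative sketch (not the approach used in the demo above), the whitespace doubling can be avoided by splitting the input on whitespace first and matching each token against an anchored version of Group 1. This assumes strings.Fields' notion of whitespace is acceptable, which is slightly broader than \s in Go's regexp; the helper name extractTokens is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// token is the Group 1 pattern from the answer, anchored to a whole token.
var token = regexp.MustCompile(`^Prefix:[a-zA-Z0-9][\w.-]*[^.]#\d+$`)

// extractTokens splits s on whitespace and keeps the tokens that match
// the pattern in full. (Hypothetical helper for this illustration.)
func extractTokens(s string) []string {
	var out []string
	for _, f := range strings.Fields(s) {
		if token.MatchString(f) {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	str := "Prefix:middle#113 build: xxxxxx Prefix:middle#6\nPrefix:middle#16026Prefix:middle#1111\n Prefix:middle#110 Prefix:middle.#2 Prefix:middl.e#111 Prefix:middle#112"
	for _, m := range extractTokens(str) {
		fmt.Println(m)
	}
}
```

On the demo string this prints the same five items as the output above. Because every token is already delimited by whitespace or the string edges, the boundary rules from the question are enforced by the split itself.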