September 16, 2022, 16:40:42 · go

Using package `regexp` to find all matching substrings in Golang, but getting an unexpected result

Question

I am using the regexp package to find all matching substrings in Go, but I get an unexpected result. Here is my code:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	str := "build: xxxxxx Prefix:middle#6\nPrefix:middle#16026Prefix:middle#1111\n Prefix:middle#110 Prefix:middle.#2 Prefix:middl.e#111 Prefix:middle#112"
	regexpStr := "\\bPrefix:([a-zA-Z0-9]+[\\w-.]+[^.])#[0-9]+"
	re := regexp.MustCompile(regexpStr)
	matchs := re.FindAllString(str, -1)
	fmt.Println(matchs)
}

You can run it at https://go.dev/play/p/XFSMW09MKxV.

Expected:

[Prefix:middle#6 Prefix:middle#110 Prefix:middl.e#111 Prefix:middle#112]

But I got:

[Prefix:middle#6 Prefix:middle#16026 Prefix:middle#110 Prefix:middl.e#111 Prefix:middle#112]

Why was Prefix:middle#16026 matched? Could someone tell me the reason, and how to fix it? Thanks.

Here are the rules for what should match:

I want to extract Prefix:${middle}#${number} from a string.

${middle} rules:
- Allowed characters: letters, digits, underscores, dots
- Must begin with a letter or digit
- Must not end with a dot
${number} rules:
- Must consist of digits
Prefix:${middle}#${number} can appear at the beginning, at the end, or in the middle of a string, but:
- At the beginning of the string, it must be followed by a space or \n;
- At the end of the string, it must be preceded by a space or \n;
- In the middle of the string, it must be both preceded and followed by a space or \n.

Answer 1

Score: 2

You can use the following regex with regexp.FindAllStringSubmatch:

(?:\s|^)(Prefix:[a-zA-Z0-9][\w.-]*[^.]#\d+)(?:\s|$)

See the regex demo.

Note that this pattern only works after doubling the whitespace characters in the string, because both whitespace boundaries, (?:\s|^) and (?:\s|$), are consuming patterns: the single space between two adjacent tokens is consumed by the first match and is no longer available to begin the next one, so consecutive matches would be missed. Hence, regexp.MustCompile(`\s`).ReplaceAllString(str, "$0$0") or similar should be applied before running the above regex.

Details:

(?:\s|^) - either a whitespace or start of string
(Prefix:[a-zA-Z0-9][\w.-]*[^.]#\d+) - Group 1:
- Prefix: - a fixed string
- [a-zA-Z0-9] - an alphanumeric
- [\w.-]* - zero or more letters, digits, underscores, dots or hyphens
- [^.] - a char other than .
- # - a # char
- \d+ - one or more digits
(?:\s|$) - either a whitespace or end of string

See the Go demo:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	str := "Prefix:middle#113 build: xxxxxx Prefix:middle#6\nPrefix:middle#16026Prefix:middle#1111\n Prefix:middle#110 Prefix:middle.#2 Prefix:middl.e#111 Prefix:middle#112"
	re := regexp.MustCompile(`(?:\s|^)(Prefix:[a-zA-Z0-9][\w.-]*[^.]#\d+)(?:\s|$)`)
	// Double every whitespace character first so adjacent tokens do not share a boundary.
	matchs := re.FindAllStringSubmatch(regexp.MustCompile(`\s`).ReplaceAllString(str, "$0$0"), -1)
	for _, m := range matchs {
		// m[1] is capture group 1: the token without the surrounding whitespace.
		fmt.Println(m[1])
	}
}

Output:

Prefix:middle#113
Prefix:middle#6
Prefix:middle#110
Prefix:middl.e#111
Prefix:middle#112
