Using package `regexp` to find all matching substrings in Golang, but getting an unexpected result

Question

I am using the regexp package to find all matching substrings in Go, but I get an unexpected result. Here is my code:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	str := "build: xxxxxx Prefix:middle#6\nPrefix:middle#16026Prefix:middle#1111\n Prefix:middle#110 Prefix:middle.#2 Prefix:middl.e#111 Prefix:middle#112"
	regexpStr := "\\bPrefix:([a-zA-Z0-9]+[\\w-.]+[^.])#[0-9]+"
	re := regexp.MustCompile(regexpStr)
	matchs := re.FindAllString(str, -1)
	fmt.Println(matchs)
}

You can see it in https://go.dev/play/p/XFSMW09MKxV.

Expected:

[Prefix:middle#6 Prefix:middle#110 Prefix:middl.e#111 Prefix:middle#112]

But I got:

[Prefix:middle#6 Prefix:middle#16026 Prefix:middle#110 Prefix:middl.e#111 Prefix:middle#112]

Why was Prefix:middle#16026 matched? Could someone explain the reason and how to fix it? Thanks.

Here are the rules for what should match:

I want to extract Prefix:${middle}#${number} from a string.

  • ${middle} rules:

    • Allowed characters: letters, numbers, underscores, dots
    • Must begin with a letter or number
    • Can't end with a dot
  • ${number} rules:

    • Must be a number
  • Prefix:${middle}#${number} can appear at the beginning, at the end, or in the middle of a string, but:

    • at the beginning of the string, it must be followed by a space or \n;
    • at the end of the string, it must be preceded by a space or \n;
    • in the middle of the string, it must be both preceded and followed by a space or \n.

Answer 1

Score: 2

You can use the following regex with regexp.FindAllStringSubmatch:

(?:\s|^)(Prefix:[a-zA-Z0-9][\w.-]*[^.]#\d+)(?:\s|$)

See the regex demo.

Note that this pattern only works after every whitespace character in the string has been doubled, because both whitespace boundaries, (?:\s|^) and (?:\s|$), are consuming patterns: the whitespace that ends one match cannot also start the next one, so consecutive matches would be missed. Hence, regexp.MustCompile(`\s`).ReplaceAllString(str, "$0$0") or similar should be applied before running the above regex.
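The effect of the consuming boundaries can be seen on a small input. The following is only a sketch: the names bounded, space and countMatches and the two-token sample string are made up for illustration, while the pattern and the doubling call are the ones given above.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Pattern from the answer, with consuming whitespace boundaries.
	bounded = regexp.MustCompile(`(?:\s|^)(Prefix:[a-zA-Z0-9][\w.-]*[^.]#\d+)(?:\s|$)`)
	// Matches one whitespace character; used to double each of them.
	space = regexp.MustCompile(`\s`)
)

// countMatches reports how many non-overlapping matches bounded finds in s.
func countMatches(s string) int {
	return len(bounded.FindAllStringSubmatch(s, -1))
}

func main() {
	s := "Prefix:a1#1 Prefix:b2#2" // hypothetical sample input

	// The single space is consumed by the first match, so the second
	// token has no leading boundary left to match against.
	fmt.Println(countMatches(s)) // 1

	// After doubling, each token gets its own whitespace on both sides.
	fmt.Println(countMatches(space.ReplaceAllString(s, "$0$0"))) // 2
}
```

Go's regexp package has no lookarounds, which is why the boundaries have to consume the whitespace in the first place.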

Details:

  • (?:\s|^) - either a whitespace or start of string
  • (Prefix:[a-zA-Z0-9][\w.-]*[^.]#\d+) - Group 1:
    • Prefix: - a fixed string
    • [a-zA-Z0-9] - an alphanumeric
    • [\w.-]* - zero or more letters, digits, underscores, dots or hyphens
    • [^.] - a char other than .
    • # - a # char
    • \d+ - one or more digits
  • (?:\s|$) - either a whitespace or end of string
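The trailing (?:\s|$) is the piece the pattern in the question lacks. There, nothing is required after [0-9]+, so a match may stop in the middle of a longer token, and \b is satisfied between a newline and the P of Prefix. A minimal sketch reproducing this on a shortened input (the function name originalMatches and the shortened string are illustrative only):

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern from the question, unchanged: a word boundary on the left,
// but no boundary of any kind on the right.
var original = regexp.MustCompile(`\bPrefix:([a-zA-Z0-9]+[\w-.]+[^.])#[0-9]+`)

// originalMatches returns every match of the question's pattern in s.
func originalMatches(s string) []string {
	return original.FindAllString(s, -1)
}

func main() {
	s := "Prefix:middle#6\nPrefix:middle#16026Prefix:middle#1111"

	// The second match ends right before the next "Prefix" because the
	// pattern does not care what follows the digits. The third token is
	// skipped only because there is no \b between '6' and 'P'.
	for _, m := range originalMatches(s) {
		fmt.Println(m)
	}
}
```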

See the Go demo:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	str := "Prefix:middle#113 build: xxxxxx Prefix:middle#6\nPrefix:middle#16026Prefix:middle#1111\n Prefix:middle#110 Prefix:middle.#2 Prefix:middl.e#111 Prefix:middle#112"
	re := regexp.MustCompile(`(?:\s|^)(Prefix:[a-zA-Z0-9][\w.-]*[^.]#\d+)(?:\s|$)`)
	matchs := re.FindAllStringSubmatch(regexp.MustCompile(`\s`).ReplaceAllString(str, "$0$0"), -1)
	for _, m := range matchs {
		fmt.Println(m[1])
	}
}

Output:

Prefix:middle#113
Prefix:middle#6
Prefix:middle#110
Prefix:middl.e#111
Prefix:middle#112
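
As a possible alternative that avoids doubling the whitespace, the input can be split on whitespace first and each token checked against a fully anchored pattern. This is a sketch under assumptions, not part of the answer above: the names token and extractTokens are hypothetical, the character set (including the hyphen) follows the answer's pattern, and the middle part is written as [a-zA-Z0-9](?:[\w.-]*[\w-])? so that it cannot end with a dot. Unlike the answer's pattern, this variant also accepts a one-character middle such as Prefix:a#1.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// token must match a whole whitespace-delimited field:
// "Prefix:", an alphanumeric, optionally more characters that do not
// end in a dot, then '#' and one or more digits.
var token = regexp.MustCompile(`^Prefix:[a-zA-Z0-9](?:[\w.-]*[\w-])?#\d+$`)

// extractTokens returns the whitespace-delimited fields of s that
// match token, in their original order.
func extractTokens(s string) []string {
	var out []string
	for _, f := range strings.Fields(s) {
		if token.MatchString(f) {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	str := "Prefix:middle#113 build: xxxxxx Prefix:middle#6\nPrefix:middle#16026Prefix:middle#1111\n Prefix:middle#110 Prefix:middle.#2 Prefix:middl.e#111 Prefix:middle#112"
	for _, m := range extractTokens(str) {
		fmt.Println(m)
	}
}
```

One difference to keep in mind: strings.Fields splits on every character for which unicode.IsSpace is true, whereas \s in Go's regexp only covers the ASCII whitespace characters, so the two approaches can disagree on inputs that contain, for example, a non-breaking space.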

huangapple
  • Posted on 2022-09-16 16:40:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/73742156.html