How to use capture groups in GoLang?

Question

I need to extract log statements from Go source code using a regex, but the output of the regex is incorrect for the "msg" capture group. I have this function to extract log statements from the contents of a file:

func extractLogStatements(content string) []LogStatement {
	logPattern := `flog\.(?P<sev>.*?)\(\s*flog.(?P<type>.*?),\s*("|fmt.Sprintf\(")(?P<msg>.*?)"`

	re := regexp.MustCompile(logPattern)
	matches := re.FindAllStringSubmatch(content, -1)

	logStatements := make([]LogStatement, 0, len(matches))
	for _, match := range matches {
		statement := LogStatement{
			Sev:  match[1],
			Type: match[2],
			Msg:  match[3],
		}
		logStatements = append(logStatements, statement)
	}

	return logStatements
}

Everything works except the regex pattern on the first line of the function, which does not capture the correct values for the capture groups, even though it worked fine when I tested it in an online regex tester.

Here are some examples of the logs I've been testing on:

flog.Info(flog.Application, fmt.Sprintf("unable to translate address for config: %v", err))

flog.Info(flog.Application, "unable to translate address for config")

flog.Info(flog.Application, fmt.Sprintf("Test 1"),
    lm.CrKind, objectType,
    lm.CrName, crName,
    lm.AppNS, namespace)

For the first log example, it should extract "Info" ("sev" capture group), "Application" ("type" capture group), and "unable to translate address for config: %v" ("msg" capture group). When I output to json I get:

[
	{
		"sev": "Info",
		"type": "Application",
		"msg": "fmt.Sprintf(\""
	},
	{
		"sev": "Info",
		"type": "Application",
		"msg": "fmt.Sprintf(\""
	},
	{
		"sev": "Info",
		"type": "Application",
		"msg": "fmt.Sprintf(\""
	},
]

So it captures the "sev" and "type" groups correctly, but for "msg" it returns fmt.Sprintf(" when it should return "unable to translate address for config: %v".

Answer 1

Score: 1

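match[3] holds the value of the unnamed group ("|fmt.Sprintf\("). Capture groups are numbered by the position of their opening parenthesis, whether or not they are named, so this group is number 3 and "msg" is number 4. If you don't want to capture it, use ?: to turn it into a non-capturing group: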

(?:"|fmt.Sprintf\(")
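
To make the first fix concrete, here is a minimal sketch that uses the question's pattern with only the alternation changed to a non-capturing group. The helper name extractMsgs and the sample input are assumptions made for illustration; they are not part of the original code.

```go
package main

import (
	"fmt"
	"regexp"
)

// logPattern is the pattern from the question with one change: the
// alternation is now (?:...), so it no longer takes a group number.
// The capture groups are therefore sev (1), type (2) and msg (3).
const logPattern = `flog\.(?P<sev>.*?)\(\s*flog.(?P<type>.*?),\s*(?:"|fmt.Sprintf\(")(?P<msg>.*?)"`

var logRe = regexp.MustCompile(logPattern)

// extractMsgs (hypothetical helper) returns the msg capture of every
// log call found in content.
func extractMsgs(content string) []string {
	var msgs []string
	for _, m := range logRe.FindAllStringSubmatch(content, -1) {
		msgs = append(msgs, m[3]) // index 3 is now the msg group
	}
	return msgs
}

func main() {
	src := `flog.Info(flog.Application, fmt.Sprintf("unable to translate address for config: %v", err))
flog.Info(flog.Application, "unable to translate address for config")`
	for _, msg := range extractMsgs(src) {
		fmt.Println(msg)
	}
}
```

With this change the original loop that reads match[1], match[2] and match[3] can stay as it is.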

Since all the values you want are captured by named capture groups, another solution is to reference them by name:

for _, match := range matches {
	statement := LogStatement{
		Sev:  match[re.SubexpIndex("sev")],
		Type: match[re.SubexpIndex("type")],
		Msg:  match[re.SubexpIndex("msg")],
	}
	logStatements = append(logStatements, statement)
}
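
For reference, the name-based lookup could be put together into a complete program roughly as follows. This is a sketch under a few assumptions: the LogStatement fields are guessed from how the question uses them, the pattern is left exactly as in the question (unnamed group included), and Go 1.15 or newer is required because that is when Regexp.SubexpIndex was added.

```go
package main

import (
	"fmt"
	"regexp"
)

// LogStatement mirrors the struct used in the question. Its definition is
// not shown there, so these fields are an assumption.
type LogStatement struct {
	Sev, Type, Msg string
}

// The pattern is unchanged from the question; the unnamed group still
// occupies index 3, which pushes msg to index 4.
var re = regexp.MustCompile(`flog\.(?P<sev>.*?)\(\s*flog.(?P<type>.*?),\s*("|fmt.Sprintf\(")(?P<msg>.*?)"`)

func extractLogStatements(content string) []LogStatement {
	// Resolve the group indices once instead of on every match.
	sev, typ, msg := re.SubexpIndex("sev"), re.SubexpIndex("type"), re.SubexpIndex("msg")

	matches := re.FindAllStringSubmatch(content, -1)
	out := make([]LogStatement, 0, len(matches))
	for _, m := range matches {
		out = append(out, LogStatement{Sev: m[sev], Type: m[typ], Msg: m[msg]})
	}
	return out
}

func main() {
	src := `flog.Info(flog.Application, "unable to translate address for config")`
	for _, s := range extractLogStatements(src) {
		fmt.Printf("%+v\n", s)
	}
}
```

Looking groups up by name keeps the code correct even if more groups are added to the pattern later, at the cost of a slightly longer loop body.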

huangapple
  • Posted on June 29, 2023 at 08:38:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76577454.html