Regexp matching complete lines containing "error" or "warn" (case-insensitive) with Go

Question

I want to print the complete line from my log file to the user for every line containing WARN or ERROR (case insensitive).

Given this:

[01-17|18:53:38.179] INFO server/server.go:381 this would be skipped
[01-17|18:53:38.280] INFO server/server.go:620 this also
[01-17|18:53:41.180] WARN server/server.go:388 Something is warned, so show this
[01-17|18:53:41.394] WARN server/server.go:188 Something reported an ->error<-
[01-17|18:53:41.395] ERROR server/server.go:191 Blabla
[01-17|18:53:41.395] DEBUG server/server.go:196 Obviously skipped
[01-17|18:53:41.395] DEBUG server/server.go:196 This debug contains an ->error<- so match this
[01-17|18:53:41.395] WARN server/server.go:198 You get the idea

I want:

[01-17|18:53:41.180] WARN server/server.go:388 Something is warned, so show this
[01-17|18:53:41.394] WARN server/server.go:188 Something reported an ->error<-
[01-17|18:53:41.395] ERROR server/server.go:191 Blabla
[01-17|18:53:41.395] DEBUG server/server.go:196 This debug contains an ->error<- so match this
[01-17|18:53:41.395] WARN server/server.go:198 You get the idea

I naively started with:

errorRegEx := regexp.MustCompile(`(?is)error|warn`)

which would just print the following (from a different run, so it might not exactly match the example above):

WARN
error

Then I thought I'd change this to match a bit more:

errorRegEx := regexp.MustCompile(`(?is).*error.*|.*warn.*`)

But this didn't print anything at all.

How can I get the complete line, and all lines, where either WARN or ERROR (case insensitive) would match?

PS: This is NOT the same question as the suggested https://stackoverflow.com/questions/31745545/regex-match-line-containing-string , because this one is specifically about Go, which does not appear to use exactly the same regexp engine.

Answer 1

Score: 1

This takes into account that the question has since been marked as a duplicate, as well as OP's comment below.

The linked post has a number of answers that can be pieced together into an answer to OP's question, but not completely, because those answers seem tied to PCRE and Go uses RE2.

package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// logs is the sample input; the snippets below are meant to run inside main.
var logs = `
[01-17|18:53:38.179] INFO server/server.go:381 this would be skipped
[01-17|18:53:38.280] INFO server/server.go:620 this also
[01-17|18:53:41.180] Warn server/server.go:388 Something is warned, so show this
[01-17|18:53:41.394] warn server/server.go:188 Something reported an ->error<-
[01-17|18:53:41.395] Error server/server.go:191 Blabla
[01-17|18:53:41.395] DEBUG server/server.go:196 Obviously skipped
[01-17|18:53:41.395] DEBUG server/server.go:196 This debug contains an ->error<- so match this
[01-17|18:53:41.395] WARN server/server.go:198 You get the idea
`

func init() {
	// Drop the leading and trailing newlines left by the raw string literal.
	logs = strings.TrimSpace(logs)
}

First off, I don't understand why this didn't print anything for OP:

> Then I thought I'd change this to match a bit more:
>
> errorRegEx := regexp.MustCompile(`(?is).*error.*|.*warn.*`)
>
> But this didn't print anything at all

because that should have printed everything:

fmt.Println("Original regexp:")
reOriginal := regexp.MustCompile(`(?is).*error.*|.*warn.*`)
lines := reOriginal.FindAllString(logs, -1)

fmt.Println("match\t\tentry")
fmt.Println("=====\t\t=====")
for i, line := range lines {
	fmt.Printf("%d\t\t%q\n", i+1, line)
}

Output:

Original regexp:
match           entry
=====           =====
1               "[01-17|18:53:38.179] INFO server/server.go:381 this would be skipped\n[01-17|18:53:38.280] INFO server/server.go:620 this also\n[01-17|18:53:41.180] Warn server/server.go:388 Something is warned, so show this\n[01-17|18:53:41.394] warn server/server.go:188 Something reported an ->error<-\n[01-17|18:53:41.395] Error server/server.go:191 Blabla\n[01-17|18:53:41.395] DEBUG server/server.go:196 Obviously skipped\n[01-17|18:53:41.395] DEBUG server/server.go:196 This debug contains an ->error<- so match this\n[01-17|18:53:41.395] WARN server/server.go:198 You get the idea"

The s flag in (?is)... lets the dot (.) match newlines^1, and because your stars (*) are greedy^2, a single match swallows the entire string as soon as either "error" or "warn" is found anywhere in it.
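To make the effect of the s flag concrete, here is a minimal sketch on a two-line input. The helper name `matches` and the sample text are assumptions made for illustration; they are not part of the original answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches (hypothetical helper) returns all non-overlapping matches of
// pattern in text.
func matches(pattern, text string) []string {
	return regexp.MustCompile(pattern).FindAllString(text, -1)
}

func main() {
	// Illustrative input, not taken from the original logs.
	text := "INFO fine\nWARN careful"

	// With s, the dot also matches "\n", so the greedy stars span both lines.
	fmt.Printf("%q\n", matches(`(?is).*warn.*`, text))

	// Without s, the dot stops at "\n", so only the WARN line is matched.
	fmt.Printf("%q\n", matches(`(?i).*warn.*`, text))
}
```

The first call should return the whole text as one match, while the second should return only the second line.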

The real solution is simply not to let the dot match "\n": get rid of the s flag and you get what you were aiming for:

fmt.Println("Whole text:")
reWholeText := regexp.MustCompile(`(?i).*error.*|.*warn.*`)
lines = reWholeText.FindAllString(logs, -1)

fmt.Println("match\t\tentry")
fmt.Println("=====\t\t=====")
for i, line := range lines {
	fmt.Printf("%d\t\t%q\n", i+1, line)
}

Output:

Whole text:
match           entry
=====           =====
1               "[01-17|18:53:41.180] Warn server/server.go:388 Something is warned, so show this"
2               "[01-17|18:53:41.394] warn server/server.go:188 Something reported an ->error<-"
3               "[01-17|18:53:41.395] Error server/server.go:191 Blabla"
4               "[01-17|18:53:41.395] DEBUG server/server.go:196 This debug contains an ->error<- so match this"
5               "[01-17|18:53:41.395] WARN server/server.go:198 You get the idea"

Now we're matching between instances of "\n" (effectively lines), and because we're using the All form, which only finds non-overlapping matches:

> If 'All' is present, the routine matches successive non-overlapping matches of the entire expression.^3

we get complete and distinct lines.
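As a small illustration of the All form, the sketch below compares FindString with FindAllString for different values of n. The three-line input and the variable name `reWhole` are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// reWhole is the s-less pattern from above; the variable name is illustrative.
var reWhole = regexp.MustCompile(`(?i).*error.*|.*warn.*`)

func main() {
	// Illustrative input, not taken from the original logs.
	text := "warn one\ninfo two\nerror three"

	// FindString returns only the leftmost match.
	fmt.Printf("%q\n", reWhole.FindString(text))

	// FindAllString with n = -1 returns every non-overlapping match.
	fmt.Printf("%q\n", reWhole.FindAllString(text, -1))

	// A non-negative n caps the number of matches returned.
	fmt.Printf("%q\n", reWhole.FindAllString(text, 1))
}
```

Because each match ends at a line boundary, the next search resumes after it, which is why no line is reported twice.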

You could tighten that regexp up a bit:

`(?i).*(?:error|warn).*` // anything before either "error" or "warn", and anything after it (within a line)

(?:...) is a non-capturing group^1, which fits here because you don't appear to care about the individual instances of "error" or "warn" in each match.
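The difference between the two kinds of group can be sketched as follows. The capturing variant and both variable names are assumptions added for comparison; only the non-capturing pattern comes from the answer.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Non-capturing: groups the alternation without recording a submatch.
	reNonCapturing = regexp.MustCompile(`(?i).*(?:error|warn).*`)
	// Capturing (for comparison only): the keyword becomes submatch 1.
	reCapturing = regexp.MustCompile(`(?i).*(error|warn).*`)
)

func main() {
	line := "[01-17|18:53:41.395] Error server/server.go:191 Blabla"

	// NumSubexp reports how many parenthesized capture groups a regexp has.
	fmt.Println(reNonCapturing.NumSubexp(), reCapturing.NumSubexp())

	// Both match the full line; only the capturing form adds an extra element.
	fmt.Printf("%q\n", reNonCapturing.FindStringSubmatch(line))
	fmt.Printf("%q\n", reCapturing.FindStringSubmatch(line))
}
```

If you only need the whole line, the non-capturing form keeps the result free of submatches you would not use.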

I still want to show that splitting the input into lines before matching gives you more control and precision, and makes the regexp very easy to reason about:

scanner := bufio.NewScanner(strings.NewReader(logs))

fmt.Println("Line-by-line:")
reLine := regexp.MustCompile(`(?i)error|warn`)

fmt.Println("match\tline\tentry")
fmt.Println("=====\t====\t=====")

matchNo, lineNo := 1, 1
for scanner.Scan() {
	line := scanner.Text()
	// MatchString reports whether "error" or "warn" occurs anywhere in the line.
	if reLine.MatchString(line) {
		fmt.Printf("%d\t%d\t%q\n", matchNo, lineNo, line)
		matchNo++
	}
	lineNo++
}

Output:

Line-by-line:
match   line    entry
=====   ====    =====
1       3       "[01-17|18:53:41.180] Warn server/server.go:388 Something is warned, so show this"
2       4       "[01-17|18:53:41.394] warn server/server.go:188 Something reported an ->error<-"
3       5       "[01-17|18:53:41.395] Error server/server.go:191 Blabla"
4       7       "[01-17|18:53:41.395] DEBUG server/server.go:196 This debug contains an ->error<- so match this"
5       8       "[01-17|18:53:41.395] WARN server/server.go:198 You get the idea"

All three examples are in this Playground.

Answer 2

Score: -2

Look for the ERROR and WARN tokens after the first space on the line:

 errorRegEx := regexp.MustCompile(`^[^ ]* (?:ERROR|WARN) .*`)
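
As a quick check of what this pattern accepts, the sketch below applies it to three lines from the samples above, one line at a time. Applying it per line is an assumption (without the m flag, ^ only matches at the very start of the input), and the variable name `reLevelField` is illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

// reLevelField is the pattern from this answer; the variable name is illustrative.
var reLevelField = regexp.MustCompile(`^[^ ]* (?:ERROR|WARN) .*`)

func main() {
	lines := []string{
		"[01-17|18:53:41.395] WARN server/server.go:198 You get the idea",
		"[01-17|18:53:41.395] DEBUG server/server.go:196 This debug contains an ->error<- so match this",
		"[01-17|18:53:41.180] Warn server/server.go:388 Something is warned, so show this",
	}
	for _, line := range lines {
		// Prints whether the pattern matches, followed by the line itself.
		fmt.Printf("%t\t%s\n", reLevelField.MatchString(line), line)
	}
}
```

Only the first line should be reported as a match: the pattern is case-sensitive and looks at the level field alone, so it is stricter than what the question asks for.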

huangapple
  • Posted on January 18, 2023 at 08:03:22
  • Please keep this link when reposting: https://go.coder-hub.com/75153469.html