Regex match of =~ returns the wrong result when the value starts with .*

Question

package main

import "fmt"
import "regexp"

func main() {
	var sep = "=~"
	var filter = "exported_pod=~.*grafana.*"
	matched, _ := regexp.MatchString(sep+`\b`, filter)
	fmt.Println(matched)
}

In the snippet above, I'm trying to return true if exactly =~ is present in the filter string.

I can't understand why it returns false.

It works as expected if the filter string is "exported_pod=~grafana.*", but it fails if the string is "exported_pod=~.*grafana.*". Please help me understand what's wrong here.


The actual problem is:

Split the string around any one of the separators =, =~, !=, !~.

In the example above, the result should be [ "exported_pod", ".*grafana.*" ].
The split should work for whichever of the listed separators appears in the string.

Answer 1

Score: 2

From regex101:

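\b matches, without consuming any characters, immediately between a character matched by \w and a character not matched by \w (in either order).
It cannot be used to separate non words from words.

So \b does not work here. In Go, \w matches [0-9A-Za-z_]. In "exported_pod=~grafana.*" the ~ is followed by g, a word character, so a boundary exists and =~\b matches. In "exported_pod=~.*grafana.*" the ~ is followed by ., and both are non-word characters, so there is no boundary and the match fails. (A regexp might not be the best fit for this case anyway.)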
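
To make the word-boundary behavior concrete, here is a minimal sketch that runs the pattern from the question against both filter strings. The helper name hasBoundaryAfterSep is hypothetical and exists only for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// sepWithBoundary is the pattern from the question: "=~" followed by \b.
var sepWithBoundary = regexp.MustCompile(`=~\b`)

// hasBoundaryAfterSep (hypothetical helper) reports whether s contains
// "=~" immediately followed by a word boundary.
func hasBoundaryAfterSep(s string) bool {
	return sepWithBoundary.MatchString(s)
}

func main() {
	// "~" then "g": a non-word character followed by a word character,
	// so \b matches at that position.
	fmt.Println(hasBoundaryAfterSep("exported_pod=~grafana.*")) // true

	// "~" then ".": both are non-word characters, so \b cannot match.
	fmt.Println(hasBoundaryAfterSep("exported_pod=~.*grafana.*")) // false
}
```

The only difference between the two inputs is the character right after ~, which is exactly what \b inspects.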

To simply test whether the string contains =~ (as in "How to check if a string contains a substring in Go"), use strings.Contains:

fmt.Println(strings.Contains(filter, "=~")) // true

See this playground example.

package main

import (
	"fmt"
	"strings"
)

func main() {
	var sep = "=~"
	var filter = "exported_pod=~.*grafana.*"
	matched := strings.Contains(filter, sep)
	fmt.Println(matched)
}
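
If only one known separator has to be handled, the check and the split can be combined. The sketch below assumes Go 1.18 or later, where strings.Cut is available. The helper name cutFilter is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// cutFilter (hypothetical helper) splits s around the first occurrence
// of sep and reports whether sep was found at all.
func cutFilter(s, sep string) (key, value string, found bool) {
	return strings.Cut(s, sep)
}

func main() {
	key, value, found := cutFilter("exported_pod=~.*grafana.*", "=~")
	fmt.Println(key, value, found) // exported_pod .*grafana.* true
}
```

This does not distinguish between the four separators. With sep set to "=", it would also cut inside "=~" or "!=".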

If you need to test for more than one separator, though, then yes, a regex can help, as in this playground example:

package main

import "fmt"
import "regexp"

func main() {
	var filter = "exported_pod=~.*grafana.*"
	matched, _ := regexp.MatchString(`[^=!~](=|=~|!=|!~)[^=!~]`, filter)
	fmt.Println(matched)
}

Using a regexp with a named capture group:

[^=!~](?P<separator>=|=~|!=|!~)[^=!~]
       ^^^^^^^^^^^^^

You can extract that separator, using regexp.SubexpIndex (Go 1.15+, Aug. 2020), and use it to split your original string.
See this playground example:

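package main

import (
	"fmt"
	"regexp"
	"strings"
)

func main() {
	var filter = "exported_pod=~.*grafana.*"
	re := regexp.MustCompile(`[^=!~](?P<separator>=|=~|!=|!~)[^=!~]`)
	matches := re.FindStringSubmatch(filter)
	if matches == nil {
		// No separator found; indexing matches would panic.
		fmt.Println("no separator in", filter)
		return
	}
	separator := matches[re.SubexpIndex("separator")]
	// SplitN with n=2 splits only at the first occurrence, so a value
	// that itself contains the separator is left intact.
	filtered := strings.SplitN(filter, separator, 2)
	fmt.Println(filtered)
}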

filtered is a slice holding the parts before and after the separator detected by the regexp (here, =~).
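
As a variation on the split above, the match positions can be used directly instead of searching for the separator text a second time. This is a sketch under the same pattern, not the answer's original code. The helper name splitFilter is hypothetical, and it uses regexp.FindStringSubmatchIndex together with regexp.SubexpIndex (Go 1.15+).

```go
package main

import (
	"fmt"
	"regexp"
)

// sepRe is the pattern from the answer, with a named group for the separator.
var sepRe = regexp.MustCompile(`[^=!~](?P<separator>=|=~|!=|!~)[^=!~]`)

// splitFilter (hypothetical helper) splits s at the first separator the
// pattern finds and returns the key, the separator, and the value.
// ok is false when no separator is present.
func splitFilter(s string) (key, sep, value string, ok bool) {
	loc := sepRe.FindStringSubmatchIndex(s)
	if loc == nil {
		return "", "", "", false
	}
	// loc holds start/end byte offsets in pairs, one pair per group.
	i := sepRe.SubexpIndex("separator")
	start, end := loc[2*i], loc[2*i+1]
	return s[:start], s[start:end], s[end:], true
}

func main() {
	for _, f := range []string{
		"exported_pod=~.*grafana.*",
		"job!=node",
		"path=a=b",
		"no separator here",
	} {
		key, sep, value, ok := splitFilter(f)
		fmt.Printf("%q -> %q %q %q %v\n", f, key, sep, value, ok)
	}
}
```

Slicing at the matched offsets keeps any later separator characters inside the value, as in "path=a=b". Because the pattern requires one character on each side of the separator, an input with an empty key or an empty value is reported as having no separator.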

huangapple
  • Published on May 25, 2021, 13:25:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/67682201.html