Match until a character, but don't include that character
Question
I am trying to match against inputs like:

`foo=bar baz foo:1 foo:234.mds32 notfoo:baz foo:bak foo:nospace foo:bar`

and output 6 matches: everything but the `notfoo` pair. The matches should look like `foo:bar` (i.e. not including trailing or leading spaces).
In general, the rules I am trying to match are:

- Find any kv pair where the key is `foo` and the key and value are separated by `=` or `:`.
- Pairs are separated from each other as in a string split. There may be multiple spaces, or random strings, in between kv pairs.
- As a result of the previous rule, a kv pair must have a space, or the start/end of the line, on either side.
The current best regex I have for this is `(?:\s|^)(?P<primary>foo[:=].+?)\s`, and then I extract the `primary` group.
The problem with this is that, because we include the `\s` as part of the match, we run into trouble with overlapping matches: `foo:bak foo:nospace foo:bar` is broken because the whitespace character between two pairs would have to be matched twice, and Go's regexp doesn't return overlapping matches.
In other regex engines I think lookahead can be used, but as far as I can tell this is not allowed with golang regex.
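The sketch below checks that claim. It tries to compile a lookahead version of the pattern. The pattern itself is only illustrative: `(?=\s|$)` would assert that a space or end of input follows, without consuming it. Go's `regexp` implements RE2 syntax, which has no lookaround, so `regexp.Compile` returns an error instead of a usable `*Regexp`.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileLookahead tries a hypothetical lookahead version of the question's pattern.
func compileLookahead() error {
	_, err := regexp.Compile(`(?:\s|^)(?P<primary>foo[:=]\S+)(?=\s|$)`)
	return err
}

func main() {
	// RE2 syntax rejects (?=...), so this reports a parse error.
	if err := compileLookahead(); err != nil {
		fmt.Println("compile error:", err)
	}
}
```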
Is there any way to accomplish this?
Go playground link: https://play.golang.org/p/n8gnWwpiBSR
Answer 1
Score: 2
Unfortunately, Go's `regexp` package has no lookaround support. You can work around this limitation by doubling the whitespace (e.g. with ``regexp.MustCompile(`\s`).ReplaceAllString(d, "$0$0")``) and then matching with `(?:\s|^)(?P<primary>foo[:=]\S+(?:\s+[^:\s]+)*)(?:\s|$)`:
```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	var d = `foo=bar baz foo:1 foo:234.mds32 notfoo:baz foo:bak foo:nospace foo:bar`
	// Double every whitespace char so adjacent matches don't compete for the same one.
	d = regexp.MustCompile(`\s`).ReplaceAllString(d, "$0$0")
	r := regexp.MustCompile(`(?:\s|^)(?P<primary>foo[:=]\S+(?:\s+[^:\s]+)*)(?:\s|$)`)
	idx := r.SubexpIndex("primary")
	for _, m := range r.FindAllStringSubmatch(d, -1) {
		fmt.Printf("%q\n", m[idx])
	}
}
```
See the Go demo. Output:

```
"foo=bar  baz"
"foo:1"
"foo:234.mds32"
"foo:bak"
"foo:nospace"
"foo:bar"
```
Details:

- `(?:\s|^)` - a whitespace or start of string
- `(?P<primary>foo[:=]\S+(?:\s+[^:\s]+)*)` - Group "primary": `foo`, a colon or `=` char, one or more non-whitespace chars, and then zero or more occurrences of one or more whitespaces followed by one or more chars other than whitespace or a colon
- `(?:\s|$)` - a whitespace or end of string.
Answer 2
Score: 2
There are several approaches you could take:

- Just change your pattern to `(?:\s|^)(?P<primary>foo[:=]\S+)`, as Wiktor Stribiżew mentions in a comment, instead of matching `.+?` up to `\s`. This solves the problem with no shenanigans, but I will list a few more options that might apply to similar problems that can't be sidestepped so easily.
- Since the problem is that the `FindAll` functions don't allow overlap, don't use them! Instead, roll your own: use `FindStringSubmatchIndex` to get the boundaries of one match, extract the matched text by slicing the string, then do `d = d[endIndex-1:]` and loop until `FindStringSubmatchIndex` returns nil.
- Use the `Split` method of a `regexp.Regexp` compiled from `\s+` to break the input string into whitespace-separated components, then discard the ones that don't match `^foo[:=]` (e.g. with `MatchString`). You could even use `strings.HasPrefix(s, "foo:") || strings.HasPrefix(s, "foo=")` instead. The remaining components are your desired matches, and the whitespace around them has already been discarded by the split. In my opinion this version conveys intent more clearly than trying to use a match.
Answer 3
Score: 1
Other people have given excellent answers using regular expressions, as requested. Might I be so bold as to suggest a non-regex answer?

I don't think regexes are the best solution for this situation. It is better to split the string using `strings.Fields(original)` to get a list of substrings, then split each substring depending on whether it contains `=`, `:`, or neither. `Fields()` does a great job of parsing, similar to the default field splitting in `awk`: it skips multiple spaces in a row.
Working example here: https://play.golang.org/p/xXaA9skdplz
```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	original := `foo=bar baz foo:1 foo:234.mds32 notfoo:baz foo:bak foo:nospace foo:bar`
	// Fields splits on runs of whitespace, so extra spaces never produce empty items.
	for _, item := range strings.Fields(original) {
		if kv := strings.SplitN(item, "=", 2); len(kv) == 2 {
			fmt.Printf("key/value: %q -> %q\n", kv[0], kv[1])
		} else if kv := strings.SplitN(item, ":", 2); len(kv) == 2 {
			fmt.Printf("key/value: %q -> %q\n", kv[0], kv[1])
		} else {
			// No separator at all, e.g. "baz".
			fmt.Printf("key: %q\n", item)
		}
	}
}
```
Obviously you'll need to modify this code to collect the answers rather than print them.
If you have to use regexes, then please use the other answers.