How to simulate negative lookbehind in Go
Question
I'm trying to write a regex that extracts a command. Here's what I have so far, using a negative lookbehind assertion:
    \b(?<![@#\/])\w.*
So with the input:
    /msg @nickname #channel foo bar baz
    /foo #channel @nickname foo bar baz
    foo bar baz
the text foo bar baz is extracted every time. See the working example: https://regex101.com/r/lF9aG7/3
In Go, however, this doesn't compile: http://play.golang.org/p/gkkVZgScS_
It throws:
    panic: regexp: Compile(`\b(?<![@#\/])\w.*`): error parsing regexp: invalid or unsupported Perl syntax: `(?<`
I did a bit of research and realized that negative lookbehinds are not supported in Go's regexp package, in order to guarantee O(n) matching time.
How can I rewrite this regex so that it does the same without negative lookbehind?
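For reference, the failure can be reproduced without a panic by calling regexp.Compile, which returns an error instead of panicking like regexp.MustCompile. The following is only a minimal sketch; the helper name compiles is made up for illustration and does not appear in the original playground code.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts the pattern.
// (Hypothetical helper, introduced only for this illustration.)
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// The lookbehind version is rejected when the pattern is parsed.
	fmt.Println(compiles(`\b(?<![@#\/])\w.*`)) // false
	// The same pattern without the lookbehind is accepted.
	fmt.Println(compiles(`\b\w.*`)) // true
}
```

Checking the error this way is handy when patterns come from user input and a panic is not acceptable.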
Answer 1
Score: 5
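Since your negative lookbehind contains only a simple character set, you can replace it with a negated character set:
    \b[^@#/]\w.*
Note that, unlike a lookbehind, a negated set consumes a character, so this pattern is not an exact equivalent: for /msg the word boundary between / and m is followed by m, which [^@#/] accepts, so the match starts at msg. To test the character before the word instead, match it explicitly, or match the start of the string with the ^ anchor, and put the word boundary after it:
    (?:^|[^@#\/])\b\w.*
The preceding character then becomes part of the match, so wrap \w.* in a capture group if you need only the command text.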
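As a rough sketch of how the anchored pattern could be used for the three inputs from the question: the capture group around \w.* and the names commandRe and extractCommand are assumptions added for illustration, not part of the original answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// commandRe consumes the character before the word (or matches the start
// of the string) and captures the rest of the line in group 1.
// The capture group is an addition for this sketch.
var commandRe = regexp.MustCompile(`(?:^|[^@#/])\b(\w.*)`)

// extractCommand is a hypothetical helper: it returns the captured text,
// or "" when the line contains no unprefixed word.
func extractCommand(line string) string {
	m := commandRe.FindStringSubmatch(line)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	lines := []string{
		"/msg @nickname #channel foo bar baz",
		"/foo #channel @nickname foo bar baz",
		"foo bar baz",
	}
	for _, line := range lines {
		fmt.Printf("%q\n", extractCommand(line)) // "foo bar baz" each time
	}
}
```

Reading group 1 rather than the whole match is what drops the consumed leading character (the space before foo in the first two inputs).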
Based on the samples in the Go playground link in your question, I think you want to filter out all words beginning with a character from [#@/]. You can use a Filter function:
    // Filter returns the elements of vs for which f returns true.
    func Filter(vs []string, f func(string) bool) []string {
        vsf := make([]string, 0)
        for _, v := range vs {
            if f(v) {
                vsf = append(vsf, v)
            }
        }
        return vsf
    }
and a Process function, which makes use of the filter above:
    // Process drops every space-separated word that starts with #, @ or /.
    // It needs the "strings" package to be imported.
    func Process(inp string) string {
        t := strings.Split(inp, " ")
        t = Filter(t, func(x string) bool {
            return !strings.HasPrefix(x, "#") &&
                !strings.HasPrefix(x, "@") &&
                !strings.HasPrefix(x, "/")
        })
        return strings.Join(t, " ")
    }
You can see it in action on the playground: http://play.golang.org/p/ntJRNxJTxo
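The same filtering idea can also be written as one self-contained function. This is a sketch under a few assumptions: the name dropMarked and its markers parameter are invented here, and strings.Fields is swapped in for strings.Split, so runs of whitespace are collapsed rather than preserved.

```go
package main

import (
	"fmt"
	"strings"
)

// dropMarked removes every whitespace-separated token that starts with
// one of the characters in markers. (Hypothetical helper name.)
func dropMarked(input, markers string) string {
	var kept []string
	for _, tok := range strings.Fields(input) {
		// IndexAny returns 0 exactly when the first character is a marker.
		if strings.IndexAny(tok, markers) != 0 {
			kept = append(kept, tok)
		}
	}
	return strings.Join(kept, " ")
}

func main() {
	fmt.Println(dropMarked("/msg @nickname #channel foo bar baz", "@#/"))
	// A marker in the middle of a token does not remove the token.
	fmt.Println(dropMarked("foo bar baz#channel", "@#/"))
}
```

Because this works on whole tokens, a word such as baz#channel is kept unchanged, which differs from the regex-based approaches that look at word boundaries inside a token.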
Answer 2
Score: 2
You can actually match the preceding character (or the beginning of the string) and use a group to get the desired text in a subexpression.
Regex
    (?:^|[^@#/])\b(\w+)
(?:^|[^@#/]) matches either ^, the beginning of the string, or [^@#/], any character except @, # and /.
\b is a word boundary, asserting the beginning of a word.
(\w+) creates a subexpression (capture group) and matches \w+, one or more word characters.
Code
    package main

    import (
        "fmt"
        "regexp"
    )

    func main() {
        cmds := []string{
            `/msg @nickname #channel foo bar baz`,
            `#channel @nickname foo bar baz /foo`,
            `foo bar baz @nickname #channel`,
            `foo bar baz#channel`,
        }
        regex := regexp.MustCompile(`(?:^|[^@#/])\b(\w+)`)
        // Loop over all cmds
        for _, cmd := range cmds {
            // Find all matches and subexpressions
            matches := regex.FindAllStringSubmatch(cmd, -1)
            fmt.Printf("`%v` \t==>\n", cmd)
            // Loop over all matches
            for n, match := range matches {
                // match[1] holds the text matched by the first subexpression (1st set of parentheses)
                fmt.Printf("\t%v. `%v`\n", n, match[1])
            }
        }
    }
Output
    `/msg @nickname #channel foo bar baz` ==>
        0. `foo`
        1. `bar`
        2. `baz`
    `#channel @nickname foo bar baz /foo` ==>
        0. `foo`
        1. `bar`
        2. `baz`
    `foo bar baz @nickname #channel` ==>
        0. `foo`
        1. `bar`
        2. `baz`
    `foo bar baz#channel` ==>
        0. `foo`
        1. `bar`
        2. `baz`
Playground
http://play.golang.org/p/AaX9Cg-7Vx
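If the goal is the single string foo bar baz from the question rather than a list of words, the captured words can be joined back together. The sketch below assumes that joining with a single space is acceptable; the names wordRe and plainWords are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wordRe is the pattern from this answer: group 1 holds each word that is
// not directly preceded by @, # or /.
var wordRe = regexp.MustCompile(`(?:^|[^@#/])\b(\w+)`)

// plainWords is a hypothetical helper that joins all captured words
// with single spaces.
func plainWords(cmd string) string {
	var words []string
	for _, m := range wordRe.FindAllStringSubmatch(cmd, -1) {
		words = append(words, m[1])
	}
	return strings.Join(words, " ")
}

func main() {
	fmt.Println(plainWords("/msg @nickname #channel foo bar baz")) // foo bar baz
	fmt.Println(plainWords("foo bar baz#channel"))                 // foo bar baz
}
```

Note that the original spacing and punctuation between words are lost, since only the \w+ captures are kept.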