How to simulate negative lookbehind in Go

Question

I'm trying to write a regex that extracts a command. Here is what I have so far, using a negative lookbehind assertion:

\b(?<![@#\/])\w.*

So with the input:

/msg @nickname #channel foo bar baz
/foo #channel @nickname foo bar baz 
foo bar baz

foo bar baz is extracted every time. See the working example at https://regex101.com/r/lF9aG7/3

In Go, however, this doesn't compile: http://play.golang.org/p/gkkVZgScS_

It throws:

panic: regexp: Compile(`\b(?<![@#\/])\w.*`): error parsing regexp: invalid or unsupported Perl syntax: `(?<`

I did a bit of research and realized negative lookbehinds are not supported in the language to guarantee O(n) time.
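
For reference, the failure can be observed without a panic by calling regexp.Compile, which returns the parse error instead of panicking the way regexp.MustCompile does. The sketch below is only an illustration; the helper name compiles is made up for this example, and the exact error text may differ between Go versions.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles is a hypothetical helper: it reports whether Go's regexp
// package accepts the pattern and returns the parse error if it does not.
func compiles(pattern string) (bool, error) {
	_, err := regexp.Compile(pattern)
	return err == nil, err
}

func main() {
	// The lookbehind pattern from the question is rejected at compile time.
	ok, err := compiles(`\b(?<![@#\/])\w.*`)
	fmt.Println(ok, err)

	// The same pattern without the lookbehind group is accepted.
	ok, err = compiles(`\b\w.*`)
	fmt.Println(ok, err)
}
```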

How can I rewrite this regex so that it does the same without negative lookbehind?

Answer 1

Score: 5

Since your negative lookbehind only uses a simple character set, you can replace it with a negated character class:

\b[^@#/]\w.*

If a match is also allowed at the start of the string, add the ^ anchor as an alternative:

(?:^|[^@#\/])\b\w.*
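
One thing to keep in mind: unlike a lookbehind, the group (?:^|[^@#\/]) consumes the character before the word, so that character (for example the space before foo) becomes part of the match. A possible way around this, sketched below, is to wrap the wanted part in a capturing group and read only that group. The names commandRe and extractCommand are made up for this illustration, and trimming trailing whitespace is an assumption about the desired result.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// commandRe is the anchored pattern with the wanted text wrapped in a
// capturing group, so the consumed preceding character can be dropped.
var commandRe = regexp.MustCompile(`(?:^|[^@#/])\b(\w.*)`)

// extractCommand is a hypothetical helper: it returns the text captured by
// group 1 with surrounding whitespace trimmed, or "" when nothing matches.
func extractCommand(line string) string {
	m := commandRe.FindStringSubmatch(line)
	if m == nil {
		return ""
	}
	return strings.TrimSpace(m[1])
}

func main() {
	inputs := []string{
		"/msg @nickname #channel foo bar baz",
		"/foo #channel @nickname foo bar baz ",
		"foo bar baz",
	}
	for _, in := range inputs {
		fmt.Printf("%q -> %q\n", in, extractCommand(in))
	}
}
```

With FindStringSubmatch, index 0 holds the whole match (including the consumed character) and index 1 holds only the captured text.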

Based on the samples in the Go playground link in your question, I think you want to filter out all words that begin with one of the characters in [#@/]. You can use a filter function:

// Filter returns the elements of vs for which f returns true.
func Filter(vs []string, f func(string) bool) []string {
    vsf := make([]string, 0)
    for _, v := range vs {
        if f(v) {
            vsf = append(vsf, v)
        }
    }
    return vsf
}

and a Process function that uses the filter above (it needs the strings package):

func Process(inp string) string {
	// Split on single spaces, then drop every word that starts with #, @ or /.
	t := strings.Split(inp, " ")
	t = Filter(t, func(x string) bool {
		return !strings.HasPrefix(x, "#") &&
			!strings.HasPrefix(x, "@") &&
			!strings.HasPrefix(x, "/")
	})
	return strings.Join(t, " ")
}

It can be seen in action on playground at http://play.golang.org/p/ntJRNxJTxo
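
As a possible variation (not part of the playground example), the same filtering can be sketched with strings.Fields, which splits on any run of whitespace and therefore does not keep the empty tokens that splitting on a single space produces for trailing or repeated spaces. The function name stripPrefixed is made up for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// stripPrefixed is a hypothetical variant of Process: it drops every
// whitespace-separated word that starts with '@', '#' or '/'.
func stripPrefixed(inp string) string {
	var kept []string
	for _, w := range strings.Fields(inp) {
		// IndexAny returns 0 only when the word starts with one of the characters.
		if strings.IndexAny(w, "@#/") != 0 {
			kept = append(kept, w)
		}
	}
	return strings.Join(kept, " ")
}

func main() {
	fmt.Printf("%q\n", stripPrefixed("/foo #channel @nickname foo bar baz "))
}
```

Like Process, this only looks at the first character of each word, so a word such as baz#channel is kept unchanged.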

Answer 2

Score: 2

You can match the preceding character (or the beginning of the line) and use a group to capture the desired text in a subexpression.

Regex

(?:^|[^@#/])\b(\w+)
  • (?:^|[^@#/]) matches either ^, the beginning of the input (the beginning of each line only if the (?m) flag is set), or [^@#/], any character except @, # or /
  • \b is a word boundary, asserting the beginning of a word
  • (\w+) creates a capturing subexpression and matches \w+, one or more word characters

Code

package main

import (
	"fmt"
	"regexp"
)

func main() {
	cmds := []string{
		`/msg @nickname #channel foo bar baz`,
		`#channel @nickname foo bar baz /foo`,
		`foo bar baz @nickname #channel`,
		`foo bar baz#channel`,
	}

	regex := regexp.MustCompile(`(?:^|[^@#/])\b(\w+)`)

	// Loop over all cmds
	for _, cmd := range cmds {
		// Find all matches and their subexpressions
		matches := regex.FindAllStringSubmatch(cmd, -1)

		fmt.Printf("`%v` \t==>\n", cmd)

		// Loop over all matches
		for n, match := range matches {
			// match[1] holds the text matched by the first subexpression (1st set of parentheses)
			fmt.Printf("\t%v. `%v`\n", n, match[1])
		}
	}
}

Output

`/msg @nickname #channel foo bar baz` 	==>
	0. `foo`
	1. `bar`
	2. `baz`
`#channel @nickname foo bar baz /foo` 	==>
	0. `foo`
	1. `bar`
	2. `baz`
`foo bar baz @nickname #channel` 	==>
	0. `foo`
	1. `bar`
	2. `baz`
`foo bar baz#channel` 	==>
	0. `foo`
	1. `bar`
	2. `baz`

Playground
http://play.golang.org/p/AaX9Cg-7Vx

huangapple
  • Posted on November 4, 2015 at 14:18:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/33514971.html