2015-11-04 14:18:44 · go


How to simulate negative lookbehind in Go

Question


I'm trying to write a regex that can extract a command. Here's what I have so far, using a negative lookbehind assertion:

\b(?<![@#\/])\w.*

So with the input:

/msg @nickname #channel foo bar baz
/foo #channel @nickname foo bar baz 
foo bar baz

foo bar baz is extracted every time. See the working example:
https://regex101.com/r/lF9aG7/3

In Go, however, this doesn't compile: http://play.golang.org/p/gkkVZgScS_

It panics with:

panic: regexp: Compile(`\b(?<![@#\/])\w.*`): error parsing regexp: invalid or unsupported Perl syntax: `(?<`

I did a bit of research and realized that Go's regexp package does not support lookbehind assertions, so that matching is guaranteed to run in O(n) time.
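
The panic message above has the shape produced by regexp.MustCompile. As a minimal sketch, assuming you would rather inspect the failure than crash, regexp.Compile returns the same problem as an ordinary error. The helper name compiles is made up for this example, and the exact error text may differ between Go versions, so the sketch only checks whether an error is returned.

```go
package main

import (
	"fmt"
	"regexp"
)

// lookbehindPattern is the pattern from the question.
const lookbehindPattern = `\b(?<![@#\/])\w.*`

// compiles reports whether Go's regexp package accepts the pattern.
// (Hypothetical helper, named for this example only.)
func compiles(pattern string) (bool, error) {
	_, err := regexp.Compile(pattern)
	return err == nil, err
}

func main() {
	ok, err := compiles(lookbehindPattern)
	fmt.Println("compiles:", ok)
	if err != nil {
		// The wording of this message depends on the Go version.
		fmt.Println("error:", err)
	}
}
```

Using regexp.Compile this way is mainly useful when the pattern comes from user input; for a fixed pattern, MustCompile at package level is the usual choice.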

How can I rewrite this regex so that it does the same without negative lookbehind?

Answer 1

Score: 5


Since your negative lookbehind only uses a simple character set, you can replace it with a negated character set:

\b[^@#/]\w.*

If a match is also allowed at the start of the string, add the ^ anchor as an alternative:

(?:^|[^@#\/])\b\w.*
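
One thing worth checking against your own input: unlike a lookbehind, a negated character set consumes the character it matches, so the matched text can differ from what the original pattern returned. The sketch below, which is an illustration rather than part of the original answer, prints what each pattern matches on the first sample line and adds a capture group (an assumption introduced here) so that the consumed character can be left out. The names negatedSet, anchored, captured and commandText are made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// The two patterns from the answer, plus a variant with a capture group.
var (
	negatedSet = regexp.MustCompile(`\b[^@#/]\w.*`)
	anchored   = regexp.MustCompile(`(?:^|[^@#\/])\b\w.*`)
	// Assumption: the group is added for this example only.
	captured = regexp.MustCompile(`(?:^|[^@#/])\b(\w.*)`)
)

// commandText returns the text captured by group 1, or "" if nothing matches.
func commandText(line string) string {
	m := captured.FindStringSubmatch(line)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	line := "/msg @nickname #channel foo bar baz"
	// %q makes leading and trailing spaces in each match visible.
	fmt.Printf("%q\n", negatedSet.FindString(line))
	fmt.Printf("%q\n", anchored.FindString(line))
	fmt.Printf("%q\n", commandText(line))
}
```

Reading group 1 instead of the whole match is the usual way to emulate a lookbehind when the engine has none, because everything outside the group is matched but not returned.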

Based on the samples in the Go playground link in your question, I think you're looking to filter out all words beginning with a character from [#@/]. You can use a filter function:

// Filter returns the elements of vs for which f returns true.
func Filter(vs []string, f func(string) bool) []string {
	vsf := make([]string, 0)
	for _, v := range vs {
		if f(v) {
			vsf = append(vsf, v)
		}
	}
	return vsf
}

and a Process function, which makes use of the filter above:

// Process drops every space-separated word that starts with #, @ or /.
// It needs the "strings" package.
func Process(inp string) string {
	t := strings.Split(inp, " ")
	t = Filter(t, func(x string) bool {
		return !strings.HasPrefix(x, "#") &&
			!strings.HasPrefix(x, "@") &&
			!strings.HasPrefix(x, "/")
	})
	return strings.Join(t, " ")
}

You can see it in action on the playground at http://play.golang.org/p/ntJRNxJTxo
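
For reference, here is a self-contained variant of the same idea. It is a sketch rather than the code behind the playground link: it assumes that runs of whitespace should be collapsed, so it uses strings.Fields instead of strings.Split, and it tests the first character with strings.IndexAny. The name stripPrefixed is made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// stripPrefixed keeps only the whitespace-separated words that do not
// start with '#', '@' or '/', and joins them with single spaces.
// (Hypothetical helper, named for this example only.)
func stripPrefixed(input string) string {
	var kept []string
	for _, word := range strings.Fields(input) {
		// IndexAny returns 0 when the first character is one of "#@/".
		if strings.IndexAny(word, "#@/") == 0 {
			continue
		}
		kept = append(kept, word)
	}
	return strings.Join(kept, " ")
}

func main() {
	for _, line := range []string{
		"/msg @nickname #channel foo bar baz",
		"/foo #channel @nickname foo bar baz ",
		"foo bar baz",
	} {
		fmt.Printf("%q\n", stripPrefixed(line))
	}
}
```

Because strings.Fields never yields empty strings, a trailing space in the input does not leave a trailing space in the result, which is a small behavioral difference from splitting on a single space.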

Answer 2

Score: 2


You can actually match the preceding character (or the beginning of line) and use a group to get the desired text in a subexpression.

Regex

(?:^|[^@#/])\b(\w+)

(?:^|[^@#/]) matches either ^ (the beginning of the text, or of each line if the (?m) flag is set) or [^@#/] (any character except @, # or /)
\b is a word boundary that asserts the beginning of a word
(\w+) creates a capturing subexpression that matches \w+ (one or more word characters)

Code

package main

import (
	"fmt"
	"regexp"
)

func main() {
	cmds := []string{
		`/msg @nickname #channel foo bar baz`,
		`#channel @nickname foo bar baz /foo`,
		`foo bar baz @nickname #channel`,
		`foo bar baz#channel`,
	}

	regex := regexp.MustCompile(`(?:^|[^@#/])\b(\w+)`)

	// Loop over all cmds
	for _, cmd := range cmds {
		// Find all matches and subexpressions
		matches := regex.FindAllStringSubmatch(cmd, -1)

		fmt.Printf("`%v` \t==>\n", cmd)

		// Loop over all matches
		for n, match := range matches {
			// match[1] holds the text matched by the first subexpression (1st set of parentheses)
			fmt.Printf("\t%v. `%v`\n", n, match[1])
		}
	}
}

Output

`/msg @nickname #channel foo bar baz` 	==>
	0. `foo`
	1. `bar`
	2. `baz`
`#channel @nickname foo bar baz /foo` 	==>
	0. `foo`
	1. `bar`
	2. `baz`
`foo bar baz @nickname #channel` 	==>
	0. `foo`
	1. `bar`
	2. `baz`
`foo bar baz#channel` 	==>
	0. `foo`
	1. `bar`
	2. `baz`

Playground
http://play.golang.org/p/AaX9Cg-7Vx
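
If you want the single string the question asked for rather than a list of words, the captured words can be joined back together. This is a minimal sketch under that assumption and not part of the playground code; the names wordPattern and plainWords are made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wordPattern is the pattern from this answer: group 1 captures a word
// that is not directly preceded by '@', '#' or '/'.
var wordPattern = regexp.MustCompile(`(?:^|[^@#/])\b(\w+)`)

// plainWords collects group 1 of every match and joins the words with
// single spaces. (Hypothetical helper, named for this example only.)
func plainWords(cmd string) string {
	matches := wordPattern.FindAllStringSubmatch(cmd, -1)
	words := make([]string, 0, len(matches))
	for _, m := range matches {
		words = append(words, m[1])
	}
	return strings.Join(words, " ")
}

func main() {
	fmt.Printf("%q\n", plainWords(`/msg @nickname #channel foo bar baz`))
	fmt.Printf("%q\n", plainWords(`foo bar baz#channel`))
}
```

Note that joining with a single space normalizes the separators, so punctuation or repeated spaces between the original words are not preserved.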
