Posted 2015-05-18 22:12:55, tagged go

Using positive-lookahead (?=regex) with re2

Question

Since I'm fairly new to re2, I'm trying to figure out how to use a positive lookahead (?=regex) in Go, the way it works in JS, C++, or any PCRE-style engine.

Here are some examples of what I'm looking for.

JS:

'foo bar baz'.match(/^[\s\S]+?(?=baz|$)/);

Python:

re.match(r'^[\s\S]+?(?=baz|$)', 'foo bar baz')

Note: both examples match 'foo bar '

Thanks a lot.

Answer 1

Score: 19

According to the Syntax Documentation, this feature isn't supported:

> (?=re) before text matching re (NOT SUPPORTED)

Also, from WhyRE2:

> As a matter of principle, RE2 does not support constructs for which only backtracking solutions are known to exist. Thus, backreferences and look-around assertions are not supported.
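
To see this in practice, here is a minimal sketch showing that Go's regexp package, which follows the RE2 syntax, rejects the lookahead when the pattern is compiled. The helper name compiles is made up for illustration. The exact error text may differ between Go versions, so the sketch prints it rather than assuming it.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts the pattern.
// (Hypothetical helper, used only for this illustration.)
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// The lookahead pattern from the question fails at compile time.
	_, err := regexp.Compile(`^[\s\S]+?(?=baz|$)`)
	fmt.Println("lookahead pattern error:", err)

	// The same prefix without the lookahead is accepted.
	fmt.Println("without lookahead compiles:", compiles(`^[\s\S]+?`))
}
```

Using regexp.Compile instead of regexp.MustCompile here avoids a panic and lets the program inspect the error.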

Answer 2

Score: 12

You can achieve this with a simpler regexp:

re := regexp.MustCompile(`^(.+?)(?:baz)?$`)
sm := re.FindStringSubmatch("foo bar baz")
fmt.Printf("%q\n", sm)

sm[1] will be your match. Playground: http://play.golang.org/p/Vyah7cfBlH
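
For reference, here is the same idea as a complete program, offered as a sketch rather than a drop-in replacement for the lookahead. The names trailingBaz, anyBaz, and beforeBaz are made up for this example. trailingBaz is the pattern from this answer. anyBaz is an assumed variant that adds (?s), so that . also matches newlines like [\s\S], and uses baz.* so that the capture stops at the first "baz" even when more text follows it.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Pattern from the answer: strips an optional "baz" at the very end.
	trailingBaz = regexp.MustCompile(`^(.+?)(?:baz)?$`)
	// Assumed variant: stops at the first "baz" anywhere in the input.
	anyBaz = regexp.MustCompile(`(?s)^(.+?)(?:baz.*)?$`)
)

// beforeBaz returns the first capture group of re for s, and whether
// re matched at all. (Hypothetical helper for this illustration.)
func beforeBaz(re *regexp.Regexp, s string) (string, bool) {
	sm := re.FindStringSubmatch(s)
	if sm == nil {
		return "", false
	}
	return sm[1], true
}

func main() {
	for _, s := range []string{"foo bar baz", "foo bar", "foo baz qux"} {
		a, _ := beforeBaz(trailingBaz, s)
		b, _ := beforeBaz(anyBaz, s)
		fmt.Printf("%q: trailingBaz=%q anyBaz=%q\n", s, a, b)
	}
}
```

The two patterns agree on the question's input. They differ on input such as "foo baz qux", where only the anyBaz variant stops before "baz", which is closer to what the lookahead version does.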

Answer 3

Score: 0

If you want to match a broad pattern but exclude specific strings using the regex alone, you can use a technique called "stepwise exclusion".

This technique involves iteratively refining the regex to exclude specific sequences character by character.

Let's consider an example. Suppose you want to match all email addresses ending with "@google.com", but exclude the specific address "noreply@google.com". Here's how you would construct such a regex using the stepwise exclusion technique:

^(?i)([\w]{1,6}|[a-mo-z0-9_][\w]*|n[a-np-z0-9_][\w]*|no[a-qs-z0-9_][\w]*|nor[a-df-z0-9_][\w]*|nore[a-oq-z0-9_][\w]*|norep[a-km-z0-9_][\w]*|norepl[a-xz0-9_][\w]*|noreply[\w]+)@google\.com$

Breakdown of the Pattern

(?i): This flag makes the regex case-insensitive.
[\w]{1,6}: This part matches any local part of one to six word characters. Such a name is shorter than "noreply" (seven characters), so it cannot be the excluded address; for example, no@google.com matches here.
[a-mo-z0-9_][\w]*: This part matches any address whose local part starts with a word character other than n.
Each subsequent part (e.g., n[a-np-z0-9_][\w]*, no[a-qs-z0-9_][\w]*, etc.) accepts a prefix of "noreply" only when the next character breaks the sequence, so each branch diverges from "noreply" one character later than the previous one.
The last part, noreply[\w]+, matches addresses that start with "noreply" and have at least one additional character before @google.com. It needs + rather than *, because noreply[\w]* would also match "noreply" itself.
The trailing $ anchors the match at the end of the domain, so addresses such as user@google.community are rejected.
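
A pattern like this is easy to get subtly wrong, so it helps to test it against a few addresses. The sketch below compiles a stepwise pattern that ends with a noreply[\w]+ branch and a closing $ anchor, and cross-checks it against a simpler approach: a broad regex plus an explicit comparison in code. The names stepwise, broad, and allowedSimple, as well as the sample addresses, are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// stepwise excludes "noreply" purely inside the regex.
var stepwise = regexp.MustCompile(`^(?i)([\w]{1,6}|[a-mo-z0-9_][\w]*|n[a-np-z0-9_][\w]*|no[a-qs-z0-9_][\w]*|nor[a-df-z0-9_][\w]*|nore[a-oq-z0-9_][\w]*|norep[a-km-z0-9_][\w]*|norepl[a-xz0-9_][\w]*|noreply[\w]+)@google\.com$`)

// broad accepts any local part made of word characters.
var broad = regexp.MustCompile(`^(?i)\w+@google\.com$`)

// allowedSimple is a hypothetical cross-check: match broadly, then
// exclude the one unwanted address with a case-insensitive comparison.
func allowedSimple(addr string) bool {
	return broad.MatchString(addr) && !strings.EqualFold(addr, "noreply@google.com")
}

func main() {
	addrs := []string{
		"no@google.com",
		"alice@google.com",
		"noreply@google.com",
		"NoReply@google.com",
		"noreply2@google.com",
		"norepla@google.com",
		"alice@google.community",
	}
	for _, addr := range addrs {
		fmt.Printf("%-24s stepwise=%v simple=%v\n",
			addr, stepwise.MatchString(addr), allowedSimple(addr))
	}
}
```

Both checks should agree on every address. When a second step in code is acceptable, the broad regex plus comparison is usually easier to read and maintain. The stepwise pattern is mainly useful when only a single regex can be supplied.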
