Using positive-lookahead (?=regex) with re2

Question


Since I'm fairly new to re2, I'm trying to figure out how to use positive lookahead (?=regex) in Go, the way it works in JS, C++ or any PCRE-style engine.

Here are some examples of what I'm looking for.

JS:

'foo bar baz'.match(/^[\s\S]+?(?=baz|$)/);

Python:

re.match(r'^[\s\S]+?(?=baz|$)', 'foo bar baz')
  • Note: both examples match 'foo bar '

Thanks a lot.

Answer 1

Score: 19


According to the Syntax Documentation, this feature isn't supported:

> (?=re) before text matching re (NOT SUPPORTED)

Also, from WhyRE2:

> As a matter of principle, RE2 does not support constructs for which only backtracking solutions are known to exist. Thus, backreferences and look-around assertions are not supported.
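To see this from Go itself, here is a minimal sketch that asks the standard regexp package to compile the pattern from the question. The helper name compiles is hypothetical; regexp.Compile is the only library call used, and it reports unsupported syntax as an error rather than panicking.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts the pattern.
// (Hypothetical helper, introduced only for this illustration.)
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// The lookahead pattern from the question is rejected at compile time.
	_, err := regexp.Compile(`^[\s\S]+?(?=baz|$)`)
	fmt.Println("lookahead error:", err)

	// The same prefix without the lookahead is accepted.
	fmt.Println("plain pattern compiles:", compiles(`^[\s\S]+?`))
}
```

Using regexp.Compile instead of regexp.MustCompile keeps the failure visible as an ordinary error value, which is convenient when patterns come from user input.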

Answer 2

Score: 12


You can achieve this with a simpler regexp:

re := regexp.MustCompile(`^(.+?)(?:baz)?$`)
sm := re.FindStringSubmatch("foo bar baz")
fmt.Printf("%q\n", sm)

sm[1] will be your match. Playground: http://play.golang.org/p/Vyah7cfBlH
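The snippet above can be wrapped into a complete program. The sketch below is one possible way to do that, not the answerer's own code: the function names beforeTrailingBaz and beforeFirstBaz are hypothetical, and the second pattern is an assumed variant. It adds (?s) so that . also matches newlines, as [\s\S] does in the question, and lets the optional group consume everything after the first "baz", which is closer to what the lookahead version does when "baz" occurs before the end of the input.

```go
package main

import (
	"fmt"
	"regexp"
)

// trailingBaz is the pattern from the answer: the capture stops before an
// optional "baz" that sits at the very end of the input.
var trailingBaz = regexp.MustCompile(`^(.+?)(?:baz)?$`)

// firstBaz is an assumed variant: (?s) lets . match newlines, and the
// optional group swallows the first "baz" plus everything after it.
var firstBaz = regexp.MustCompile(`(?s)^(.+?)(?:baz.*)?$`)

// capture returns the first submatch, or "" when the pattern does not match.
func capture(re *regexp.Regexp, s string) string {
	sm := re.FindStringSubmatch(s)
	if sm == nil {
		return ""
	}
	return sm[1]
}

func beforeTrailingBaz(s string) string { return capture(trailingBaz, s) }
func beforeFirstBaz(s string) string    { return capture(firstBaz, s) }

func main() {
	fmt.Printf("%q\n", beforeTrailingBaz("foo bar baz"))
	fmt.Printf("%q\n", beforeTrailingBaz("foo baz qux baz"))
	fmt.Printf("%q\n", beforeFirstBaz("foo baz qux baz"))
}
```

The two functions agree on the input from the question and differ only when "baz" appears somewhere other than the end, so which one fits depends on the data being matched.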

Answer 3

Score: 0


In cases where you want to match a broad pattern but exclude specific substrings purely in regex, you can use a technique called "Stepwise Exclusion".

This technique involves iteratively refining the regex to exclude specific sequences character by character.

Let's consider an example. Suppose you want to match all email addresses ending with "@google.com", but exclude the specific address "noreply@google.com". Here's how you would construct such a regex using the stepwise exclusion technique:

^(?i)([\w]{1,6}|[a-mo-z0-9_][\w]*|n[a-np-z0-9_][\w]*|no[a-qs-z0-9_][\w]*|nor[a-df-z0-9_][\w]*|nore[a-oq-z0-9_][\w]*|norep[a-km-z0-9_][\w]*|norepl[a-xz0-9_][\w]*|noreply[\w]+)@google\.com

Breakdown of the Pattern

  1. (?i): This flag makes the regex case-insensitive.
  2. [\w]{1,6}: This part matches any local part of one to six word characters. Since "noreply" has seven characters, none of these can be the excluded address; it covers short prefixes such as no@google.com.
  3. [a-mo-z0-9_][\w]*: This part matches any address whose local part starts with a letter, digit or underscore other than n, followed by @google.com.
  4. Each subsequent part of the pattern (e.g., n[a-np-z0-9_][\w]*, no[a-qs-z0-9_][\w]*, etc.) matches addresses that share a prefix with "noreply" but differ at the next character, so each alternative moves one character further into "noreply".
  5. The last part, noreply[\w]+, matches addresses that start with "noreply" and have at least one additional character before @google.com.
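As a rough check of the pattern above, here is a minimal Go sketch that compiles it and tries a few addresses. The names emailRe and allowed are hypothetical. The noreply[\w]+ branch follows item 5 of the breakdown, and the trailing $ is an assumption added here so that nothing may follow the domain.

```go
package main

import (
	"fmt"
	"regexp"
)

// emailRe is the stepwise-exclusion pattern from the answer.
// Assumptions: the noreply[\w]+ branch from item 5 of the breakdown is
// included, and a trailing $ is added to anchor the end of the address.
var emailRe = regexp.MustCompile(`^(?i)([\w]{1,6}` +
	`|[a-mo-z0-9_][\w]*` +
	`|n[a-np-z0-9_][\w]*` +
	`|no[a-qs-z0-9_][\w]*` +
	`|nor[a-df-z0-9_][\w]*` +
	`|nore[a-oq-z0-9_][\w]*` +
	`|norep[a-km-z0-9_][\w]*` +
	`|norepl[a-xz0-9_][\w]*` +
	`|noreply[\w]+)@google\.com$`)

// allowed reports whether addr is an @google.com address other than
// noreply@google.com, according to emailRe.
func allowed(addr string) bool {
	return emailRe.MatchString(addr)
}

func main() {
	for _, addr := range []string{
		"noreply@google.com",
		"NoReply@google.com",
		"no@google.com",
		"notifications@google.com",
		"noreply2@google.com",
		"noreply@example.com",
	} {
		fmt.Printf("%-26s %v\n", addr, allowed(addr))
	}
}
```

In ordinary Go code it is usually simpler to match the broad pattern and reject the unwanted address with a plain string comparison; the single-regex form is mainly useful when only a pattern can be supplied, for example in a configuration file.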

huangapple
  • Posted on May 18, 2015 at 22:12:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/30305542.html