Regex: exclude text between parentheses from match

Question

Given the following text:

{field1} == value1 && {field2} == value2  && ({field3} == value3 && {field4} == value4) && {field5} == value5

I'm trying to create a regex that matches every && in that text except the ones between parentheses (so the && between value3 and field4 should be ignored). I've been able to do it with the following regex: (\&{2})(?![^\(]*\)) [this one works and does what I need], but the problem is that I'm using Golang, whose regexp package doesn't support negative lookahead. Is there any way to do it without negative lookahead? Parentheses cannot be nested.

Basically, I want to split on the && while ignoring the ones between parentheses, and get something like:

[&&, &&, &&]
[{field1} == value1, {field2} == value2, ({field3} == value3 && {field4} == value4), {field5} == value5]

Thanks!

Answer 1

Score: 1

You can use a technique in which everything that isn't wanted is matched but not captured, and everything that is wanted is matched and captured (i.e., saved to a capture group). The following regular expression does that.

\([^)]*\)|(&&)

It reads, "match a string enclosed in parentheses, or (|) match the string "&&" and save it to capture group 1". Here [^)]* matches any run of characters other than ), so a parenthesized match stops at the first closing parenthesis. The idea is to disregard (in code) the matches that are not captured.

For the following string the matches (denoted by "m"s) and captures (denoted by "c"s) are shown.

{f1} == v1 && {f2} == v2  && ({f3} == v3 && {f4} == v4) && {f5} == v5
           mm             mm mmmmmmmmmmmmmmmmmmmmmmmmmm mm
           cc             cc                            cc

At the beginning of the string ({) neither \([^)]*\) nor (&&) matches, so the string pointer is advanced by one character, to f. Again there is no match, so the string pointer is advanced one more character, to 1. This continues (with no matches) until the first & is reached. There \([^)]*\) does not match but (&&) does. We are interested in that match because it is captured (to capture group 1).

The string pointer then moves ahead one character at a time. The second && is matched and captured in the same way, and then ( is reached. At that point ({f3} == v3 && {f4} == v4) is matched by \([^)]*\), but since it is not captured we pay no attention to it. The final && is then matched and captured, and scanning continues until the end of the string is reached.
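
As a rough Go sketch of the capture-group technique described above, the program below splits the question's text on every && that lies outside parentheses. The helper name splitTopLevel is hypothetical, and the character class is written [^)] here (an assumption on my part) so that a parenthesized match stops at the first closing parenthesis. It relies on FindAllStringSubmatchIndex reporting -1 for a capture group that did not take part in a match.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Matches a parenthesized run (not captured) or "&&" (capture group 1).
var topLevelAnd = regexp.MustCompile(`\([^)]*\)|(&&)`)

// splitTopLevel is a hypothetical helper: it splits s on every "&&"
// that is not inside parentheses and trims each piece.
func splitTopLevel(s string) []string {
	var parts []string
	start := 0
	for _, loc := range topLevelAnd.FindAllStringSubmatchIndex(s, -1) {
		// loc[2] and loc[3] bound capture group 1; both are -1 when the
		// parenthesized alternative matched instead of "&&".
		if loc[2] < 0 {
			continue
		}
		parts = append(parts, strings.TrimSpace(s[start:loc[2]]))
		start = loc[3]
	}
	return append(parts, strings.TrimSpace(s[start:]))
}

func main() {
	s := "{field1} == value1 && {field2} == value2  && ({field3} == value3 && {field4} == value4) && {field5} == value5"
	for _, p := range splitTopLevel(s) {
		fmt.Println(p)
	}
}
```

For the question's input this should print four lines, with the parenthesized condition kept together on the third.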

Alternatively, we could drop the capture group and use the regular expression

\([^)]*\)|&&

and examine the first character of each match. If it is ( we discard the match.
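
A minimal Go sketch of this variant, assuming we only want to count the && operators outside parentheses. The name countTopLevelAnd is hypothetical, and the character class is written [^)] so that each parenthesized match ends at its own closing parenthesis.

```go
package main

import (
	"fmt"
	"regexp"
)

// Alternation without a capture group: a parenthesized run, or "&&".
var parenOrAnd = regexp.MustCompile(`\([^)]*\)|&&`)

// countTopLevelAnd is a hypothetical helper that counts the "&&"
// operators found outside parentheses.
func countTopLevelAnd(s string) int {
	n := 0
	for _, m := range parenOrAnd.FindAllString(s, -1) {
		if m[0] == '(' {
			continue // parenthesized run: discard this match
		}
		n++
	}
	return n
}

func main() {
	s := "{f1} == v1 && {f2} == v2  && ({f3} == v3 && {f4} == v4) && {f5} == v5"
	fmt.Println(countTopLevelAnd(s)) // prints 3
}
```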

When each match of && is found we might (in code) replace it with another (possibly empty) string, obtain its offset into the string (for some purpose) or simply increment a counter of the number of matches in the string. This of course depends on why we want to match those strings.
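
As one illustration of the replacement case, here is a sketch using ReplaceAllStringFunc from Go's regexp package. The helper replaceTopLevelAnd and the replacement text "AND" are made up for the example; parenthesized matches are returned unchanged, so only the && operators outside parentheses are rewritten.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// A parenthesized run, or "&&".
var parenOrAnd = regexp.MustCompile(`\([^)]*\)|&&`)

// replaceTopLevelAnd is a hypothetical helper: it replaces each "&&"
// outside parentheses with repl and leaves parenthesized text alone.
func replaceTopLevelAnd(s, repl string) string {
	return parenOrAnd.ReplaceAllStringFunc(s, func(m string) string {
		if strings.HasPrefix(m, "(") {
			return m // parenthesized run: keep it as it is
		}
		return repl
	})
}

func main() {
	s := "{f1} == v1 && {f2} == v2 && ({f3} == v3 && {f4} == v4) && {f5} == v5"
	fmt.Println(replaceTopLevelAnd(s, "AND"))
}
```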

Answer 2

Score: 0

I am not familiar with Golang, but short of using a proper language parser you could do this in three regex steps:

  1. Replace every && inside parentheses with a placeholder character.
  2. Split on &&.
  3. Restore the && in each array item from the placeholder character.
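
The three steps could look roughly like this in Go. This is only a sketch: it uses one regular expression for step 1 and plain strings functions for steps 2 and 3 rather than three regexes, the helper name splitWithPlaceholder is hypothetical, and the NUL byte used as the placeholder is an assumption that holds only if that byte never occurs in the input.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// placeholder is an assumption: any string that cannot occur in the input works.
const placeholder = "\x00"

// A run of text enclosed in (non-nested) parentheses.
var parenGroup = regexp.MustCompile(`\([^)]*\)`)

// splitWithPlaceholder is a hypothetical helper following the three steps.
func splitWithPlaceholder(s string) []string {
	// 1. Hide every "&&" that sits inside parentheses.
	masked := parenGroup.ReplaceAllStringFunc(s, func(m string) string {
		return strings.ReplaceAll(m, "&&", placeholder)
	})
	// 2. Split on the remaining "&&", which are all outside parentheses.
	parts := strings.Split(masked, "&&")
	// 3. Restore the hidden "&&" in each piece and trim whitespace.
	for i, p := range parts {
		parts[i] = strings.TrimSpace(strings.ReplaceAll(p, placeholder, "&&"))
	}
	return parts
}

func main() {
	s := "{field1} == value1 && {field2} == value2  && ({field3} == value3 && {field4} == value4) && {field5} == value5"
	for _, p := range splitWithPlaceholder(s) {
		fmt.Println(p)
	}
}
```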

huangapple
  • Published on March 11, 2023, 13:22:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75702957.html