What does this combination of positive and negative lookahead do?

Question

Recently I stumbled upon this weird regex, which combines a positive and a negative lookahead, and I cannot wrap my head around what it really does. Keep in mind this is Java regex syntax.

(?=((?!\bword1\b|\bword2\b).)+?\s*?)
 ^^  ^^

What do those two nested lookaheads do? Can this be simplified?

Answer 1

Score: 0

  • . matches any character unless that position is the start of "word1" or "word2" as a whole word, i.e. delimited by word boundaries (\bword1\b|\bword2\b can be simplified to \bword[12]\b). This is what the negative lookahead (?!...) checks,
  • +? means at least one such character,
  • but in practice exactly one, because the quantifier is lazy (non-greedy) and is followed by \s*?, which always matches. Therefore +? can be dropped,
  • \s*? in this assertion is meaningless: it always matches, and because it is lazy and nothing follows it, it never consumes any input,
  • the positive lookahead (?=...) therefore means that the position is followed by at least one character, and that the position is not the start of a whole-word "word1" or "word2".
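The point about the lazy quantifier can be checked with a small Java sketch. The class name, the helper method and the test input are made up for illustration, and the GREEDY pattern is a hypothetical variant that does not appear in the question; it is only there for contrast.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyLookaheadDemo {
    // The regex from the question, unchanged.
    public static final Pattern ORIGINAL =
            Pattern.compile("(?=((?!\\bword1\\b|\\bword2\\b).)+?\\s*?)");

    // Hypothetical variant for contrast only: greedy + instead of lazy +?.
    public static final Pattern GREEDY =
            Pattern.compile("(?=((?!\\bword1\\b|\\bword2\\b).)+\\s*?)");

    // Returns "start:group1" for every (empty) match of the pattern in the input.
    public static List<String> captures(Pattern pattern, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            // A repeated group keeps only the text of its last iteration.
            out.add(m.start() + ":" + m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "a  b"; // 'a', two spaces, 'b'
        System.out.println(captures(ORIGINAL, input));
        System.out.println(captures(GREEDY, input));
    }
}
```

With the lazy quantifier, group 1 should hold just the single character at each position, spaces included, which is why both +? and \s*? are redundant. The greedy variant instead runs to the end of the input and leaves the last character of the run in group 1.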

Any further simplification would remove the capture group, which might be required in the surrounding context.

So, the simplified regex is (?=((?!\bword[12]\b).)). It succeeds before any character of the input, except at the beginning of "word1" or "word2" when they appear as whole words. The match itself is empty, but the first capture group contains the character that follows.
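As a rough check of the equivalence claimed here, the following Java sketch compares the positions at which the original and the simplified regex match. The class name, the helper method and the sample inputs are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimplifiedLookaheadDemo {
    // The regex from the question.
    public static final Pattern ORIGINAL =
            Pattern.compile("(?=((?!\\bword1\\b|\\bword2\\b).)+?\\s*?)");

    // The simplified regex from the answer.
    public static final Pattern SIMPLIFIED =
            Pattern.compile("(?=((?!\\bword[12]\\b).))");

    // Returns the start index of every (empty) match of the pattern in the input.
    public static List<Integer> matchStarts(Pattern pattern, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String input = "word1 x word2";
        // Both patterns should skip index 0 and index 8, where the whole words begin.
        System.out.println(matchStarts(ORIGINAL, input));
        System.out.println(matchStarts(SIMPLIFIED, input));
        // "word10" is not a whole-word "word1", so no position is skipped.
        System.out.println(matchStarts(SIMPLIFIED, "word10"));
    }
}
```

Both patterns should report the same positions: every index that has a following character, minus the indices where a whole-word "word1" or "word2" starts. The end of the input never matches, because . needs a character to consume.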

https://regex101.com/r/O10c3u/1

huangapple
  • Posted on 2020-10-11 20:12:18