What does this combination of positive and negative lookahead do?
Question
Recently I stumbled upon this weird regex, which combines a positive and a negative lookahead, and I cannot wrap my head around what it really does. Keep in mind that this is Java regex syntax.
(?=((?!\bword1\b|\bword2\b).)+?\s*?)
 ^^  ^^
What do those two nested lookaheads do? Can this be simplified?
Answer 1
Score: 0
- `.` matches any single character, unless that character is the "w" that starts "word1" or "word2" standing as a whole word (between non-word characters). That is what the negative assertion enforces. The alternation `\bword1\b|\bword2\b` can be simplified to `\bword[12]\b`.
- `+?` means at least one such `.`, but here it is always exactly one, because the quantifier is non-greedy and is followed by `\s*?`, which always matches. Therefore `+?` can be dropped.
- `\s*?` is meaningless in this assertion: it always matches, the lookahead consumes no input, and nothing follows it.
- The positive lookahead assertion `(?=...)` here means that the position is followed by any character (except the "w" of "word1" or "word2", as described above).
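The points above can be checked with a small experiment. The following is only a sketch: the class name `LookaheadPositions`, the helper `hits`, and the sample strings are made up for illustration. It runs the regex from the question over an input and records every position where the lookahead succeeds, together with the content of capture group 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadPositions {
    // The regex from the question, unchanged.
    static final Pattern ORIGINAL =
            Pattern.compile("(?=((?!\\bword1\\b|\\bword2\\b).)+?\\s*?)");

    // Returns one "index:char" entry per position where the lookahead succeeds.
    // (Hypothetical helper, not part of the original post.)
    static List<String> hits(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = ORIGINAL.matcher(input);
        while (m.find()) {
            // The match itself is empty; group 1 holds the single character that follows.
            out.add(m.start() + ":" + m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Position 2 (the start of the whole word "word1") is skipped.
        System.out.println(hits("x-word1-y")); // [0:x, 1:-, 3:o, 4:r, 5:d, 6:1, 7:-, 8:y]
        // "word10" is not the whole word "word1", so nothing is skipped.
        System.out.println(hits("word10"));    // [0:w, 1:o, 2:r, 3:d, 4:1, 5:0]
    }
}
```

Group 1 never holds more than one character, which is consistent with `+?` stopping after a single iteration.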
Simplifying further would remove the capture group, which might be required by the surrounding context.
So, the simplified regex is `(?=((?!\bword[12]\b).))`. It succeeds before any character of the input (other than a line terminator, which `.` does not match by default), except at the beginning of "word1" or "word2" between non-words. The match will be empty, but the first capture group will contain the following character.
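One way to gain some confidence in the simplification is to run both patterns over the same inputs and compare where they match and what group 1 captures. This is a sketch rather than a proof of equivalence: the class name `SimplifiedLookaheadCheck`, the helpers `trace` and `sameBehavior`, and the sample inputs are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimplifiedLookaheadCheck {
    // The regex from the question.
    static final Pattern ORIGINAL =
            Pattern.compile("(?=((?!\\bword1\\b|\\bword2\\b).)+?\\s*?)");
    // The simplified form proposed in the answer.
    static final Pattern SIMPLIFIED =
            Pattern.compile("(?=((?!\\bword[12]\\b).))");

    // Records "index:group1" for every (empty) match of the pattern in the input.
    static List<String> trace(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.start() + ":" + m.group(1));
        }
        return out;
    }

    // True if both patterns match at the same positions with the same group 1.
    static boolean sameBehavior(String input) {
        return trace(ORIGINAL, input).equals(trace(SIMPLIFIED, input));
    }

    public static void main(String[] args) {
        String[] samples = {"word1", "word2 word3", "sword1", "a\nword2", ""};
        for (String s : samples) {
            System.out.println(sameBehavior(s)); // true for each sample
        }
        System.out.println(trace(SIMPLIFIED, "word2")); // [1:o, 2:r, 3:d, 4:2]
    }
}
```

Agreement on a handful of samples does not prove the two patterns are identical for every input, but it matches the reasoning given above.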