How does the negative lookbehind work in this regex?


Question

import re

text = """
This is a line.
Short
Long line
<!-- Comment line -->
"""

pattern = r'''(?:^.{1,3}$|^.{4}(?<!<!--))'''

matches = re.findall(pattern, text, flags=re.MULTILINE)

print(matches)

Output with pattern = r'''(?:^.{1,3}$|^.{4}(?<!<!--))''':

['This', 'Shor', 'Long']

Output with pattern = r'''(?:^.{1,3}$|^.{3}(?<!<!--))''':

['Thi', 'Sho', 'Lon', '<!-']

Output with pattern = r'''(?:^.{1,3}$|^.{5}(?<!<!--))''':

['This ', 'Short', 'Long ', '<!-- ']

Using any number other than 4 in .{4}(?<!<!--) causes the <!-- line to be matched and displayed. How?

Answer 1

Score: 1

Here is the regex pattern broken down:

(?:                # match either
    ^.{1,3}$       # ...a line of 1 to 3 characters, any characters (e.g. "aaa")
    |              # ...or
    ^.{4}          # ...4 characters of any kind, from the start of a line
    (?<!           # provided the 4 characters just before this point are not
        <!--       # these ones
    )
)
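As a rough sketch, the breakdown above can be written as a runnable pattern using Python's re.VERBOSE flag, which lets the comments live inside the pattern itself. The use of re.VERBOSE and the layout are choices made here for illustration; they are not part of the original pattern.

```python
import re

text = """
This is a line.
Short
Long line
<!-- Comment line -->
"""

# Same pattern as in the question, spread over several lines.
# re.VERBOSE ignores the whitespace and the "#" comments inside the pattern.
pattern = re.compile(
    r"""
    (?:
        ^.{1,3}$      # a whole line of 1 to 3 characters
        |             # or
        ^.{4}         # the first 4 characters of a line
        (?<!<!--)     # unless the 4 characters just consumed are <!--
    )
    """,
    flags=re.MULTILINE | re.VERBOSE,
)

matches = pattern.findall(text)
print(matches)
```

Running this should print the same list as the first output in the question, with the comment line skipped.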

Now that the basic pattern has been broken down, we can look at the variants:

r'''(?:^.{1,3}$|^.{3}(?<!<!--))'''

With this one, the second alternative doesn't work as intended: it consumes three characters and then checks that the four characters ending at that position are not "<!--". On the comment line those four characters are the preceding newline followed by <!-, so the lookbehind succeeds. That is why <!- is part of the output: Python is looking for <!--, not <!-.

r'''(?:^.{1,3}$|^.{5}(?<!<!--))'''

The same applies to this one, except that it consumes five characters, not three. The lookbehind only inspects the last four of them, which are "!-- " (with a trailing space). Once again the line is matched, because those four characters are not <!--.
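To make the two variants concrete, here is a small sketch that prints the four characters the lookbehind actually compares on the comment line for each count. The helper name lookbehind_window is made up for this illustration, and it assumes the same text as in the question.

```python
import re

text = """
This is a line.
Short
Long line
<!-- Comment line -->
"""

def lookbehind_window(n, width=4):
    """Return the `width` characters ending right after the first n
    characters of the comment line (hypothetical helper for illustration).
    This is the slice that (?<!<!--) compares against "<!--"."""
    line_start = text.index("<!--")
    end = line_start + n
    return text[end - width:end]

results = {}
for n in (3, 4, 5):
    # Build the pattern from the question with .{n} in the second alternative.
    pattern = r"(?:^.{1,3}$|^.{%d}(?<!<!--))" % n
    results[n] = re.findall(pattern, text, flags=re.MULTILINE)
    print(n, repr(lookbehind_window(n)), results[n])
```

Only for n = 4 does the window equal <!--, so only then does the negative lookbehind reject the comment line. For n = 3 the window starts with the newline before the line, and for n = 5 it starts at the second character of the line.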

Hope this helps!

huangapple
  • Posted on June 29, 2023 at 17:04:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76579606.html