How does the negative lookbehind work in this regex?
Question
import re
text = """
This is a line.
Short
Long line
<!-- Comment line -->
"""
pattern = r'''(?:^.{1,3}$|^.{4}(?<!<!--))'''
matches = re.findall(pattern, text, flags=re.MULTILINE)
print(matches)
Output with pattern = r'''(?:^.{1,3}$|^.{4}(?<!<!--))''':
['This', 'Shor', 'Long']
Output with pattern = r'''(?:^.{1,3}$|^.{3}(?<!<!--))''':
['Thi', 'Sho', 'Lon', '<!-']
Output with pattern = r'''(?:^.{1,3}$|^.{5}(?<!<!--))''':
['This ', 'Short', 'Long ', '<!-- ']
Any count other than 4 in .{4}(?<!<!--) causes the <!-- line to be matched and displayed. Why?
Answer 1
Score: 1
Here is the regex pattern broken down:
(?:                # match either
    ^.{1,3}$       # ...a whole line of 1 to 3 characters of any kind (e.g. "aaa")
  |                # ...or
    ^.{4}          # ...4 characters of any kind, from the start of a line
    (?<!           # provided the 4 characters just before this position are not
        <!--       # these ones
    )
)
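The breakdown above can also be run as written. The following is a minimal sketch, assuming the same text as in the question; it uses re.VERBOSE so that the comments live inside the pattern itself. The comment wording is mine, not part of the original pattern.

```python
import re

text = """
This is a line.
Short
Long line
<!-- Comment line -->
"""

# Same pattern as in the question, spread over several lines.
# re.VERBOSE ignores the whitespace and treats "#" as a comment marker.
pattern = re.compile(
    r"""
    (?:
        ^.{1,3}$     # a whole line of 1 to 3 characters
      |              # ...or
        ^.{4}        # the first 4 characters of a line
        (?<!<!--)    # unless the 4 characters just before this point are <!--
    )
    """,
    re.MULTILINE | re.VERBOSE,
)

matches = pattern.findall(text)
print(matches)  # ['This', 'Shor', 'Long']
```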
Now that the basic pattern has been broken down, we can look at the variants:
r'''(?:^.{1,3}$|^.{3}(?<!<!--))'''
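With this one, the second alternative doesn't work as intended: it consumes only three characters, but the lookbehind still inspects the four characters that end at the current position. For the comment line, those four characters are the preceding newline followed by <!-, which is not <!--, so the lookbehind passes. That is why <!- is part of the output: the lookbehind is looking for <!--, not <!-.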
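To see what the lookbehind actually compares, here is a small sketch, assuming the text from the question. The helper name inspect is hypothetical, and the first alternative (^.{1,3}$) is left out because no line in this text is 1 to 3 characters long. For each count it slices out the four characters that end where .{count} stops on the comment line, then checks whether the pattern matches there.

```python
import re

text = """
This is a line.
Short
Long line
<!-- Comment line -->
"""

MARKER = "<!--"
line_start = text.index(MARKER)  # offset where the comment line begins

def inspect(count):
    """Hypothetical helper: show the lookbehind window on the comment line."""
    end = line_start + count      # position after consuming `count` characters
    window = text[end - 4:end]    # the 4 characters that (?<!<!--) compares
    pattern = re.compile(r"^.{%d}(?<!<!--)" % count, re.MULTILINE)
    # match() with a start position still lets the lookbehind see earlier text
    matched = pattern.match(text, line_start) is not None
    return window, matched

for count in (3, 4, 5):
    window, matched = inspect(count)
    print(count, repr(window), matched)
# 3 '\n<!-' True
# 4 '<!--' False
# 5 '!-- ' True
```

Only with a count of 4 does the window line up exactly with <!--, which is the one case where the lookbehind rejects the comment line.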
r'''(?:^.{1,3}$|^.{5}(?<!<!--))'''
The same applies here, except that the pattern now consumes five characters instead of three. The four characters that end at the current position are !-- followed by a space, which again is not <!--, so the lookbehind passes and <!-- plus the trailing space is matched.
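If the goal is to skip lines that start with <!-- whatever the count is (an assumption about the intent, since the question does not say so), one possible approach is a negative lookahead placed right after ^. The sketch below uses a hypothetical helper, first_chars, and again leaves out the ^.{1,3}$ alternative for brevity.

```python
import re

text = """
This is a line.
Short
Long line
<!-- Comment line -->
"""

def first_chars(count):
    """Hypothetical helper: first `count` characters of each non-comment line."""
    # (?!<!--) is checked at the start of the line, before anything is
    # consumed, so the result no longer depends on `count` lining up with
    # the length of "<!--".
    pattern = r"^(?!<!--).{%d}" % count
    return re.findall(pattern, text, flags=re.MULTILINE)

for count in (3, 4, 5):
    print(count, first_chars(count))
# 3 ['Thi', 'Sho', 'Lon']
# 4 ['This', 'Shor', 'Long']
# 5 ['This ', 'Short', 'Long ']
```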
Hope this helps!