Assignment expression in a while loop

Question

m = re.match(pattern, text)
while m := p.search(text, m.end()):
    # do with match

Does the m.end() in the loop condition refer to the previous match m, so that each search starts from the end of the previous match? I read PEP 572, which has a section specifically about regex matches, but it does not cover the while-loop case. I understand that m is available in the body of the loop; that's not the question here.

Answer 1

Score: 0

This code assumes the text always starts with a match. Otherwise re.match returns None, and the while clause raises an AttributeError when it calls m.end(). That is because re.match only tries to match at the beginning of the string, unlike re.search, which scans the whole string (unless, of course, you use an anchor).
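As a rough illustration of that failure mode, here is a minimal sketch. The input string is made up for the example and deliberately does not start with a match:

```python
import re

p = re.compile("foo")
text = "bar foo baz foo"  # hypothetical input that does not start with a match

m = p.match(text)  # None, because match() only tries position 0
error = None
try:
    # m.end() is evaluated first, while m is still None, so this raises.
    while m := p.search(text, m.end()):
        pass
except AttributeError as exc:
    error = str(exc)

print(error)
```

The exception comes from calling .end() on None before the assignment expression ever rebinds m.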

So yes, it basically keeps searching from the end of the last matching position, finding each subsequent non-overlapping match. You can see this happen like this:

>>> import re
>>> text = "foo some other stuff foo again foo foofofoo"
>>> p = re.compile("foo")
>>> m = p.match(text)  # assumes it always matches at the beginning
>>> while print(m) or (m := p.search(text, m.end())):
...     pass
...
<re.Match object; span=(0, 3), match='foo'>
<re.Match object; span=(21, 24), match='foo'>
<re.Match object; span=(31, 34), match='foo'>
<re.Match object; span=(35, 38), match='foo'>
<re.Match object; span=(40, 43), match='foo'>

So the actual body of the while-loop will only have access to every match starting with the second match.
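If you do want the while form to visit every match, including the first, one possible sketch is to track the search position in a separate variable instead of seeding m with match(). The names pos and spans and the sample text are made up for the example, and it assumes the pattern can never match an empty string:

```python
import re

p = re.compile("foo")
text = "bar foo baz foofofoo"  # hypothetical input; it does not start with a match

spans = []
pos = 0  # where the next search begins
while m := p.search(text, pos):
    spans.append(m.span())
    # Assumes the pattern cannot match an empty string;
    # otherwise pos would never advance and the loop would not end.
    pos = m.end()

print(spans)
```

This variant also works when the text does not begin with a match, because m is never used before it is assigned.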

Note that in this case, you probably just want to use .finditer and a for-loop:

>>> for m in p.finditer(text):
...     print(m)
...
<re.Match object; span=(0, 3), match='foo'>
<re.Match object; span=(21, 24), match='foo'>
<re.Match object; span=(31, 34), match='foo'>
<re.Match object; span=(35, 38), match='foo'>
<re.Match object; span=(40, 43), match='foo'>

The exception is when the loop body itself updates m, in which case you still need the while form. Note that the for-loop has access to the first match, unlike the body of the while loop. If you want to skip the first one, you can always do:

>>> it = p.finditer(text)
>>> next(it)
<re.Match object; span=(0, 3), match='foo'>
>>> for m in it:
...     print(m)
...
<re.Match object; span=(21, 24), match='foo'>
<re.Match object; span=(31, 34), match='foo'>
<re.Match object; span=(35, 38), match='foo'>
<re.Match object; span=(40, 43), match='foo'>
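
Another way to skip the first match, shown here only as a sketch, is itertools.islice. The variable names and the second input string are made up for the example:

```python
import re
from itertools import islice

p = re.compile("foo")
text = "foo some other stuff foo again foo foofofoo"

# islice(iterator, 1, None) drops the first item and yields the rest.
rest = [m.span() for m in islice(p.finditer(text), 1, None)]
print(rest)

# With no matches at all, islice simply yields nothing,
# whereas a bare next(it) would raise StopIteration.
none_found = list(islice(p.finditer("no match here"), 1, None))
print(none_found)
```

Whether this reads better than calling next(it) first is a matter of taste; the main practical difference is the behavior on an input with no matches.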

Answer 2

Score: 0

OP here. I did a small experiment that proved what I was after.

def do(x):
    print(f'x in do = {x}')
    return (x + 1) if x < 5 else None

x = 0
while (x := do(x)) is not None:
    print(f'x in while = {x}')

Output:

x in do = 0
x in while = 1
x in do = 1
x in while = 2
x in do = 2
x in while = 3
x in do = 3
x in while = 4
x in do = 4
x in while = 5
x in do = 5

Thus, the answer to my question is: yes, the previous value of the variable is still available when the while condition is evaluated again, even after the body has finished. Just like a regular variable, the target of an assignment expression stays in scope for the rest of the function from the point where it is defined.
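
The same experiment can be repeated with the regex from the question. This is only a sketch: search_from is a hypothetical helper added to log the start position of each search, and the sample text is made up:

```python
import re

p = re.compile("foo")
text = "foo bar foo baz foo"

starts = []  # start position of every search, in call order

def search_from(pos):
    # Hypothetical helper: record the start position, then search from it.
    starts.append(pos)
    return p.search(text, pos)

m = p.match(text)
seen = []
while m := search_from(m.end()):
    seen.append(m.span())

print(starts)
print(seen)
print(m)
```

Each recorded start position equals the end of the match found one step earlier, which shows that m.end() is evaluated against the old m before := rebinds it. After the loop, m is still bound, now to None.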

huangapple
  • Posted on June 19, 2023, 13:02:36
  • Please keep this link when reposting: https://go.coder-hub.com/76503726.html