February 10, 2023, 11:41:24

Trying to understand the difference in what matches and the resulting output for findall vs finditer

Question

Using findall:

import re
target_string = "please sir, that's obviously a clip-on."
result = re.findall(r"[a-z]+('[a-z])?[a-z]*", target_string)
print(result)
# result: ['', '', "'s", '', '', '', '']

Using finditer:

import re
target_string = "please sir, that's obviously a clip-on."
result = re.finditer(r"[a-z]+('[a-z])?[a-z]*", target_string)
matched = []
    
for match_obj in result:
    matched.append(match_obj.group())
print(matched)
    
# result: ['please', 'sir', "that's", 'obviously', 'a', 'clip', 'on']

How do these two methods match patterns, and why do they produce different output? Please explain.

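Both methods use the same regular expression and find the same matches in the text. They differ only in what they return:

findall returns a list. When the pattern has no capturing group, each item is the full matched text. When the pattern has exactly one capturing group, as here with ('[a-z]), each item is the text captured by that group instead of the full match. The group is optional, so for words without an apostrophe it does not participate and findall reports an empty string; only "that's" yields "'s".
finditer returns an iterator of match objects, which you loop over. Calling match_obj.group() with no argument returns the full match (group 0) regardless of any capturing groups, which is why that result contains whole words and no empty strings.

So findall returns group contents whenever the pattern contains capturing groups, while finditer yields match objects from which you can retrieve the full match or any individual group.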
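
The explanation above can be checked with a small sketch that runs the question's pattern through both functions. The variable names (group_only, full_matches, group_texts) are illustrative and not from the original post. Note that a match object reports a non-participating group as None, whereas findall reports it as an empty string.

```python
import re

target_string = "please sir, that's obviously a clip-on."
pattern = re.compile(r"[a-z]+('[a-z])?[a-z]*")

# findall: with exactly one capturing group, each item is that group's text.
group_only = pattern.findall(target_string)

# finditer: each match object exposes both the full match and the group.
full_matches = []
group_texts = []
for match_obj in pattern.finditer(target_string):
    full_matches.append(match_obj.group(0))  # the whole match
    group_texts.append(match_obj.group(1))   # None if the group did not participate

print(group_only)    # ['', '', "'s", '', '', '', '']
print(full_matches)  # ['please', 'sir', "that's", 'obviously', 'a', 'clip', 'on']
print(group_texts)   # [None, None, "'s", None, None, None, None]
```

Both functions locate the same seven matches; only the shape of what they hand back differs.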

I tried reading the docs but am still confused about how findall and finditer work.

Answer 1

Score: 0

With findall, when the pattern contains a single capturing group, the result is that group's contents, here ('[a-z]), not the full match.
If you want the full match, turn the group into a non-capturing one, (?:'[a-z]):

import re
target_string = "please sir, that's obviously a clip-on."
result = re.findall(r"[a-z]+(?:'[a-z])?[a-z]*", target_string)
print(result)

Output:

['please', 'sir', "that's", 'obviously', 'a', 'clip', 'on']

Note that if the pattern has multiple capturing groups, findall returns a list of tuples, one tuple per match:

re.findall(r"([a-z]+('[a-z])?[a-z]*)", target_string)
[('please', ''), ('sir', ''), ("that's", "'s"), ('obviously', ''), ('a', ''), ('clip', ''), ('on', '')]
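
As a complement to the tuple behavior above, here is a minimal sketch of how finditer lets you pick exactly the pieces you need from the same two-group pattern, regardless of how many groups it has. The names two_groups, tuples, words, and spans are illustrative and not part of the original answer.

```python
import re

target_string = "please sir, that's obviously a clip-on."
two_groups = r"([a-z]+('[a-z])?[a-z]*)"

# findall with two capturing groups: one tuple per match.
tuples = re.findall(two_groups, target_string)

# finditer with the same pattern: take the full match or its position as needed.
words = [m.group(0) for m in re.finditer(two_groups, target_string)]
spans = [m.span() for m in re.finditer(two_groups, target_string)]

print(tuples[2])   # ("that's", "'s")
print(words)       # ['please', 'sir', "that's", 'obviously', 'a', 'clip', 'on']
print(spans[:3])   # [(0, 6), (7, 10), (12, 18)]
```

If only the matched words are needed, the non-capturing form with findall is shorter; finditer is the better fit when positions or individual groups matter too.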

