Trying to understand the difference in what matches and the resulting output for findall vs finditer
Question
- Using findall:
import re
target_string = "please sir, that's obviously a clip-on."
result = re.findall(r"[a-z]+('[a-z])?[a-z]*", target_string)
print(result)
# result: ['', '', "'s", '', '', '', '']
- Using finditer:
import re
target_string = "please sir, that's obviously a clip-on."
result = re.finditer(r"[a-z]+('[a-z])?[a-z]*", target_string)
matched = []
for match_obj in result:
    matched.append(match_obj.group())
print(matched)
# result: ['please', 'sir', "that's", 'obviously', 'a', 'clip', 'on']
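How do these two methods match the pattern, and why do they produce different output? Please explain. I tried reading the docs but am still confused about how findall and finditer work.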
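Both methods scan the string with the same regular expression and find the same seven matches; they differ only in what they report for each match:
- findall returns a list. When the pattern contains exactly one capturing group, each list item is the text captured by that group, not the whole match. Here the group ('[a-z]) is optional and takes part only in the match for "that's", so the list holds "'s" for that match and an empty string '' for the other six.
- finditer returns an iterator of match objects. Calling match_obj.group() with no arguments returns the entire matched text regardless of any groups, so the list built in the loop contains the full words and no empty strings.
In short, findall reports the capturing group's text when the pattern has one, while finditer hands you each match object and lets you choose whether to read the full match or a group.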
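One way to check that both functions find the same matches is to rebuild the findall result from finditer. This is a minimal sketch using the pattern and string from the question; the variable names (pattern, matches, full_matches, group_texts) are illustrative only.

```python
import re

target_string = "please sir, that's obviously a clip-on."
pattern = r"[a-z]+('[a-z])?[a-z]*"

matches = list(re.finditer(pattern, target_string))

# group(0), which is the same as group(), is the whole match,
# no matter how many capturing groups the pattern has.
full_matches = [m.group(0) for m in matches]

# group(1) is the single capturing group. It is None when the optional
# group did not take part in the match; findall reports that case as ''.
group_texts = [m.group(1) or "" for m in matches]

print(full_matches)
print(group_texts)
print(group_texts == re.findall(pattern, target_string))
```

The last line prints True: the findall list is just the group(1) value of each match, with None replaced by an empty string.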
Answer 1
Score: 0
In the findall case, the output is the content of the capturing group ('[a-z]), not the full match.
If you want the full match, turn the group into a non-capturing one, (?:'[a-z]):
import re
target_string = "please sir, that's obviously a clip-on."
result = re.findall(r"[a-z]+(?:'[a-z])?[a-z]*", target_string)
print(result)
Output:
['please', 'sir', "that's", 'obviously', 'a', 'clip', 'on']
Note that if the pattern has multiple capturing groups, findall returns a list of tuples, one tuple per match:
re.findall(r"([a-z]+('[a-z])?[a-z]*)", target_string)
[('please', ''), ('sir', ''), ("that's", "'s"), ('obviously', ''), ('a', ''), ('clip', ''), ('on', '')]
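The tuple behaviour with several capturing groups lines up with Match.groups() on the objects that finditer yields. The sketch below reuses the two-group pattern from this answer; the names two_groups, tuples_from_findall, tuples_from_finditer and words are made up for illustration.

```python
import re

target_string = "please sir, that's obviously a clip-on."
two_groups = r"([a-z]+('[a-z])?[a-z]*)"

# findall: one tuple per match, one item per capturing group.
tuples_from_findall = re.findall(two_groups, target_string)

# finditer: groups(default="") gives the same tuple, with "" standing in
# for a group that did not take part in the match.
tuples_from_finditer = [
    m.groups(default="") for m in re.finditer(two_groups, target_string)
]

print(tuples_from_findall == tuples_from_finditer)

# The first item of each tuple is the outer group, i.e. the full word.
words = [whole for whole, _suffix in tuples_from_findall]
print(words)
```

If only the full words are needed, the non-capturing version of the pattern is simpler; the tuple form is useful when both the word and the captured suffix are wanted.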