Trying to understand the difference in what matches and the resulting output for `findall` vs `finditer`


Question

  1. Using findall:
import re

target_string = "please sir, that's obviously a clip-on."

result = re.findall(r"[a-z]+('[a-z])?[a-z]*", target_string)

print(result)

# result: ['', '', "'s", '', '', '', '']
  2. Using finditer:
import re

target_string = "please sir, that's obviously a clip-on."

result = re.finditer(r"[a-z]+('[a-z])?[a-z]*", target_string)
matched = []

for match_obj in result:
    matched.append(match_obj.group())

print(matched)

# result: ['please', 'sir', "that's", 'obviously', 'a', 'clip', 'on']

How do these two methods match the pattern, and why is there a difference in the resulting output? Please explain.

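Both methods scan the string with the same regular expression and find the same seven matches; what differs is what each one returns:

  • findall returns a list of strings built from the matches. If the pattern has no capturing groups, each element is the full match; if it has exactly one capturing group, each element is the text that group captured; with several groups, each element is a tuple of the captured texts. The parentheses in [a-z]+('[a-z])?[a-z]* form one capturing group, so findall returns what ('[a-z]) captured in each match: "'s" for that's, and '' for every other word, where the optional group did not take part in the match.

  • finditer returns an iterator of match objects, one per match. The loop calls match_obj.group(), which with no argument is the same as group(0), the entire match, no matter how many capturing groups the pattern has. That is why the list contains the complete words.

So findall reports the contents of the capturing groups whenever the pattern has any, while finditer hands you each match object and lets you pick the full match or any individual group.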
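To check that both functions find the same matches and differ only in what they report, here is a minimal sketch using the question's pattern and string (the variable names are illustrative). It collects group(0) and group(1) from each finditer match and compares the group(1) list with what findall returns. Note that match.group(1) gives None when the optional group did not participate, so the sketch maps None to '' to mirror findall.

```python
import re

# Pattern and input taken from the question.
pattern = r"[a-z]+('[a-z])?[a-z]*"
target_string = "please sir, that's obviously a clip-on."

# findall: the pattern has one capturing group, so each element is the text
# captured by group 1 ('' when the optional group did not take part).
from_findall = re.findall(pattern, target_string)

# finditer: one match object per match; we choose what to extract.
matches = list(re.finditer(pattern, target_string))
full_matches = [m.group(0) for m in matches]     # the entire match
group_one = [m.group(1) or "" for m in matches]  # group 1, None mapped to ''

print(from_findall)
print(full_matches)
print(from_findall == group_one)  # True
```

Both lists are derived from the same seven match objects; findall simply returns the group-1 view of them.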


I tried reading the docs but am still confused about how findall and finditer work.

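Answer 1

Score: 0

In the findall case, the output will be the capturing group ('[a-z]): for each match you get only the text that group captured.
If you want the full match, transform your group into a non-capturing one (?:'[a-z]):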

import re

target_string = "please sir, that's obviously a clip-on."
result = re.findall(r"[a-z]+(?:'[a-z])?[a-z]*", target_string)
print(result)

Output:

['please', 'sir', "that's", 'obviously', 'a', 'clip', 'on']
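A quick way to predict which shape findall will return is the compiled pattern's groups attribute, which counts capturing groups. The sketch below is a minimal check on the question's string (the names capturing, non_capturing, words and full are illustrative): it shows that the original pattern has one capturing group and the non-capturing version has none, and that the latter's findall result equals the full matches that finditer produces for the original pattern.

```python
import re

target_string = "please sir, that's obviously a clip-on."

capturing = re.compile(r"[a-z]+('[a-z])?[a-z]*")
non_capturing = re.compile(r"[a-z]+(?:'[a-z])?[a-z]*")

# Pattern.groups counts capturing groups; findall's return shape depends on it.
print(capturing.groups)      # 1
print(non_capturing.groups)  # 0

# With zero groups, findall returns full matches, the same strings that
# finditer's group() yields for the original (capturing) pattern.
words = non_capturing.findall(target_string)
full = [m.group() for m in capturing.finditer(target_string)]
print(words == full)  # True
```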

Note that if you have multiple capturing groups, findall will return a tuple of them for each match:

re.findall(r"([a-z]+('[a-z])?[a-z]*)", target_string)

[('please', ''), ('sir', ''), ("that's", "'s"), ('obviously', ''), ('a', ''), ('clip', ''), ('on', '')]
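The same groups are also available through finditer, with one difference worth knowing: findall fills a group that did not participate in the match with '', whereas Match.groups() reports None unless you pass a default. A minimal sketch on the question's string (variable names are illustrative):

```python
import re

target_string = "please sir, that's obviously a clip-on."
pattern = r"([a-z]+('[a-z])?[a-z]*)"

# Two capturing groups, so findall returns one tuple per match.
tuples = re.findall(pattern, target_string)

# finditer exposes the same groups, but a non-participating group is None
# unless groups() is given a default value.
raw = [m.groups() for m in re.finditer(pattern, target_string)]
filled = [m.groups(default="") for m in re.finditer(pattern, target_string)]

print(raw[0])            # ('please', None)
print(tuples[0])         # ('please', '')
print(tuples == filled)  # True
```

Passing default="" is what makes the finditer tuples line up exactly with findall's output here.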

huangapple
  • Published on February 10, 2023 at 11:41:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75406711.html