Understanding the difference in what matches and the resulting output for `findall` vs `finditer`


Question

  1. Using findall:

    import re

    target_string = "please sir, that's obviously a clip-on."
    result = re.findall(r"[a-z]+('[a-z])?[a-z]*", target_string)
    print(result)
    # result: ['', '', "'s", '', '', '', '']

  2. Using finditer:

    import re

    target_string = "please sir, that's obviously a clip-on."
    result = re.finditer(r"[a-z]+('[a-z])?[a-z]*", target_string)
    matched = []
    for match_obj in result:
        matched.append(match_obj.group())
    print(matched)
    # result: ['please', 'sir', "that's", 'obviously', 'a', 'clip', 'on']

How do these two methods match the pattern, and why do they produce different output? Please explain. I tried to read the docs but am still confused about how findall and finditer work.

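Both methods scan the string with the same regular expression and find the same seven matches. They differ only in what they return for each match:

  • findall returns a list. When the pattern contains exactly one capturing group, as [a-z]+('[a-z])?[a-z]* does, each list entry is the text captured by that group, not the full match. The group ('[a-z]) is optional, so it captures "'s" only in "that's"; for every other word it does not participate in the match and findall reports an empty string ''. That is why the result is six empty strings and one "'s".

  • finditer returns an iterator of match objects. The loop calls match_obj.group(), which is the same as match_obj.group(0) and returns the whole matched text regardless of any groups. The result therefore contains the full words and no empty strings.

So the matching itself is identical. findall reports the capturing group when the pattern has one, while finditer hands back each match object and lets you choose between the full match and any group.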
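As a rough illustration of the explanation above, the sketch below reads both results off the same match objects. The names `pattern`, `full_matches`, and `group_one` are illustrative and do not appear in the original post.

```python
import re

target_string = "please sir, that's obviously a clip-on."
pattern = re.compile(r"[a-z]+('[a-z])?[a-z]*")

# group(0) is the whole match, which is what the finditer loop collects.
full_matches = [m.group(0) for m in pattern.finditer(target_string)]

# group(1) is the optional capturing group. It is None when the group did not
# participate in the match; findall reports that case as an empty string.
group_one = [m.group(1) or "" for m in pattern.finditer(target_string)]

findall_result = pattern.findall(target_string)

print(full_matches)
print(group_one)
print(group_one == findall_result)
```

The last line compares the list built from `group(1)` with what `findall` returns, which shows that both functions see the same matches and only report them differently.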


Answer 1

Score: 0

In the findall case, the output is the content of the capturing group ('[a-z]), not the full match.
If you want the full match, turn the group into a non-capturing one, (?:'[a-z]):

    import re

    target_string = "please sir, that's obviously a clip-on."
    result = re.findall(r"[a-z]+(?:'[a-z])?[a-z]*", target_string)
    print(result)

Output:

    ['please', 'sir', "that's", 'obviously', 'a', 'clip', 'on']

Note that if the pattern has multiple capturing groups, findall returns a list of tuples, one tuple per match:

    print(re.findall(r"([a-z]+('[a-z])?[a-z]*)", target_string))
    # result: [('please', ''), ('sir', ''), ("that's", "'s"), ('obviously', ''), ('a', ''), ('clip', ''), ('on', '')]
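If the capturing groups need to stay in the pattern, one possible approach (a sketch, not part of the original answer) is to keep only the first element of each tuple, since group 1 wraps the whole pattern here. The names `pairs` and `full_from_tuples` are hypothetical.

```python
import re

target_string = "please sir, that's obviously a clip-on."

# Two capturing groups: group 1 wraps the whole word,
# group 2 is the optional apostrophe part such as "'s".
pairs = re.findall(r"([a-z]+('[a-z])?[a-z]*)", target_string)

# Keep only group 1 from each tuple to recover the full matches.
full_from_tuples = [whole for whole, _apostrophe in pairs]
print(full_from_tuples)
```

This gives the same list as the non-capturing version, at the cost of building tuples that are then partly discarded.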

huangapple
  • Posted on February 10, 2023, 11:41:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75406711.html