Regular expression in Python is not returning the desired result

Question

The code in this question tries to remove one sentence from the input string with a regular expression, but the pattern does not produce the desired output. One way to get the expected outcome is to match lazily and to consume the period and space that close the sentence:

import re
txt = 'It was formerly known as A. Withey & Black Limited. Withey Limited delivers many things. It has a facility in the UK, including many branches.'
# .*? is lazy, so the match ends at the first ". " that is followed by an expected sentence start.
out = re.sub(r'It was formerly known as .*?\. (?=Withey Limited|It )', '', txt)

print(out)

This prints the expected outcome:

Withey Limited delivers many things. It has a facility in the UK, including many branches.

The original pattern misbehaves for two reasons. Its character class is followed by a greedy +, so the match first runs as far as it can (up to the comma after "UK", which is not in the class) and then backtracks only to the last position where the lookahead succeeds, the one just before ". It". In addition, the period and space sit inside the lookahead, so they are never consumed and stay at the front of the result. Stopping at the first period alone is not enough either, because the abbreviation "A." contains one. The adjusted pattern removes everything from "It was formerly known as" up to and including the first period and space that are followed by "Withey Limited" or "It ".
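
To see where the pattern from the question actually stops, the match can be inspected with re.search instead of re.sub. The following is a small diagnostic sketch, not part of the original post; the variable names greedy and m are only illustrative, and the pattern is written as a raw string so that \s is not read as a string escape.

```python
import re

txt = ('It was formerly known as A. Withey & Black Limited. '
       'Withey Limited delivers many things. '
       'It has a facility in the UK, including many branches.')

# The pattern from the question, unchanged apart from the raw-string prefix.
greedy = (r"\s*It was formerly known as [\w\d\s@_!#$%^&*()<>?/\|}{~:\.]+"
          r"(?=(. Withey Limited |. It))")

m = re.search(greedy, txt)

# The greedy + first runs up to the comma after "UK" (a comma is not in the
# character class), then backtracks to the last position where the lookahead
# matches. That position is just before ". It".
removed = m.group(0)
left_over = txt[m.end():]

print(repr(removed))
print(repr(left_over))
```

The left-over text starts with the period because the lookahead only checks for it and does not consume it.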

Suppose I have a string made up of several sentences. I want to remove the part that begins with "It was formerly known as" and runs to the end of that sentence. The removal should stop when it reaches ". Withey Limited"; if that does not occur, it should stop at ". It".

import re
txt = 'It was formerly known as A. Withey & Black Limited. Withey Limited delivers many things. It has a facility in the UK, including many branches.'
out = re.sub("\s*It was formerly known as [\w\d\s@_!#$%^&*()<>?/\|}{~:\.]+" + "(?=(. Withey Limited |. It))","", txt)

This code returns ". It has a facility in the UK, including many branches.", which is not my expected outcome. My expected outcome is as follows:

Withey Limited delivers many things. It has a facility in the UK, including many branches.

How can I adjust my regular expression to reach this outcome? And why is it behaving like this?

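Answer 1

Score: 2

Use +? to make the match non-greedy, and move the period and space out of the lookahead so that they are removed as well.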

out = re.sub(r"\s*It was formerly known as [\w\d\s@_!#$%^&*()<>?/\|}{~:.]+?\. " + "(?=(Withey Limited|It))", "", txt)
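
The effect of the quantifier can be checked by running the answer's pattern twice, once with + and once with +?. This is a minimal sketch that assumes the txt string from the question; the names CLASS, TAIL, greedy and lazy are introduced here for illustration only.

```python
import re

txt = ('It was formerly known as A. Withey & Black Limited. '
       'Withey Limited delivers many things. '
       'It has a facility in the UK, including many branches.')

# Pieces of the answer's pattern, split up so only the quantifier differs.
CLASS = r"[\w\d\s@_!#$%^&*()<>?/\|}{~:.]"
TAIL = r"\. (?=(Withey Limited|It))"

greedy = r"\s*It was formerly known as " + CLASS + "+" + TAIL
lazy = r"\s*It was formerly known as " + CLASS + "+?" + TAIL

# Greedy: backtracks from the far end, so it stops at the LAST ". " that is
# followed by "Withey Limited" or "It".
greedy_out = re.sub(greedy, "", txt)

# Lazy: grows one character at a time, so it stops at the FIRST such ". ".
lazy_out = re.sub(lazy, "", txt)

print(greedy_out)
print(lazy_out)
```

With the greedy quantifier the second sentence is removed as well; with the lazy one only the "formerly known as" sentence is removed.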

huangapple
  • Posted on 2023-03-21 at 02:08:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75793858.html