Regular expression in python is not returning the desired result
Question
The code you provided tries to remove one sentence from the input string with a regular expression, but the pattern does not produce the desired output. A pattern that gives your expected outcome is:
import re
txt = 'It was formerly known as A. Withey & Black Limited. Withey Limited delivers many things. It has a facility in the UK, including many branches.'
# Lazy .+? stops at the first ". " that is followed by one of the expected sentence starts.
out = re.sub(r'It was formerly known as .+?\. (?=Withey Limited|It\b)', '', txt)
print(out)
With this pattern the output is:
Withey Limited delivers many things. It has a facility in the UK, including many branches.
The original regular expression fails for two reasons. First, the character class is followed by a greedy +, so it consumes as much text as it can (up to the comma, which is not in the class) and then backtracks only as far as the last position where the lookahead succeeds, which is just before ". It". Second, the period and space sit inside the lookahead, so they are checked but not consumed and remain in the output. The adjusted pattern matches lazily with .+?, consumes the closing period and space, and uses the lookahead only to confirm that the next sentence starts with "Withey Limited" or "It".
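To see where the pattern from the question stops, it can help to inspect the match itself rather than the result of re.sub. The following is a minimal sketch; the names greedy, greedy_match and greedy_rest are illustrative and do not come from the original post.

```python
import re

txt = ('It was formerly known as A. Withey & Black Limited. '
       'Withey Limited delivers many things. '
       'It has a facility in the UK, including many branches.')

# Pattern from the question, written as one raw string:
# a greedy "+" on the character class, with the period inside the lookahead.
greedy = (r"\s*It was formerly known as [\w\d\s@_!#$%^&*()<>?/\|}{~:\.]+"
          r"(?=(. Withey Limited |. It))")

m = re.search(greedy, txt)
greedy_match = m.group(0)    # the text that re.sub would delete
greedy_rest = txt[m.end():]  # the text that re.sub would keep

print(repr(greedy_match))
print(repr(greedy_rest))
```

The match runs through "many things" and stops just before ". It", which is the last place the lookahead can succeed, so the leading period and space stay in the output.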
Suppose that I have a string consisting of several sentences. I want to remove the part that begins with "It was formerly known as" and runs to the end of that sentence. The removal should stop when it reaches ". Withey Limited"; if that does not occur, it should stop at ". It".
import re
txt = 'It was formerly known as A. Withey & Black Limited. Withey Limited delivers many things. It has a facility in the UK, including many branches.'
out = re.sub("\s*It was formerly known as [\w\d\s@_!#$%^&*()<>?/\|}{~:\.]+" + "(?=(. Withey Limited |. It))","", txt)
This code returns '. It has a facility in the UK, including many branches.', which is not my expected outcome. My expected outcome is as follows:
Withey Limited delivers many things. It has a facility in the UK, including many branches.
How can I adjust my regular expression to reach this outcome? And why is it behaving like this?
Answer 1
Score: 2
Use +? to make the matching non-greedy, and match the period and space before the lookahead so that they are removed as well:
out = re.sub(r"\s*It was formerly known as [\w\d\s@_!#$%^&*()<>?/\|}{~:\.]+?\. " + "(?=(Withey Limited|It))","", txt)
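If the set of possible next-sentence openings varies, the same idea can be wrapped in a small helper. This is only a sketch under assumptions: the function name strip_former_name and its next_starts parameter are hypothetical, and it swaps the explicit character class for a lazy .+? (which does not cross newlines) and drops the leading \s* so that the space before the removed sentence is kept.

```python
import re

def strip_former_name(text, next_starts=("Withey Limited", "It")):
    """Remove 'It was formerly known as ...' up to the next expected sentence start."""
    # Build the lookahead alternatives; re.escape guards against regex metacharacters,
    # and \b keeps "It" from matching the start of a longer word.
    alternatives = "|".join(re.escape(s) + r"\b" for s in next_starts)
    # Lazy .+? stops at the first period + whitespace that is followed by an alternative.
    pattern = r"It was formerly known as .+?\.\s+(?=" + alternatives + ")"
    return re.sub(pattern, "", text)

txt = ('It was formerly known as A. Withey & Black Limited. '
       'Withey Limited delivers many things. '
       'It has a facility in the UK, including many branches.')

cleaned = strip_former_name(txt)
print(cleaned)

# A second, made-up input where only the ". It" fallback applies.
other = strip_former_name(
    'Acme Corp makes tools. It was formerly known as B. Smith Ltd. It has offices.'
)
print(other)
```

If none of the expected openings follows the sentence, the pattern does not match and the text is returned unchanged.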