How to remove multiple consecutive sequences of consecutive duplicate characters in Python
Question
I am trying to preprocess some tweets for an ML project and am having trouble with two types of strings, e.g.
str1 = "coooool"
and
str2 = "gooooaaaaaal"
After removing the repeated characters, I would like to end up with the actual word in each case, i.e.
cleaned_str1 = "cool"
and
cleaned_str2 = "goal"
I tried a few approaches that I found, but I couldn't get the right output. Could someone help me with this? Thank you in advance.
Answer 1
Score: 1
Use regular expressions:
import re
re.sub(r"(\w)\1+(\w)\2+", r"\1\2", "goooaaaal")  # -> goal
re.sub(r"(\w)\1+(\w)\2+", r"\1\2", "coooool")  # -> cool
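To make the behavior of this pattern easier to check, here is a minimal sketch that wraps it in a function. The names PAIR_OF_RUNS and collapse_adjacent_runs are made up for illustration and are not part of the answer. The pattern needs two runs back to back, each with at least two characters, so it seems to leave a word like "coool" (only three o's) untouched.

```python
import re

# Two adjacent runs: a character repeated at least twice, immediately
# followed by another character repeated at least twice.
# (Hypothetical name, used only in this sketch.)
PAIR_OF_RUNS = re.compile(r"(\w)\1+(\w)\2+")


def collapse_adjacent_runs(text):
    """Replace each pair of adjacent runs with one character from each run."""
    return PAIR_OF_RUNS.sub(r"\1\2", text)


print(collapse_adjacent_runs("goooaaaal"))     # goal
print(collapse_adjacent_runs("coooool"))       # cool
print(collapse_adjacent_runs("gooooaaaaaal"))  # goal
print(collapse_adjacent_runs("coool"))         # coool (no match, unchanged)
```

"coooool" only comes out as "cool" because the regex engine backtracks and splits the five o's into two runs of the same letter. That makes the result depend on how many times a letter is repeated, which is worth keeping in mind for real tweets.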
Answer 2
Score: 0
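def removeDuplicates(S):
    # Work on a list of characters: Python strings are immutable,
    # so S[j] = S[i] would raise a TypeError on a str.
    S = list(S)
    n = len(S)
    if n < 2:
        return "".join(S)
    j = 0  # index of the last character kept so far
    for i in range(n):
        if S[j] != S[i]:
            j += 1
            S[j] = S[i]
    # Keep only the deduplicated prefix S[0..j].
    return "".join(S[:j + 1])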
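This was taken from Geeks for Geeks. It is meant to collapse every run of consecutive duplicate characters to a single character, so "coooool" would become "col" rather than "cool".
There is no way for a program to know on its own that "cool" needs two "o"s, as in your example.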
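One way around the point above is to give the program a word list to check against. The sketch below is only an illustration under that assumption: candidate_spellings, normalize, and the tiny vocabulary set are hypothetical names and data, not something from the question or from Geeks for Geeks. It shrinks each run of repeated characters to either one or two characters and returns the first candidate found in the vocabulary.

```python
from itertools import groupby, product


def candidate_spellings(word):
    """Yield spellings where each run of repeats is cut to 1 or 2 characters."""
    # groupby splits the word into runs, e.g. "coooool" -> c, ooooo, l.
    runs = [(ch, len(list(group))) for ch, group in groupby(word)]
    # A single character has one option; a longer run may be "x" or "xx".
    options = [[ch] if count == 1 else [ch, ch * 2] for ch, count in runs]
    for combo in product(*options):
        yield "".join(combo)


def normalize(word, vocabulary):
    """Return the first candidate spelling found in vocabulary, else word."""
    for candidate in candidate_spellings(word):
        if candidate in vocabulary:
            return candidate
    return word


# Assumed toy vocabulary; a real project would load a proper word list.
vocabulary = {"cool", "goal"}
print(normalize("coooool", vocabulary))       # cool
print(normalize("gooooaaaaaal", vocabulary))  # goal
```

Candidates are tried shortest run first, so if the vocabulary contained both "col" and "cool", "col" would win. The number of candidates also doubles with every repeated run, which is fine for single words but would need a cap for very long tokens.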