How to remove multiple consecutive sequences of consecutive duplicate characters in python

Question

I am trying to preprocess some tweets for an ML project, and I am having trouble with two kinds of strings, e.g.
str1 = "coooool" and str2 = "gooooaaaaaal".
After removing the repeated characters, I would like str1 to still be the correct word, i.e.
cleaned_str1 = "cool", while cleaned_str2 = "goal".

I tried a few approaches that I found, but I couldn't get the right output. Could someone help me with this? Thank you in advance.

Answer 1

Score: 1

Use regular expressions:

import re

re.sub(r"(\w)\1+(\w)\2+", r"\1\2", "goooaaaal")  # -> goal
re.sub(r"(\w)\1+(\w)\2+", r"\1\2", "coooool")    # -> cool
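
How the pattern works: (\w)\1+ matches a character followed by at least one more copy of it, and (\w)\2+ does the same for the character right after it, so each match is rewritten as a single copy of each. The five o's in "coooool" are split by backtracking into a run of three and a run of two, which is why two o's survive. A minimal sketch wrapping the pattern in a function (squeeze_two_runs is a hypothetical name) also shows its limits: a lone run of exactly three characters, as in the made-up word "yaaay", is left untouched because it cannot be split into two runs of at least two.

```python
import re

# Answer 1's pattern: a character (\1) repeated at least once more, followed
# immediately by a character (\2) repeated at least once more.
TWO_RUNS = re.compile(r"(\w)\1+(\w)\2+")


def squeeze_two_runs(text: str) -> str:
    # Replace each match with just the two captured characters.
    return TWO_RUNS.sub(r"\1\2", text)


for word in ["coooool", "gooooaaaaaal", "hello", "yaaay"]:
    print(word, "->", squeeze_two_runs(word))
# coooool -> cool
# gooooaaaaaal -> goal
# hello -> hello
# yaaay -> yaaay
```

So the regex fits the two example strings, but it is not a general rule for every elongated word.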

Answer 2

Score: 0

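def removeDuplicates(S):
    # Python strings are immutable, so copy the characters into a list first
    chars = list(S)
    n = len(chars)
    j = 0  # index of the last character kept

    if n < 2:
        return S

    for i in range(n):
        # Keep chars[i] only if it differs from the last kept character
        if chars[j] != chars[i]:
            j += 1
            chars[j] = chars[i]
    j += 1
    return "".join(chars[:j])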

This code, taken directly from Geeks for Geeks, removes consecutive duplicate characters from a string by collapsing every run to a single character, so "coooool" becomes "col". There is no way for a program to intuitively know that "cool" needs two "o"s, as in your example.
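
To make that point concrete, here is a minimal sketch using itertools.groupby instead of the Geeks for Geeks two-pointer loop (a swapped-in technique, not part of the original answer). collapse_runs caps every run at max_run characters and shows that no single cap fits both examples: 1 gives "col", 2 gives "gooaal". dictionary_clean goes one step further under an assumption the question does not state, namely that you have a word list to check candidates against. Both function names and the two-word vocab are hypothetical.

```python
from __future__ import annotations

from itertools import groupby, product


def collapse_runs(s: str, max_run: int = 1) -> str:
    # Shorten every run of identical characters to at most max_run copies.
    return "".join(ch * min(len(list(group)), max_run) for ch, group in groupby(s))


def dictionary_clean(s: str, vocab: set[str]) -> str | None:
    # Each run longer than one character may shrink to one or two copies;
    # try every combination and return the first candidate found in vocab.
    runs = [(ch, len(list(group))) for ch, group in groupby(s)]
    options = [[ch] if n == 1 else [ch, ch * 2] for ch, n in runs]
    for combo in product(*options):
        candidate = "".join(combo)
        if candidate in vocab:
            return candidate
    return None  # no variant is a known word


vocab = {"cool", "goal"}  # hypothetical stand-in for a real word list
print(collapse_runs("coooool"))                  # col
print(collapse_runs("coooool", max_run=2))       # cool
print(collapse_runs("gooooaaaaaal", max_run=2))  # gooaal
print(dictionary_clean("coooool", vocab))        # cool
print(dictionary_clean("gooooaaaaaal", vocab))   # goal
```

The number of candidates doubles with every repeated run, which is fine for tweet-length words, and the quality of the result depends entirely on the word list you supply.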

huangapple
  • Posted on January 7, 2020 at 00:43:44
  • If you repost this article, please keep the link: https://go.coder-hub.com/59615878.html