Posted January 7, 2020, 00:43:44

How to remove multiple consecutive runs of duplicate characters in Python

Question

I am trying to preprocess some tweets for an ML project, and I am having trouble with two types of strings, e.g.
str1 = "coooool" and str2 = "gooooaaaaaal".
After removing the repeated characters, I would like to keep the actual word in str1, i.e.
cleaned_str1 = "cool", while cleaned_str2 = "goal".

I tried a few approaches that I found, but I couldn't get the right output. Could someone help me with this? Thank you in advance.

Answer 1

Score: 1

Use regular expressions:

import re

re.sub(r"(\w)\1+(\w)\2+", r"\1\2", "goooaaaal")  # -> goal
re.sub(r"(\w)\1+(\w)\2+", r"\1\2", "coooool")  # -> cool
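
The pattern above only fires when two repeated stretches sit next to each other, so it also rewrites ordinary words such as "balloon" (to "balon"). As a rough sketch of a more general alternative, the hypothetical helper below (collapse_runs and its keep parameter are illustrative names, not part of the answer) shortens every run of a repeated word character to a fixed number of copies. It still cannot tell whether one or two copies is the right spelling.

```python
import re

def collapse_runs(text, keep=1):
    """Shorten every run of a repeated word character to `keep` copies.

    Hypothetical helper for illustration; `keep` is an assumed parameter.
    """
    # (\w) captures one character; \1{keep,} then requires at least
    # `keep` further copies, so only runs longer than `keep` are matched.
    pattern = r"(\w)\1{%d,}" % keep
    return re.sub(pattern, lambda m: m.group(1) * keep, text)

print(collapse_runs("gooooaaaaaal"))     # goal
print(collapse_runs("coooool"))          # col
print(collapse_runs("coooool", keep=2))  # cool
```

With keep=2, runs of exactly two characters are left alone, so "balloon" passes through unchanged, but "gooooaaaaaal" comes out as "gooaal". No single value of keep gives the desired result for both example strings.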

Answer 2

Score: 0

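def removeDuplicates(S):
    # Python strings are immutable, so work on a list of characters.
    chars = list(S)
    n = len(chars)

    # Nothing to collapse in an empty or one-character string.
    if n < 2:
        return S

    j = 0  # index of the last character kept so far
    for i in range(n):
        # Keep a character only when it differs from the last kept one.
        if chars[j] != chars[i]:
            j += 1
            chars[j] = chars[i]

    # The first j + 1 entries now hold the collapsed string.
    return "".join(chars[:j + 1])

This is adapted from Geeks for Geeks. It collapses every run of repeated characters to a single character, so "coooool" becomes "col".
There is no way for a program to intuitively know that "cool" needs two o's, as in your example.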
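
One way around that limitation, sketched here under the assumption that a list of valid words is available, is to try both spellings of every run (one copy or two) and keep the first candidate that appears in the word list. The names candidates, clean, and VOCAB below are hypothetical, and the tiny vocabulary is only a stand-in for a real dictionary.

```python
from itertools import groupby, product

def candidates(word):
    """Yield every spelling where each repeated run is cut to 1 or 2 copies."""
    runs = [(ch, len(list(group))) for ch, group in groupby(word)]
    # A single character stays as it is; a longer run may shrink to 1 or 2.
    options = [[ch] if n == 1 else [ch, ch * 2] for ch, n in runs]
    for combo in product(*options):
        yield "".join(combo)

def clean(word, vocabulary):
    """Return the first candidate found in `vocabulary`.

    Falls back to collapsing every run to one character when nothing matches.
    """
    for cand in candidates(word):
        if cand in vocabulary:
            return cand
    return "".join(ch for ch, _ in groupby(word))

# Assumed toy vocabulary; a real project would load a proper word list.
VOCAB = {"cool", "goal"}

print(clean("coooool", VOCAB))       # cool
print(clean("gooooaaaaaal", VOCAB))  # goal
```

The number of candidates doubles with each repeated run, which is fine for tweet-length words but worth keeping in mind for long inputs.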
