2023-02-16 07:21:49

Replace multiple substrings with blanks in Pandas

Question

I want to replace parts of a string with blanks. For example, my column looks something like this:

user_comment
it was a good day but nothing in particular happened
nothing specific happening today
no comments. all ok
not much happening really, it will be fine

The desired outcome is:

user_comment_clean
it was a good day but happened
happening today
all ok
it will be fine

Essentially, I would like to remove substrings such as "nothing in particular", "nothing specific", "no comment" and "not much happening really", as shown above.

I am using the following code to achieve this:

import re

def remove_no_comments(text):
    text = re.sub(r"^nothing in particular", ' ', text)
    text = re.sub(r"^nothing specific", ' ', text)
    text = re.sub(r"^no comment", ' ', text)
    text = re.sub(r"^not much happening really", ' ', text)
    text = text.lower()
    return text

df['user_comments_clean'] = df['user_comments_clean'].astype(str).apply(remove_no_comments)

But when I use this, my other user inputs turn into nan, and I am not sure what I am doing wrong. Is there a way to resolve this?

Answer 1

Score: 3

You could use str.replace() along with a regex alternation:

# Phrases to strip from the start of each comment
terms = ["nothing in particular", "nothing specific", "no comment", "not much happening really"]
# Non-capturing alternation anchored at the start, followed by a word boundary and any whitespace
regex = r'^(?:' + r'|'.join(terms) + r')\b\s*'
df["user_comment_clean"] = df["user_comment"].str.replace(regex, '', regex=True)
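
Note that a pattern anchored with `^` and closed with `\b` only strips a phrase at the very start of a comment, so it would leave "nothing in particular" in the middle of the first row untouched and would not match "no comments" (the trailing "s" means there is no word boundary after "comment"). The following is a minimal sketch of one possible variant that reproduces the desired output from the question. It assumes the sample data shown there, and it assumes that trailing word characters and punctuation right after a phrase should be removed too; the `pattern` name and the punctuation set are illustrative choices, not part of the original answer.

```python
import re

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "user_comment": [
        "it was a good day but nothing in particular happened",
        "nothing specific happening today",
        "no comments. all ok",
        "not much happening really, it will be fine",
    ]
})

terms = ["nothing in particular", "nothing specific", "no comment", "not much happening really"]

# re.escape guards against regex metacharacters in the terms.
# \b(?:...) matches any phrase at a word boundary anywhere in the string,
# \w* absorbs a trailing suffix such as the "s" in "comments" (assumption),
# [.,;:!?]* absorbs punctuation directly after the phrase (assumption),
# \s* absorbs the whitespace that follows.
pattern = r"\b(?:" + "|".join(map(re.escape, terms)) + r")\w*[.,;:!?]*\s*"

df["user_comment_clean"] = (
    df["user_comment"]
    .str.replace(pattern, "", regex=True)
    .str.strip()  # drop any whitespace left at either end
)

print(df["user_comment_clean"].tolist())
```

Because `\w*` absorbs any suffix, this sketch would also strip a word like "no commentary"; if that matters, replacing `\w*` with `s?` is a narrower alternative. Reading from `user_comment` rather than from a not-yet-populated `user_comment_clean` column also avoids `astype(str)` turning missing values into the literal string "nan".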
