Replace multiple substrings with blanks in Pandas

Question

I want to replace parts of a string with blanks. For example, my column looks something like this:

user_comment
it was a good day but nothing in particular happened
nothing specific happening today
no comments. all ok
not much happening really, it will be fine

The desired outcome is:

user_comment_clean
it was a good day but happened
happening today
all ok
it will be fine

Essentially, I would like to remove substrings such as "nothing in particular", "nothing specific", "no comment" and "not much happening really".

I am using the following code to achieve this:

import re

def remove_no_comments(text):
    text = re.sub(r"^nothing in particular", ' ', text)
    text = re.sub(r"^nothing specific", ' ', text)
    text = re.sub(r"^no comment", ' ', text)
    text = re.sub(r"^not much happening really", ' ', text)
    text = text.lower()
    return text

df['user_comment_clean'] = df['user_comment_clean'].astype(str).apply(remove_no_comments)

But when I use this, it turns my other user inputs into nan, and I am not sure what I am doing wrong. Is there a way to resolve this?

Answer 1

Score: 3

You could use str.replace() along with a regex alternation:

terms = ["nothing in particular", "nothing specific", "no comment", "not much happening really"]
regex = r'^(?:' + r'|'.join(terms) + r')\b\s*'
df["user_comment_clean"] = df["user_comment"].str.replace(regex, '', regex=True)
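
Note that the pattern above is anchored with `^`, so it only strips a phrase at the very start of a comment, and `\b` does not match between "comment" and the "s" in "comments". Below is a minimal sketch of an unanchored variant, assuming the phrases should be removed anywhere in the string together with an optional plural "s" and any trailing punctuation or whitespace. The sample DataFrame is rebuilt from the question; `re.escape`, the `s?` suffix, the punctuation class, and the final `.str.strip()` are additions for illustration and are not part of the original answer.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "user_comment": [
        "it was a good day but nothing in particular happened",
        "nothing specific happening today",
        "no comments. all ok",
        "not much happening really, it will be fine",
    ]
})

terms = [
    "nothing in particular",
    "nothing specific",
    "no comment",
    "not much happening really",
]

# Unanchored alternation: match a phrase anywhere in the string.
# re.escape guards against regex metacharacters inside the terms.
# Assumption: an optional plural "s" and any trailing punctuation or
# whitespace should be removed along with the phrase.
pattern = r"(?:" + "|".join(re.escape(t) for t in terms) + r")s?\b[\s.,;:!?]*"

df["user_comment_clean"] = (
    df["user_comment"]
    .str.replace(pattern, "", regex=True)  # drop every matching phrase
    .str.strip()                           # trim leftover edge whitespace
)

print(df["user_comment_clean"].tolist())
```

On the four sample rows this produces the desired column from the question. As for the nan values, one possible cause (an assumption, since the full data is not shown) is that the question's code reads from the cleaned column rather than from `user_comment`; `astype(str)` turns any missing value into the literal string "nan".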

huangapple
  • This article was published on 2023-02-16 07:21:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75466340.html