Replace multiple substrings with blanks in Pandas
Question
I have a situation where I want to replace parts of a string with blanks. For example, my column looks something like this:
user_comment
it was a good day but nothing in particular happened
nothing specific happening today
no comments. all ok
not much happening really, it will be fine
and the desired outcome is:
user_comment_clean
it was a good day but happened
happening today
all ok
it will be fine
Essentially, I would like to remove parts of the strings as shown above, such as "nothing in particular", "nothing specific", "no comment" and "not much happening really".
I am using the following code to achieve this:
import re

def remove_no_comments(text):
    text = re.sub(r"^nothing in particular", ' ', text)
    text = re.sub(r"^nothing specific", ' ', text)
    text = re.sub(r"^no comment", ' ', text)
    text = re.sub(r"^not much happening really", ' ', text)
    text = text.lower()
    return text

df['user_comments_clean'] = df['user_comments_clean'].astype(str).apply(remove_no_comments)
But when I run this, my other user inputs turn into nan, and I am not sure what I am doing wrong. Are there any possible solutions?
Answer 1
Score: 3
You could use str.replace() along with a regex alternation:
terms = ["nothing in particular", "nothing specific", "no comment", "not much happening really"]
regex = r'^(?:' + r'|'.join(terms) + r')\b\s*'
df["user_comment_clean"] = df["user_comment"].str.replace(regex, '', regex=True)
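Note that the pattern above is anchored with ^, so it only strips a phrase at the very start of a comment, and the \b after "no comment" will not match "no comments". If the phrases can also appear mid-sentence (as in the first sample row), a variation along these lines might be closer to the desired output. This is only a sketch: the sample DataFrame is rebuilt from the question, and the optional plural "s" and the set of trailing punctuation characters are my assumptions, not something the question specifies.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "user_comment": [
        "it was a good day but nothing in particular happened",
        "nothing specific happening today",
        "no comments. all ok",
        "not much happening really, it will be fine",
    ]
})

terms = [
    "nothing in particular",
    "nothing specific",
    "no comment",
    "not much happening really",
]

# Escape each term in case it ever contains regex metacharacters, and try
# longer phrases first so a short term cannot shadow a longer one.
alternation = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))

# No ^ anchor, so the phrase may sit anywhere in the string.
# Assumption: also swallow an optional plural "s", any punctuation that
# directly follows the phrase, and the whitespace after it.
pattern = r"\b(?:" + alternation + r")s?\b[.,;:!?]*\s*"

df["user_comment_clean"] = (
    df["user_comment"]
    .str.replace(pattern, "", regex=True)
    .str.strip()
)

print(df["user_comment_clean"].tolist())
```

As for the nan values: the posted snippet reads from user_comments_clean rather than user_comment, and astype(str) turns missing values into the string "nan", so that may be where they come from. That is a guess based on the snippet alone; reading from the original column, as above, avoids it.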