How to remove punctuation within a string?
Question
I am cleaning text in a pandas DataFrame.
This is one tokenized row from my description column before punctuation is removed:
['dedicated', 'to', 'support', 'the', 'fast-paced', 'technology',
'lifestyle', 'needs', 'of', 'today', '’', 's', 'modern', 'society',
'.', 'gadget', 'mix', 'have', 'the', 'benefit', 'of', '“',
'efficient', 'life', 'â€', 'tied', 'to', 'the', 'products', 'and',
'services', 'they', 'provide', '.']
This is what the same row looks like after I apply the code below:
['dedicated', 'to', 'support', 'the', 'fast-paced', 'technology',
'lifestyle', 'needs', 'of', 'today', '’', 's', 'modern', 'society',
'gadget', 'mix', 'have', 'the', 'benefit', 'of', '“', 'efficient',
'life', 'â€', 'tied', 'to', 'the', 'products', 'and', 'services',
'they', 'provide']
This is my code:
# Remove punctuation tokens
import string
punc = string.punctuation
updated_mall['Cleansed_description'] = updated_mall['Cleansed_description'].apply(lambda x: [word for word in x if word not in punc])
updated_mall.head(105)
This code removes standalone punctuation tokens, but it leaves tokens such as "Fast-paced", "..." and "restaurant/catering" untouched.
In addition, after punctuation removal and lowercasing, a word like "Asia's" ends up as the two tokens 'asia' and 's'.
I was told that the code only checks whether an entire token is punctuation, rather than checking each token for punctuation characters inside it.
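A small check illustrates that point. `string.punctuation` is a single string, so `word not in punc` is a substring test: a token is dropped only if it occurs as a contiguous run inside that string. The tokens below are taken from the examples in this question; this is only a sketch of the behaviour, not a fix.

```python
import string

punc = string.punctuation  # one string of ASCII punctuation characters

tokens = ["fast-paced", ".", "...", "restaurant/catering", "’", "s"]

# "." is a substring of punc, so it is dropped.
# "..." is not (punc contains only one "."), and "’" is not ASCII, so both stay.
kept = [word for word in tokens if word not in punc]
print(kept)  # ['fast-paced', '...', 'restaurant/catering', '’', 's']
```

Only the lone "." disappears, which matches the output shown above.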
Answer 1
Score: 1
Try the code below, which uses a regular expression to replace every character that is not a word character or whitespace:
import re
updated_mall['Cleansed_description'] = updated_mall['Cleansed_description'].apply(lambda x: [re.sub(r'[^\w\d\s]', ' ', word.lower()) for word in x])
updated_mall.head(105)
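One thing to watch with the regex above is that it replaces punctuation with a space, so a token such as '.' becomes ' ' instead of disappearing. A possible alternative, sketched here under a few assumptions, is to delete punctuation characters with `str.translate` and then drop tokens that end up empty. The helper name `strip_punct` and the extra curly-quote characters are hypothetical choices based on the sample row, not part of the original code.

```python
import string

# Characters to delete: ASCII punctuation plus a few curly quotes.
# The curly quotes are an assumption based on the sample row; extend as needed.
EXTRA = "’‘“”"
TABLE = str.maketrans("", "", string.punctuation + EXTRA)

def strip_punct(tokens):
    """Lowercase each token, delete punctuation characters, drop empty results."""
    cleaned = (word.lower().translate(TABLE) for word in tokens)
    return [word for word in cleaned if word]

sample = ["Fast-paced", "today", "’", "s", ".", "...", "restaurant/catering"]
print(strip_punct(sample))  # ['fastpaced', 'today', 's', 'restaurantcatering']
```

On the DataFrame this would be applied as `updated_mall['Cleansed_description'].apply(strip_punct)`. Deleting the hyphen joins "fast-paced" into "fastpaced"; if such words should be split instead, replacing punctuation with a space and then calling `.split()` on the result is one option. The stray 's' is not removed by either approach, since it is already a separate token before this step runs.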