Replace a part of string in Pandas Data Column, replace doesn't work
Question
I have been trying to clean a data column by removing part of the text, but I cannot get it to work.
I tried the .str.replace method on the pandas Series, but it did not change anything:
df['Salary Estimate'].str.replace(' (Glassdoor est.)', '',regex=True)
0 $53K-$91K (Glassdoor est.)
1 $63K-$112K (Glassdoor est.)
2 $80K-$90K (Glassdoor est.)
3 $56K-$97K (Glassdoor est.)
4 $86K-$143K (Glassdoor est.)
...
922 -1
925 -1
928 $59K-$125K (Glassdoor est.)
945 $80K-$142K (Glassdoor est.)
948 $62K-$113K (Glassdoor est.)
Name: Salary Estimate, Length: 600, dtype: object
What I expected was
0 $53K-$91K
1 $63K-$112K
2 $80K-$90K
3 $56K-$97K
4 $86K-$143K
...
922 -1
925 -1
928 $59K-$125K
945 $80K-$142K
948 $62K-$113K
Name: Salary Estimate, Length: 600, dtype: object
Answer 1
Score: 3
If you enable regex, you have to escape regex metacharacters such as (, ) and . so that they are matched literally:
>>> import re
>>> df['Salary Estimate'].str.replace(re.escape(r' (Glassdoor est.)'), '', regex=True)
0 $53K-$91K
1 $63K-$112K
2 $80K-$90K
3 $56K-$97K
4 $86K-$143K
Name: Salary Estimate, dtype: object
# Or without importing the re module
>>> df['Salary Estimate'].str.replace(r' \(Glassdoor est\.\)', '', regex=True)
0 $53K-$91K
1 $63K-$112K
2 $80K-$90K
3 $56K-$97K
4 $86K-$143K
Name: Salary Estimate, dtype: object
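Since the text being removed is a fixed substring, another option is to turn regex matching off. The following is a minimal sketch, assuming a small made-up DataFrame that mirrors the question's column; it also assigns the result back, because str.replace returns a new Series rather than modifying the column in place.

```python
import re

import pandas as pd

# Hypothetical sample data mirroring the question's column
df = pd.DataFrame({"Salary Estimate": ["$53K-$91K (Glassdoor est.)", "-1"]})

suffix = " (Glassdoor est.)"

# regex=False treats the pattern as a plain substring, so no escaping is needed
literal = df["Salary Estimate"].str.replace(suffix, "", regex=False)

# Equivalent regex form: re.escape neutralises "(", ")" and "."
escaped = df["Salary Estimate"].str.replace(re.escape(suffix), "", regex=True)

# str.replace returns a new Series; assign it back to keep the result
df["Salary Estimate"] = literal
print(df["Salary Estimate"].tolist())
```

Both forms should give the same cleaned values; the literal form is simply easier to read when no pattern matching is needed.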
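You can also extract the numbers directly (the dollar signs must be escaped too, because a bare $ is the end-of-string anchor):
>>> df['Salary Estimate'].str.extract(r'\$(?P<min>\d+)K-\$(?P<max>\d+)K')
  min  max
0  53   91
1  63  112
2  80   90
3  56   97
4  86  143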
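The extracted columns hold strings, and rows that do not match the pattern (such as the -1 entries) come back as NaN. A rough sketch of converting them to numbers, assuming a made-up two-row Series and with the dollar signs escaped as \$ since an unescaped $ is an end-of-string anchor:

```python
import pandas as pd

# Hypothetical sample data: one salary range and one missing-value marker
salaries = pd.Series(["$53K-$91K (Glassdoor est.)", "-1"], name="Salary Estimate")

# "\$" matches a literal dollar sign; named groups become the column names
bounds = salaries.str.extract(r"\$(?P<min>\d+)K-\$(?P<max>\d+)K")

# extract yields strings (NaN where nothing matched); convert each column to numbers
bounds = bounds.apply(pd.to_numeric)
print(bounds)
```

The values are in thousands of dollars, so they would need to be multiplied by 1000 if absolute amounts are wanted.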