Replace a part of string in Pandas Data Column, replace doesn't work

Question

I have been trying to clean a data column by removing part of the text, but I cannot get my head around it.

I tried the .str.replace method on the pandas Series, but it did not seem to work:

df['Salary Estimate'].str.replace(' (Glassdoor est.)', '',regex=True)


0       $53K-$91K (Glassdoor est.)
1      $63K-$112K (Glassdoor est.)
2       $80K-$90K (Glassdoor est.)
3       $56K-$97K (Glassdoor est.)
4      $86K-$143K (Glassdoor est.)
                  ...             
922                             -1
925                             -1
928    $59K-$125K (Glassdoor est.)
945    $80K-$142K (Glassdoor est.)
948    $62K-$113K (Glassdoor est.)
Name: Salary Estimate, Length: 600, dtype: object

What I expected was:

0       $53K-$91K
1      $63K-$112K
2       $80K-$90K
3       $56K-$97K
4      $86K-$143K
          ...
922            -1
925            -1
928    $59K-$125K
945    $80K-$142K
948    $62K-$113K
Name: Salary Estimate, Length: 600, dtype: object

Answer 1

Score: 3


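If you enable regex, you have to escape regex metacharacters such as (, ) and . (unescaped, the parentheses form a group instead of matching literal brackets, so the pattern never matches your text):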

import re

>>> df['Salary Estimate'].str.replace(re.escape(r' (Glassdoor est.)'), '', regex=True)
0     $53K-$91K
1    $63K-$112K
2     $80K-$90K
3     $56K-$97K
4    $86K-$143K
Name: Salary Estimate, dtype: object

# Or without importing the re module
>>> df['Salary Estimate'].str.replace(r' \(Glassdoor est\.\)', '', regex=True)
0     $53K-$91K
1    $63K-$112K
2     $80K-$90K
3     $56K-$97K
4    $86K-$143K
Name: Salary Estimate, dtype: object
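
If the text to remove is a fixed string, a simpler option may be to turn regex off so that nothing needs escaping. The sketch below uses a small hypothetical DataFrame (the three sample rows are an assumption standing in for the real data) and also assigns the result back, since str.replace returns a new Series rather than modifying the column in place.

```python
import pandas as pd

# Hypothetical sample data mirroring the column in the question.
df = pd.DataFrame({
    "Salary Estimate": [
        "$53K-$91K (Glassdoor est.)",
        "$63K-$112K (Glassdoor est.)",
        "-1",
    ]
})

suffix = " (Glassdoor est.)"

# With regex=True and no escaping, "(" and ")" delimit a group, so the
# pattern looks for " Glassdoor est" plus any character and finds nothing.
unchanged = df["Salary Estimate"].str.replace(suffix, "", regex=True)

# With regex=False the pattern is treated as a plain substring.
cleaned = df["Salary Estimate"].str.replace(suffix, "", regex=False)

# str.replace returns a new Series; assign it back to keep the result.
df["Salary Estimate"] = cleaned
print(df["Salary Estimate"].tolist())  # ['$53K-$91K', '$63K-$112K', '-1']
```

Rows without the suffix, such as the -1 placeholders, pass through untouched.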

You can also extract numbers:

>>> df['Salary Estimate'].str.extract(r'\$(?P<min>\d+)K-\$(?P<max>\d+)K')
  min  max
0  53   91
1  63  112
2  80   90
3  56   97
4  86  143
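
Note that str.extract returns the captured groups as strings, and rows that do not match (such as the -1 placeholders in the question) come back as missing values. A minimal sketch of converting the result to numbers, assuming a hypothetical three-row Series in place of the real column:

```python
import pandas as pd

# Hypothetical sample data; the "-1" row stands in for the missing estimates.
s = pd.Series(
    ["$53K-$91K (Glassdoor est.)", "$63K-$112K (Glassdoor est.)", "-1"],
    name="Salary Estimate",
)

# \$ matches a literal dollar sign; a bare $ would anchor to end of string.
bounds = s.str.extract(r"\$(?P<min>\d+)K-\$(?P<max>\d+)K")

# The captured groups are strings; convert each column to a numeric dtype.
# Rows where the pattern did not match stay missing.
bounds = bounds.apply(pd.to_numeric)
print(bounds)
```

The numeric columns can then be used directly in arithmetic, for example to compute a midpoint per row.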

huangapple
  • Published on April 20, 2023, 01:17:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76057225.html