Pandas dataframe how can I remove rows by comparing regex output of column A and B
Question
I have a dataframe with two columns, both of which contain strings.
I want to delete the rows where the named capture group "termduration" extracted from column B (Resources) does not match the same group extracted from column A (ServicePlan).
Data:
                    ServicePlan                         Resources
0  Plan A (CSP COM BAS 1YR ANN)  Resource A (CSP COM BAS 1YR ANN)
1  Plan A (CSP COM BAS 1YR ANN)  Resource B (CSP COM BAS 1YR ANN)
2  Plan A (CSP COM BAS 1YR ANN)  Resource C (CSP COM BAS 6YR ANN)
I tried the following, but I get a TypeError. I am struggling to compare the named capture group results between the two columns.
import pandas as pd
import re
e_name = r'(?P<name>.*)\((?P<product>[A-Z]{3})\s(?P<type>[A-Z]{3})\s?(?P<baseattach>BAS|ADD|ATT|SWS)?\s?(?P<telco_overusage>OVG)?\s?(?P<termduration>[A-Z0-9]{3})?\s?(?P<billing>[A-Z0-9]{3})\)$'
name_re = re.compile(e_name)
data = {'ServicePlan': ["Plan A (CSP COM BAS 1YR ANN)", "Plan A (CSP COM BAS 1YR ANN)", "Plan A (CSP COM BAS 1YR ANN)"],
        'Resources': ["Resource A (CSP COM BAS 1YR ANN)", "Resource B (CSP COM BAS 1YR ANN)", "Resource C (CSP COM BAS 6YR ANN)"]}
df = pd.DataFrame(data)
print(df)
df[~(name_re.findall(df['ServicePlan'].astype(str))[0]['termduration']).ne(name_re.findall(df['Resources'].astype(str))[0]['termduration'])]
print(df)
Answer 1
Score: 1
Use pandas.Series.str.extract:
df = df[df['ServicePlan'].str.extract(name_re, expand=False)['termduration']
        .eq(df['Resources'].str.extract(name_re, expand=False)['termduration'])]
print(df)
                    ServicePlan                         Resources
0  Plan A (CSP COM BAS 1YR ANN)  Resource A (CSP COM BAS 1YR ANN)
1  Plan A (CSP COM BAS 1YR ANN)  Resource B (CSP COM BAS 1YR ANN)
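For context, the TypeError in the question comes from calling name_re.findall on a whole Series: the re module's findall expects a single string, so the matching has to be done per element, which is what str.extract does. The same idea can be sketched step by step as below. The intermediate names plan_term, resource_term, same_term and filtered are only illustrative, and passing the pattern string e_name instead of the compiled name_re is a choice made for this sketch, not something the answer requires.

```python
import pandas as pd

# Pattern copied from the question; "termduration" is one of its named groups.
e_name = r'(?P<name>.*)\((?P<product>[A-Z]{3})\s(?P<type>[A-Z]{3})\s?(?P<baseattach>BAS|ADD|ATT|SWS)?\s?(?P<telco_overusage>OVG)?\s?(?P<termduration>[A-Z0-9]{3})?\s?(?P<billing>[A-Z0-9]{3})\)$'

df = pd.DataFrame({
    'ServicePlan': ["Plan A (CSP COM BAS 1YR ANN)"] * 3,
    'Resources': ["Resource A (CSP COM BAS 1YR ANN)",
                  "Resource B (CSP COM BAS 1YR ANN)",
                  "Resource C (CSP COM BAS 6YR ANN)"],
})

# str.extract applies the regex to each element and returns a DataFrame
# with one column per named group, so the group can be selected by name.
plan_term = df['ServicePlan'].str.extract(e_name)['termduration']
resource_term = df['Resources'].str.extract(e_name)['termduration']

# Element-wise comparison of the two extracted columns.
same_term = plan_term.eq(resource_term)

# Keep only the rows whose term durations agree (illustrative variable name).
filtered = df[same_term]
print(filtered)
```

Keeping the two extracted columns as separate variables makes it easy to inspect them before filtering. A string that the pattern does not match yields a missing value in the extracted column, so it is worth checking how such rows should be treated for the data at hand.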