July 17, 2023 23:42:53

Pandas DataFrame: how can I remove rows by comparing the regex output of columns A and B?

Question

I have a DataFrame with two columns; both columns contain strings.

I want to delete rows where the named capture group "termduration" extracted from column B (Resources) does not match the same group extracted from column A (ServicePlan).

Data:

                    ServicePlan                         Resources
0  Plan A (CSP COM BAS 1YR ANN)  Resource A (CSP COM BAS 1YR ANN)
1  Plan A (CSP COM BAS 1YR ANN)  Resource B (CSP COM BAS 1YR ANN)
2  Plan A (CSP COM BAS 1YR ANN)  Resource C (CSP COM BAS 6YR ANN)

I tried the following but I get a type error. I am struggling to compare the regex named capture group result between two strings.

import pandas as pd
import re

e_name = r'(?P<name>.*)\((?P<product>[A-Z]{3})\s(?P<type>[A-Z]{3})\s?(?P<baseattach>BAS|ADD|ATT|SWS)?\s?(?P<telco_overusage>OVG)?\s?(?P<termduration>[A-Z0-9]{3})?\s?(?P<billing>[A-Z0-9]{3})\)$'
name_re = re.compile(e_name)

data = {'ServicePlan': ["Plan A (CSP COM BAS 1YR ANN)","Plan A (CSP COM BAS 1YR ANN)","Plan A (CSP COM BAS 1YR ANN)"],
        'Resources': ["Resource A (CSP COM BAS 1YR ANN)","Resource B (CSP COM BAS 1YR ANN)","Resource C (CSP COM BAS 6YR ANN)"]}

df = pd.DataFrame(data)
print(df)
# This line raises the TypeError: findall() expects a single string, not a pandas Series
df[~(name_re.findall(df['ServicePlan'].astype(str))[0]['termduration']).ne(name_re.findall(df['Resources'].astype(str))[0]['termduration'])]
print(df)

Answer 1

Score: 1

Use pandas.Series.str.extract:

df = df[df['ServicePlan'].str.extract(name_re, expand=False)['termduration']
        .eq(df['Resources'].str.extract(name_re, expand=False)['termduration'])]
print(df)

                    ServicePlan                         Resources
0  Plan A (CSP COM BAS 1YR ANN)  Resource A (CSP COM BAS 1YR ANN)
1  Plan A (CSP COM BAS 1YR ANN)  Resource B (CSP COM BAS 1YR ANN)
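
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea, split into steps. The helper name `term` and the variables `plan_term`, `resource_term`, `mask` and `filtered` are not from the answer; they are only there for illustration. It also passes the pattern as a string (`name_re.pattern`) rather than the compiled object, which is an assumption made to stay within the documented `str.extract` signature.

```python
import re

import pandas as pd

# Same pattern as in the question, split across lines for readability.
name_re = re.compile(
    r'(?P<name>.*)\((?P<product>[A-Z]{3})\s(?P<type>[A-Z]{3})\s?'
    r'(?P<baseattach>BAS|ADD|ATT|SWS)?\s?(?P<telco_overusage>OVG)?\s?'
    r'(?P<termduration>[A-Z0-9]{3})?\s?(?P<billing>[A-Z0-9]{3})\)$'
)

df = pd.DataFrame({
    'ServicePlan': ["Plan A (CSP COM BAS 1YR ANN)"] * 3,
    'Resources': [
        "Resource A (CSP COM BAS 1YR ANN)",
        "Resource B (CSP COM BAS 1YR ANN)",
        "Resource C (CSP COM BAS 6YR ANN)",
    ],
})


def term(column: pd.Series) -> pd.Series:
    """Return the 'termduration' group for each string (hypothetical helper)."""
    # str.extract yields one column per capture group, named after the group;
    # rows where the pattern does not match come back as NaN.
    return column.str.extract(name_re.pattern, expand=True)['termduration']


plan_term = term(df['ServicePlan'])
resource_term = term(df['Resources'])

# Element-wise comparison. NaN never equals NaN, so rows where either
# side fails to match the pattern are dropped as well.
mask = plan_term.eq(resource_term)
filtered = df[mask]
print(filtered)
```

Keeping the mask as a separate Series makes it easy to inspect which rows would be dropped before actually filtering. If rows that fail to match the pattern should be kept instead, the mask would need to be adjusted accordingly.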
