February 10, 2023 12:37:10

How can I add a delimiter to my "findall" result when only one match is found for a given cell?

Question

I'm trying to extract substrings containing equipment names from the cells in a dataframe. Because of the way the data was created, these substrings can be in any cell. I created this program which uses "findall" and some regex to create a list of all the equipment found in the cells in a given row.

The problem is, the output isn't exactly as I need it. For example, if "findall" matches only one substring in the cell, my script does not add a delimiter afterwards. When the program continues to the next column, it joins the first column match with the second column matches, without a delimiter between the results. And I need the delimiter so I can explode the list later on.

Here is the code:

import pandas as pd
# IMPORT FILE AND CREATE DATAFRAME
d = {'Cause':['Consider checking XXX-1000 for deficiencies prior to train switch', 'XXX-2000 AND PPP-2200 to be taken out of service', 'Need to check XXX-3000 and potentially XXX-1000 for degradation'], 'Mitigation':['ZZZ-9999 is dependent on ZZZ-8000', 'These equipment will be out of service in 2025, not applicable', 'No further comments']}
df = pd.DataFrame(data=d)
# Trying the findall technique
df['equipment'] = ""
for column in df.columns:
    df['equipment'] = df['equipment'] + df[column].str.findall(r'\s*(\w{3,}-\d{4}\D*?) ').str.join('|')
    if df['equipment'].str.contains('|') == False:
        df['equipment'] += '|'

My output looks like this:

0   XXX-1000ZZZ-9999|ZZZ-8000
1   XXX-2000|PPP-2200
2   XXX-3000|XXX-1000

But I want it to look like this:

0   XXX-1000|ZZZ-9999|ZZZ-8000
1   XXX-2000|PPP-2200
2   XXX-3000|XXX-1000

So I added the last two lines of the code above to try to add the pipe character. It doesn't work and raises the following error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I know this is because the program expects a boolean value but I can't figure out how to fix it.
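
The error arises because `df['equipment'].str.contains('|') == False` is a whole boolean Series, and a plain `if` cannot reduce it to a single True or False. A minimal sketch of one way around it, assuming the goal is to append the pipe only to rows that lack one: build a boolean mask and assign through `.loc`. The two-row frame and the `mask` name are illustrative only. `regex=False` is passed because, with the default regex matching, the pattern `|` matches the empty string and so every row would be reported as containing it.

```python
import pandas as pd

# Illustrative data: one row already has a delimiter, one does not.
df = pd.DataFrame({"equipment": ["XXX-1000", "XXX-2000|PPP-2200"]})

# regex=False treats '|' as a literal character instead of regex alternation.
mask = ~df["equipment"].str.contains("|", regex=False)

# Append the delimiter only to the rows selected by the mask.
df.loc[mask, "equipment"] = df.loc[mask, "equipment"] + "|"

print(df["equipment"].tolist())
```

The mask is evaluated row by row, so no single truth value for the whole column is ever needed.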

Answer 1

Score: 0

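I suggest this solution:

import pandas as pd

# IMPORT FILE AND CREATE DATAFRAME
d = {'Cause':['Consider checking XXX-1000 for deficiencies prior to train switch', 'XXX-2000 AND PPP-2200 to be taken out of service', 'Need to check XXX-3000 and potentially XXX-1000 for degradation'], 'Mitigation':['ZZZ-9999 is dependent on ZZZ-8000', 'These equipment will be out of service in 2025, not applicable', 'No further comments']}
df = pd.DataFrame(data=d)

# Combine both columns, find every equipment name in the row, and join the matches with '|'
df['equipment'] = (df['Cause'] + ' ' + df['Mitigation']).str.findall(r'(\w{3,}-\d{4})').apply(lambda x: '|'.join(x))
# Safeguard: strip a trailing '|' if one is present
df['equipment'] = df['equipment'].apply(lambda x: x.rstrip('|') if x.endswith('|') else x)

for i in df['equipment']:
    print(i)

which returns:

XXX-1000|ZZZ-9999|ZZZ-8000
XXX-2000|PPP-2200
XXX-3000|XXX-1000

or simply

df['equipment']

giving

0 XXX-1000|ZZZ-9999|ZZZ-8000
1 XXX-2000|PPP-2200
2 XXX-3000|XXX-1000
Name: equipment, dtype: object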
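
Since the question says the equipment names can appear in any cell and the delimiter is only needed for a later explode, a possible variation is to join every column of a row into one string, keep the `findall` result as a list, and explode that list directly. This is only a sketch under the assumption that all columns hold text; the `text` and `long_df` names are illustrative and not part of the answer.

```python
import pandas as pd

d = {
    "Cause": [
        "Consider checking XXX-1000 for deficiencies prior to train switch",
        "XXX-2000 AND PPP-2200 to be taken out of service",
        "Need to check XXX-3000 and potentially XXX-1000 for degradation",
    ],
    "Mitigation": [
        "ZZZ-9999 is dependent on ZZZ-8000",
        "These equipment will be out of service in 2025, not applicable",
        "No further comments",
    ],
}
df = pd.DataFrame(data=d)

# Join all columns of each row into one string (assumes every column is text).
text = df.astype(str).agg(" ".join, axis=1)

# Keep the matches as a list; no delimiter is needed before exploding.
df["equipment"] = text.str.findall(r"\w{3,}-\d{4}")

# One row per equipment name; the original row index is repeated.
long_df = df.explode("equipment")

print(long_df["equipment"].tolist())
```

Skipping the joined string avoids having to split on `|` again before the explode step. If the pipe-delimited form is still wanted, `df["equipment"].str.join("|")` can be applied to the list column.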


