How can I add a delimiter to my "findall" result when only one match is found for a given cell?

Question

I'm trying to extract substrings containing equipment names from the cells in a dataframe. Because of the way the data was created, these substrings can be in any cell. I created this program which uses "findall" and some regex to create a list of all the equipment found in the cells in a given row.

The problem is, the output isn't exactly as I need it. For example, if "findall" matches only one substring in the cell, my script does not add a delimiter afterwards. When the program continues to the next column, it joins the first column match with the second column matches, without a delimiter between the results. And I need the delimiter so I can explode the list later on.

Here is the code:

import pandas as pd

# IMPORT FILE AND CREATE DATAFRAME
d = {'Cause':['Consider checking XXX-1000 for deficiencies prior to train switch', 'XXX-2000 AND PPP-2200 to be taken out of service', 'Need to check XXX-3000 and potentially XXX-1000 for degradation'], 'Mitigation':['ZZZ-9999 is dependent on ZZZ-8000', 'These equipment will be out of service in 2025, not applicable', 'No further comments']}

df = pd.DataFrame(data=d)

# Trying the findall technique
df['equipment'] = ""
for column in df.columns:
    df['equipment'] =  df['equipment'] + df[column].str.findall(r'\s*(\w{3,}-\d{4}\D*?) ').str.join('|')
    if df['equipment'].str.contains('|') == False:
         df['equipment'] += '|'

My output looks like this:

0   XXX-1000ZZZ-9999|ZZZ-8000
1   XXX-2000|PPP-2200
2   XXX-3000|XXX-1000

But I want it to look like this:

0   XXX-1000|ZZZ-9999|ZZZ-8000
1   XXX-2000|PPP-2200
2   XXX-3000|XXX-1000

So I added the last two lines above to try to append the pipe character. This doesn't work and raises the following error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I know this is because the `if` statement expects a single boolean value rather than a Series, but I can't figure out how to fix it.
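For reference, the error above comes from using a whole Series as the condition of an `if`. A minimal sketch of the element-wise alternative follows, using a boolean mask instead. The sample Series `s` is hypothetical and not taken from the post. Note also that `str.contains` interprets its pattern as a regular expression by default, so a bare `'|'` would match every row; `regex=False` is passed here to treat it as a literal character.

```python
import pandas as pd

# Hypothetical sample column: row 0 holds a single match with no delimiter yet
s = pd.Series(["XXX-1000", "XXX-2000|PPP-2200", ""])

# regex=False treats '|' as a literal character rather than regex alternation
has_pipe = s.str.contains("|", regex=False)

# Build a per-row mask instead of testing the whole Series in an `if`
needs_pipe = ~has_pipe & (s != "")

# Append the delimiter only where the mask is True
s.loc[needs_pipe] = s.loc[needs_pipe] + "|"

print(s.tolist())
```

This only removes the ValueError; it does not by itself produce the desired output, because a row that already contains a pipe from an earlier column would still be concatenated without a separator.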

Answer 1

Score: 0

I suggest this solution:

import pandas as pd

# IMPORT FILE AND CREATE DATAFRAME
d = {'Cause':['Consider checking XXX-1000 for deficiencies prior to train switch', 'XXX-2000 AND PPP-2200 to be taken out of service', 'Need to check XXX-3000 and potentially XXX-1000 for degradation'], 'Mitigation':['ZZZ-9999 is dependent on ZZZ-8000', 'These equipment will be out of service in 2025, not applicable', 'No further comments']}

df = pd.DataFrame(data=d)

# Concatenate both text columns, extract every equipment tag, then join the matches with '|'
df['equipment'] = (df['Cause'] + ' ' + df['Mitigation']).str.findall(r'(\w{3,}-\d{4})').apply(lambda x: '|'.join(x))
# Strip a trailing '|' if one is present
df['equipment'] = df['equipment'].apply(lambda x: x.rstrip('|') if x.endswith('|') else x)

for i in df['equipment']:
    print(i)

which returns:

XXX-1000|ZZZ-9999|ZZZ-8000
XXX-2000|PPP-2200
XXX-3000|XXX-1000

or simply

df['equipment']

giving

0 XXX-1000|ZZZ-9999|ZZZ-8000
1 XXX-2000|PPP-2200
2 XXX-3000|XXX-1000
Name: equipment, dtype: object


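If the number of text columns is not fixed, one possible variation is to collect the `findall` lists per column and join them once per row, so a delimiter is always placed between matches from different columns. The sketch below assumes the same sample data as the question; the names `pattern`, `source_cols`, and `exploded` are illustrative and do not appear in the original post. It also shows the later explode step mentioned in the question.

```python
import pandas as pd

d = {
    "Cause": [
        "Consider checking XXX-1000 for deficiencies prior to train switch",
        "XXX-2000 AND PPP-2200 to be taken out of service",
        "Need to check XXX-3000 and potentially XXX-1000 for degradation",
    ],
    "Mitigation": [
        "ZZZ-9999 is dependent on ZZZ-8000",
        "These equipment will be out of service in 2025, not applicable",
        "No further comments",
    ],
}
df = pd.DataFrame(data=d)

pattern = r"\w{3,}-\d{4}"             # same tag shape as in the answer above
source_cols = ["Cause", "Mitigation"]  # columns to scan (assumed)

# findall yields one list per cell; zip lines the lists up row by row
rows = zip(*(df[col].str.findall(pattern) for col in source_cols))

# Flatten each row's lists and join once, so every match gets a delimiter
df["equipment"] = ["|".join(tag for cell in row for tag in cell) for row in rows]

# Split on the delimiter and explode into one row per equipment tag
exploded = df.assign(equipment=df["equipment"].str.split("|")).explode("equipment")

print(df["equipment"].tolist())
print(exploded["equipment"].tolist())
```

Joining once per row avoids the original problem entirely, because no string is ever built up column by column.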

huangapple
  • Posted on February 10, 2023, 12:37:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75407002.html