May 18, 2023 11:21:16

Group pandas dataframe and flag corresponding rows where all values from a list exist in a column

Question

Here is a code snippet that should produce the desired output using the pandas library:

import pandas as pd

# Your DataFrame
df = pd.DataFrame({
    'Group1': ['G1', 'G1', 'G1', 'G1', 'G1', 'G2', 'G2', 'G2', 'G2', 'G2', 'G2'],
    'Group2': ['A1', 'A1', 'A1', 'A2', 'A2', 'A1', 'A1', 'A2', 'A2', 'A2', 'A2'],
    'Label': ['AA', 'BB', 'CC', 'AA', 'CC', 'BB', 'DD', 'AA', 'CC', 'DD', 'BB']
})

l1 = ['AA', 'BB', 'CC', 'DD']

# Check whether a group's labels are exactly the values in l1 (order ignored)
def check_label(labels):
    return set(labels) == set(l1)

# Apply the function per group and broadcast the result to every row of the group
df['Flag'] = df.groupby(['Group1', 'Group2'])['Label'].transform(check_label).astype(int)

# Display the result
print(df)

This code defines a custom function check_label that checks whether the set of values in a group's 'Label' column equals the set of values in l1. groupby(...)['Label'].transform applies it to each (Group1, Group2) group and broadcasts the result to that group's rows, and astype(int) turns the result into the 0/1 'Flag' column.

Original question:

I have a dataframe with the following structure:

Group1 Group2 Label
G1     A1    AA
G1     A1    BB
G1     A1    CC
G1     A2    AA
G1     A2    CC
G2     A1    BB
G2     A1    DD
G2     A2    AA
G2     A2    CC
G2     A2    DD
G2     A2    BB

l1 = ['AA', 'BB', 'CC', 'DD']

I want to group the dataframe by the Group1 and Group2 columns, check whether each group's 'Label' values equal the values in the list l1 (ordering may differ), and flag those groups.

Expected output:
The group Group1=G2, Group2=A2 has all values from l1 in the Label column. Therefore the rows corresponding to the group are flagged.

Group1 Group2 Label  Flag
G1     A1    AA     0
G1     A1    BB     0
G1     A1    CC     0
G1     A2    AA     0
G1     A2    CC     0
G2     A1    BB     0
G2     A1    DD     0
G2     A2    AA     1
G2     A2    CC     1
G2     A2    DD     1
G2     A2    BB     1

I haven't been able to make much progress:

import pandas as pd
df = pd.DataFrame({
    'Group1': ['G1', 'G1', 'G1', 'G1', 'G1',
               'G2', 'G2', 'G2', 'G2', 'G2', 'G2'],
    'Group2': ['A1', 'A1', 'A1', 'A2', 'A2',
               'A1', 'A1', 'A2', 'A2', 'A2', 'A2'],
    'Label': ['AA', 'BB', 'CC', 'AA', 'CC', 'BB',
              'DD', 'AA', 'CC', 'DD', 'BB']})
df.groupby(['Group1', 'Group2'])

A link to a solution, or a function/method I can use to achieve this, would be appreciated.

Answer 1

Score: 1

You can use a groupby.transform and set operations:


l1 = ['AA', 'BB', 'CC', 'DD']
S = set(l1)

df['Flag'] = (df
 .groupby(['Group1', 'Group2'])
 ['Label'].transform(lambda x: set(x) == S)
 .astype(int)
 )

Output:

   Group1 Group2 Label  Flag
0      G1     A1    AA     0
1      G1     A1    BB     0
2      G1     A1    CC     0
3      G1     A2    AA     0
4      G1     A2    CC     0
5      G2     A1    BB     0
6      G2     A1    DD     0
7      G2     A2    AA     1
8      G2     A2    BB     1
9      G2     A2    CC     1
10     G2     A2    DD     1
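
The set comparison above requires an exact match, so a group that contains every value in l1 plus an extra label is not flagged. If "all values from the list exist" is the intended rule, a subset test can be used instead. The following is a minimal sketch of both variants; the helper name flag_groups, its exact parameter, and the sample data are assumptions made for illustration and are not part of the original answer or of pandas.

```python
import pandas as pd

def flag_groups(df, keys, col, required, exact=True):
    """Return a 0/1 Series aligned with df (hypothetical helper, not a pandas API).

    exact=True  -> the group's labels must equal `required` as a set.
    exact=False -> the group's labels must contain every value in `required`.
    """
    required = set(required)
    if exact:
        check = lambda labels: set(labels) == required
    else:
        check = lambda labels: required.issubset(set(labels))
    # transform broadcasts the per-group boolean to every row of the group
    return df.groupby(keys)[col].transform(check).astype(int)

# Illustrative data: G1 is missing CC, G2 matches exactly, G3 has an extra label EE
sample = pd.DataFrame({
    'Group1': ['G1', 'G1', 'G2', 'G2', 'G2', 'G3', 'G3', 'G3', 'G3'],
    'Group2': ['A1', 'A1', 'A1', 'A1', 'A1', 'A1', 'A1', 'A1', 'A1'],
    'Label':  ['AA', 'BB', 'AA', 'BB', 'CC', 'AA', 'BB', 'CC', 'EE'],
})
required = ['AA', 'BB', 'CC']
keys = ['Group1', 'Group2']

sample['FlagExact'] = flag_groups(sample, keys, 'Label', required, exact=True)
sample['FlagContains'] = flag_groups(sample, keys, 'Label', required, exact=False)
print(sample)
```

With this sample, only G2 is flagged by the exact check, while both G2 and G3 are flagged by the subset check.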

Answer 2

Score: 1

Here is a way using .agg(set) and df.join():

cols = ['Group1', 'Group2']
df.join(df.groupby(cols)['Label'].agg(set).eq(set(l1)).rename('Flag').astype(int), on=cols)

Or, alternatively, using str.get_dummies():

df['Label'].str.get_dummies().reindex(l1, axis=1).groupby([df['Group1'], df['Group2']]).transform('any').all(axis=1).astype(int)

Output:

   Group1 Group2 Label  Flag
0      G1     A1    AA     0
1      G1     A1    BB     0
2      G1     A1    CC     0
3      G1     A2    AA     0
4      G1     A2    CC     0
5      G2     A1    BB     0
6      G2     A1    DD     0
7      G2     A2    AA     1
8      G2     A2    CC     1
9      G2     A2    DD     1
10     G2     A2    BB     1
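
To make the join approach easier to follow, the sketch below splits it into two steps: one value per group, then a join back onto the rows. It uses the question's data; the names required, per_group, and flagged are assumptions for illustration, and the per-group comparison is done inside agg with a lambda rather than with .eq(set(l1)), which is a small variation on the answer and not its exact code.

```python
import pandas as pd

df = pd.DataFrame({
    'Group1': ['G1', 'G1', 'G1', 'G1', 'G1', 'G2', 'G2', 'G2', 'G2', 'G2', 'G2'],
    'Group2': ['A1', 'A1', 'A1', 'A2', 'A2', 'A1', 'A1', 'A2', 'A2', 'A2', 'A2'],
    'Label':  ['AA', 'BB', 'CC', 'AA', 'CC', 'BB', 'DD', 'AA', 'CC', 'DD', 'BB'],
})
l1 = ['AA', 'BB', 'CC', 'DD']
cols = ['Group1', 'Group2']
required = set(l1)

# Step 1: one 0/1 value per (Group1, Group2) group
per_group = (
    df.groupby(cols)['Label']
      .agg(lambda labels: set(labels) == required)
      .astype(int)
      .rename('Flag')
)

# Step 2: join the per-group value back onto every row; df itself is not modified
flagged = df.join(per_group, on=cols)

print(per_group)
print(flagged)
```

Note that df.join returns a new DataFrame, so the result has to be assigned (here to flagged) if it is needed later.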

Answer 3

Score: 1

Fast solution

df['flag'] = df['Label'].mask(~df['Label'].isin(l1))
df['flag'] = df.groupby(['Group1', 'Group2'])['flag'].transform('nunique').eq(len(l1))

   Group1 Group2 Label   flag
0      G1     A1    AA  False
1      G1     A1    BB  False
2      G1     A1    CC  False
3      G1     A2    AA  False
4      G1     A2    CC  False
5      G2     A1    BB  False
6      G2     A1    DD  False
7      G2     A2    AA   True
8      G2     A2    CC   True
9      G2     A2    DD   True
10     G2     A2    BB   True
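
The same idea can be written without the temporary column and with the 0/1 output shown in the question. This is a minimal sketch using the question's data; the variable name valid and the column name Flag are assumptions for illustration. Because it counts the distinct labels that belong to l1, any labels outside l1 are ignored rather than disqualifying the group.

```python
import pandas as pd

df = pd.DataFrame({
    'Group1': ['G1', 'G1', 'G1', 'G1', 'G1', 'G2', 'G2', 'G2', 'G2', 'G2', 'G2'],
    'Group2': ['A1', 'A1', 'A1', 'A2', 'A2', 'A1', 'A1', 'A2', 'A2', 'A2', 'A2'],
    'Label':  ['AA', 'BB', 'CC', 'AA', 'CC', 'BB', 'DD', 'AA', 'CC', 'DD', 'BB'],
})
l1 = ['AA', 'BB', 'CC', 'DD']

# Keep labels that appear in l1; everything else becomes NaN and is skipped by nunique
valid = df['Label'].where(df['Label'].isin(l1))

# Count distinct valid labels per group and compare with the number of distinct values in l1
df['Flag'] = (
    valid.groupby([df['Group1'], df['Group2']])
         .transform('nunique')
         .eq(len(set(l1)))
         .astype(int)
)
print(df)
```

Using len(set(l1)) instead of len(l1) keeps the comparison correct if l1 happens to contain repeated values.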
