Group pandas dataframe and flag corresponding rows where all values from a list exist in a column


Question

I have a dataframe with the following structure:

Group1 Group2 Label
G1     A1    AA
G1     A1    BB
G1     A1    CC
G1     A2    AA
G1     A2    CC
G2     A1    BB
G2     A1    DD
G2     A2    AA
G2     A2    CC
G2     A2    DD
G2     A2    BB

l1 = ['AA','BB','CC','DD']

I want to group the dataframe by the Group1 and Group2 columns, check whether each group's 'Label' values match a list of values (l1), ignoring order, and flag the rows of the groups that do.

Expected output:
The group Group1=G2, Group2=A2 has all values from l1 in the Label column. Therefore the rows corresponding to the group are flagged.

Group1 Group2 Label  Flag
G1     A1    AA     0
G1     A1    BB     0
G1     A1    CC     0
G1     A2    AA     0
G1     A2    CC     0
G2     A1    BB     0
G2     A1    DD     0
G2     A2    AA     1
G2     A2    CC     1
G2     A2    DD     1
G2     A2    BB     1

I haven't been able to make much progress:

import pandas as pd

df = pd.DataFrame({
    'Group1': ['G1', 'G1', 'G1', 'G1', 'G1',
               'G2', 'G2', 'G2', 'G2', 'G2', 'G2'],
    'Group2': ['A1', 'A1', 'A1', 'A2', 'A2',
               'A1', 'A1', 'A2', 'A2', 'A2', 'A2'],
    'Label': ['AA', 'BB', 'CC', 'AA', 'CC', 'BB',
              'DD', 'AA', 'CC', 'DD', 'BB']})
df.groupby(['Group1', 'Group2'])

A link to a solution, or a function/method I can use to achieve this, would be appreciated.

Answer 1

Score: 1

You can use groupby.transform with a set comparison; transform broadcasts each group's result back to every row of that group:


l1 = ['AA','BB','CC','DD']
S = set(l1)

df['Flag'] = (df
 .groupby(['Group1','Group2'])
 ['Label'].transform(lambda x: set(x)==S)
 .astype(int)
 )

Output:

   Group1 Group2 Label  Flag
0      G1     A1    AA     0
1      G1     A1    BB     0
2      G1     A1    CC     0
3      G1     A2    AA     0
4      G1     A2    CC     0
5      G2     A1    BB     0
6      G2     A1    DD     0
7      G2     A2    AA     1
8      G2     A2    BB     1
9      G2     A2    CC     1
10     G2     A2    DD     1
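
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea on the question's data. The helper name `flag_exact_label_groups` and its parameters are made up for illustration; they are not part of pandas or of the answer above.

```python
import pandas as pd

def flag_exact_label_groups(df, keys, label_col, wanted):
    """Return a 0/1 Series: 1 where the row's group has exactly the labels in `wanted`.

    Hypothetical helper for illustration; duplicates and ordering are ignored
    because both sides are compared as sets.
    """
    wanted_set = set(wanted)
    return (
        df.groupby(keys)[label_col]
        # The lambda returns one boolean per group; transform repeats it per row.
        .transform(lambda labels: set(labels) == wanted_set)
        .astype(int)
    )

df = pd.DataFrame({
    'Group1': ['G1', 'G1', 'G1', 'G1', 'G1', 'G2', 'G2', 'G2', 'G2', 'G2', 'G2'],
    'Group2': ['A1', 'A1', 'A1', 'A2', 'A2', 'A1', 'A1', 'A2', 'A2', 'A2', 'A2'],
    'Label': ['AA', 'BB', 'CC', 'AA', 'CC', 'BB', 'DD', 'AA', 'CC', 'DD', 'BB'],
})
l1 = ['AA', 'BB', 'CC', 'DD']

df['Flag'] = flag_exact_label_groups(df, ['Group1', 'Group2'], 'Label', l1)
print(df)
```

Because the comparison is set equality, a group that has every value in l1 plus some extra label is not flagged.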

Answer 2

Score: 1

Here is a way using .agg(set) and df.join():

cols = ['Group1','Group2']
df.join(df.groupby(cols)['Label'].agg(set).eq(set(l1)).rename('Flag').astype(int),on = cols)

or

df['Label'].str.get_dummies().reindex(l1, axis=1).groupby([df['Group1'], df['Group2']]).transform('any').all(axis=1).astype(int)

Output:

   Group1 Group2 Label  Flag
0      G1     A1    AA     0
1      G1     A1    BB     0
2      G1     A1    CC     0
3      G1     A2    AA     0
4      G1     A2    CC     0
5      G2     A1    BB     0
6      G2     A1    DD     0
7      G2     A2    AA     1
8      G2     A2    CC     1
9      G2     A2    DD     1
10     G2     A2    BB     1
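
The aggregate-then-join approach may be easier to follow when split into steps. This is a sketch on the question's data, not the answer's exact code: the intermediate names `label_sets` and `group_flag` are invented here, and the comparison uses `Series.map` with a lambda rather than `.eq(set(l1))`.

```python
import pandas as pd

df = pd.DataFrame({
    'Group1': ['G1', 'G1', 'G1', 'G1', 'G1', 'G2', 'G2', 'G2', 'G2', 'G2', 'G2'],
    'Group2': ['A1', 'A1', 'A1', 'A2', 'A2', 'A1', 'A1', 'A2', 'A2', 'A2', 'A2'],
    'Label': ['AA', 'BB', 'CC', 'AA', 'CC', 'BB', 'DD', 'AA', 'CC', 'DD', 'BB'],
})
l1 = ['AA', 'BB', 'CC', 'DD']
cols = ['Group1', 'Group2']
wanted = set(l1)

# Step 1: one set of labels per (Group1, Group2) pair.
label_sets = df.groupby(cols)['Label'].agg(set)

# Step 2: one 0/1 flag per group, named so it becomes the 'Flag' column.
group_flag = label_sets.map(lambda s: s == wanted).astype(int).rename('Flag')

# Step 3: attach the per-group flag to every row via the key columns.
result = df.join(group_flag, on=cols)
print(result)
```

Here `group_flag` holds one value per group, which can be handy if you also need a per-group summary and not only the row-level column.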

Answer 3

Score: 1

Fast solution

df['flag'] = df['Label'].mask(~df['Label'].isin(l1))
df['flag'] = df.groupby(['Group1', 'Group2'])['flag'].transform('nunique').eq(len(l1))

   Group1 Group2 Label   flag
0      G1     A1    AA  False
1      G1     A1    BB  False
2      G1     A1    CC  False
3      G1     A2    AA  False
4      G1     A2    CC  False
5      G2     A1    BB  False
6      G2     A1    DD  False
7      G2     A2    AA   True
8      G2     A2    CC   True
9      G2     A2    DD   True
10     G2     A2    BB   True
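
One thing worth checking before choosing this approach: masking and counting unique values tests whether a group contains all values of the list, while a set comparison tests whether the group's labels are exactly the list. On the question's data the two agree, but they can differ when a group has extra labels. The sketch below uses a small made-up dataframe (not from the question) to show the difference; it also uses `len(set(wanted))` instead of `len(l1)`, an adjustment of mine in case the list has repeated values.

```python
import pandas as pd

# Made-up data: group G1 has an extra label 'EE', group G2 repeats 'AA'.
df = pd.DataFrame({
    'Group1': ['G1', 'G1', 'G1', 'G2', 'G2', 'G2'],
    'Label': ['AA', 'BB', 'EE', 'AA', 'BB', 'AA'],
})
wanted = ['AA', 'BB']

# "Contains all": blank out labels outside `wanted`, then count distinct
# remaining labels per group (nunique skips NaN by default).
masked = df['Label'].where(df['Label'].isin(wanted))
contains_all = (
    masked.groupby(df['Group1'])
    .transform('nunique')
    .eq(len(set(wanted)))
)

# "Exactly equal": compare each group's label set with the wanted set.
exact = (
    df.groupby('Group1')['Label']
    .transform(lambda s: set(s) == set(wanted))
    .astype(bool)
)

print(pd.DataFrame({'contains_all': contains_all, 'exact': exact}))
```

G1 is flagged by the "contains all" check but not by the exact check, because of its extra 'EE' label; G2 is flagged by both.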

Posted by huangapple on May 18, 2023, 11:21:16
When reposting, please keep the link to this article: https://go.coder-hub.com/76277517.html