Python按列分组,并确保分组中的值不会跳过另一个数据帧的顺序。

huangapple go评论66阅读模式
英文:

Python group by columns and make sure values in group doesn't skip values in order of another dataframe

问题

你可以使用以下代码来实现你的需求:

import pandas as pd

# 创建数据框
df1 = pd.DataFrame({
    'Group1': ['G1','G1','G1','G1','G1','G2','G2','G2','G2','G2','G2'],
    'Group2': ['A1','A1','A1','A2','A2','A1','A1','A2','A2','A2','A2'],
    'Label': ['AA','BB','CC','AA','CC','BB','DD','AA','CC','DD','BB']
})

df2 = pd.DataFrame({
    'ID': [1, 2, 4, 5, 7],
    'Label_ref': ['AA','BB','CC','DD','EE']
})

# 将df2的Label_ref转换为列表
label_ref_list = df2['Label_ref'].tolist()

# 定义一个函数来检查Label是否符合要求
def check_label_order(labels):
    index = 0
    for label in labels:
        if label == label_ref_list[index]:
            index += 1
            if index == len(label_ref_list):
                return 1
    return 0

# 使用groupby和apply来应用函数并创建Flag列
df1['Flag'] = df1.groupby(['Group1', 'Group2'])['Label'].transform(check_label_order)

# 打印结果
print(df1)

这将给你期望的输出,包括Flag列,用于指示是否满足条件。

英文:

I have 2 dataframes with the following structures:

df1
Group1 Group2 Label
G1     A1    AA
G1     A1    BB
G1     A1    CC
G1     A2    AA
G1     A2    CC
G2     A1    BB
G2     A1    DD
G2     A2    AA
G2     A2    CC
G2     A2    DD
G2     A2    BB

df2
ID  Label_ref
1     AA
2     BB
4     CC
5     DD
7     EE

I want to group the df1 based on the Group1 and Group2 columns and check if the 'Label' column contains values from df2 Label_ref in order of ID.

Label on df1 doesn't need to have all values from Label_ref on df2, but the values Label on df1 can't skip any Label_ref values in the order of ID

Expected output:

The group Group1=G1, Group2=A1 doesn't skip any values from AA - CC. Therefore the rows corresponding to the group are flagged.

The group Group1=G1, Group2=A2 skips values from BB but has the value CC. Therefore the rows corresponding to the group are not flagged.

The group Group1=G2, Group2=A2 doesn't skip any values from AA - DD although they are not in order. Therefore the rows corresponding to the group are flagged.

Group1 Group2 Label  Flag
G1     A1    AA     1
G1     A1    BB     1
G1     A1    CC     1
G1     A2    AA     0
G1     A2    CC     0
G2     A1    BB     0
G2     A1    DD     0
G2     A2    AA     1
G2     A2    CC     1
G2     A2    DD     1
G2     A2    BB     1

I haven't been able to make much progress:

import pandas as pd
df1 = pd.DataFrame({
                'Group1': [ 'G1','G1', 'G1','G1','G1',
                            'G2','G2', 'G2','G2','G2','G2'],
                'Group2': ['A1','A1','A1','A2','A2',
                            'A1','A1','A2','A2','A2','A2'],
                'Label': ['AA','BB','CC','AA','CC','BB',
                            'DD','AA','CC','DD','BB']})
df2 = pd.DataFrame({
                'ID': [ 1, 2, 4, 5, 7],
                'Label_ref': ['AA','BB','CC','DD','EE']})

A link to a solution or a function/method I can use to achieve this is appreciated

答案1

得分: 2

你可以使用一个映射系列:

# 创建映射系列
smap = pd.Series(*pd.factorize(df2.sort_values('ID')['Label_ref']))
df1['Flag'] = df1['Label'].map(smap)

# 检查每个组是否连续
is_consecutive = lambda x: (~x.diff().gt(1).any()).astype(int)
df1['Flag'] = df1.sort_values('Flag').groupby(['Group1', 'Group2'])['Flag'].transform(is_consecutive)

输出:

>>> df1
   Group1 Group2 Label  Flag
0      G1     A1    AA     1
1      G1     A1    BB     1
2      G1     A1    CC     1
3      G1     A2    AA     0
4      G1     A2    CC     0
5      G2     A1    BB     0
6      G2     A1    DD     0
7      G2     A2    AA     1
8      G2     A2    CC     1
9      G2     A2    DD     1
10     G2     A2    BB     1
英文:

You can use a mapping series:

# Create mapping series
smap = pd.Series(*pd.factorize(df2.sort_values('ID')['Label_ref']))
df1['Flag'] = df1['Label'].map(smap)

# Check if flags are consecutive for each group
is_consecutive = lambda x: (~x.diff().gt(1).any()).astype(int)
df1['Flag'] = df1.sort_values('Flag').groupby(['Group1', 'Group2'])['Flag'].transform(is_consecutive)

Output:

>>> df1
   Group1 Group2 Label  Flag
0      G1     A1    AA     1
1      G1     A1    BB     1
2      G1     A1    CC     1
3      G1     A2    AA     0
4      G1     A2    CC     0
5      G2     A1    BB     0
6      G2     A1    DD     0
7      G2     A2    AA     1
8      G2     A2    CC     1
9      G2     A2    DD     1
10     G2     A2    BB     1

huangapple
  • 本文由 发表于 2023年5月25日 12:21:15
  • 转载请务必保留本文链接:https://go.coder-hub.com/76328921.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定