How to aggregate multiple columns - Pandas
Question
I have this df:
ID        Date  XXX  123_Var  456_Var  789_Var  123_P  456_P  789_P
 A  07/16/2019    1      987      551      313     22     12     94
 A  07/16/2019    9      135      748      403     92     40     41
 A  07/18/2019    8      376      938      825     14     69     96
 A  07/18/2019    5      259      176      674     52     75     72
 B  07/16/2019    9      690      304      948     56     14     78
 B  07/16/2019    8      819      185      699     33     81     83
 B  07/18/2019    1      580      210      847     51     64     87
I want to group the df by ID and Date, aggregate the XXX column by its maximum value, and aggregate the 123_Var, 456_Var, and 789_Var columns by their minimum values.
Note: the df contains many of these columns, all named in the form {some int}_Var.
This is the current code I've started to write:
df = (df.groupby(['ID','Date'], as_index=False)
        .agg({'XXX':'max', list(df.filter(regex='_Var')): 'min'}))
Expected result:
ID        Date  XXX  123_Var  456_Var  789_Var
 A  07/16/2019    9      135      551      313
 A  07/18/2019    8      259      176      674
 B  07/16/2019    9      690      185      699
 B  07/18/2019    1      580      210      847
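Why the attempt above fails: GroupBy.agg expects a mapping with one key per column, and a Python dict key must be hashable. list(df.filter(regex='_Var')) is a list, which is unhashable, so the dict literal raises TypeError before pandas is ever called. The check below is a minimal sketch; the two-row df is a hypothetical slice of the question's data, and the exact wording of the error can vary slightly between Python versions.

```python
import pandas as pd

# Hypothetical two-row frame using the question's column naming
df = pd.DataFrame({'XXX': [1, 9], '123_Var': [987, 135], '456_Var': [551, 748]})

# Iterating a DataFrame yields its column labels, so this is a plain list
var_cols = list(df.filter(regex='_Var'))

try:
    # The dict literal itself fails: a list cannot be used as a dict key
    spec = {'XXX': 'max', var_cols: 'min'}
    msg = None
except TypeError as exc:
    msg = str(exc)

print(var_cols)
print(msg)
```

The fix is therefore to expand the list into one key per column rather than using the list itself as a key.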
Answer 1
Score: 2
Create the dictionary dynamically with dict.fromkeys, mapping every _Var column to 'min', then merge it with the {'XXX':'max'} dict and pass the result to GroupBy.agg:
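# map every column whose name contains '_Var' to 'min'
d = dict.fromkeys(df.filter(regex='_Var').columns, 'min')
# merge with the XXX rule and aggregate per (ID, Date) group
df = df.groupby(['ID','Date'], as_index=False).agg({'XXX':'max', **d})
print(df)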
  ID        Date  XXX  123_Var  456_Var  789_Var
0  A  07/16/2019    9      135      551      313
1  A  07/18/2019    8      259      176      674
2  B  07/16/2019    9      690      185      699
3  B  07/18/2019    1      580      210      847
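For reference, a self-contained sketch of the same dict.fromkeys approach, rebuilding the question's sample data. One assumption differs from the answer: it uses the anchored pattern r'^\d+_Var$' (based on the note that the columns look like {some int}_Var) instead of the plain '_Var', which would also match any other column that merely contains that substring. Date is kept as plain strings, as in the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'ID':      ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'Date':    ['07/16/2019', '07/16/2019', '07/18/2019', '07/18/2019',
                '07/16/2019', '07/16/2019', '07/18/2019'],
    'XXX':     [1, 9, 8, 5, 9, 8, 1],
    '123_Var': [987, 135, 376, 259, 690, 819, 580],
    '456_Var': [551, 748, 938, 176, 304, 185, 210],
    '789_Var': [313, 403, 825, 674, 948, 699, 847],
    '123_P':   [22, 92, 14, 52, 56, 33, 51],
    '456_P':   [12, 40, 69, 75, 14, 81, 64],
    '789_P':   [94, 41, 96, 72, 78, 83, 87],
})

# Stricter (assumed) pattern: only names made of digits followed by '_Var'
var_cols = df.filter(regex=r'^\d+_Var$').columns

# One key per column: 'max' for XXX, 'min' for every matching _Var column
spec = {'XXX': 'max', **dict.fromkeys(var_cols, 'min')}

out = df.groupby(['ID', 'Date'], as_index=False).agg(spec)
print(out)
```

The _P columns are absent from spec, so agg drops them, and the result columns follow the dict's insertion order after the ID and Date keys.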