How to aggregate multiple columns - Pandas

Question

I have this df:

ID         Date  XXX  123_Var  456_Var  789_Var  123_P  456_P  789_P
 A   07/16/2019    1      987      551      313     22     12     94
 A   07/16/2019    9      135      748      403     92     40     41
 A   07/18/2019    8      376      938      825     14     69     96
 A   07/18/2019    5      259      176      674     52     75     72
 B   07/16/2019    9      690      304      948     56     14     78
 B   07/16/2019    8      819      185      699     33     81     83
 B   07/18/2019    1      580      210      847     51     64     87

I want to group the df by ID and Date, aggregate the XXX column by the maximum value, and aggregate 123_Var, 456_Var, 789_Var columns by the minimum value.

Note: The df contains many of these columns. Their names all follow the pattern {some int}_Var.

This is the current code I've started to write:

df = (df.groupby(['ID','Date'], as_index=False)
        .agg({'XXX':'max', list(df.filter(regex='_Var')): 'min'}))

Expected result:

ID         Date  XXX  123_Var  456_Var  789_Var
 A   07/16/2019    9      135      551      313
 A   07/18/2019    8      259      176      674
 B   07/16/2019    9      690      185      699
 B   07/18/2019    1      580      210      847

Answer 1

Score: 2

Build the aggregation dictionary dynamically with dict.fromkeys, merge it with the {'XXX': 'max'} dict, and pass the result to GroupBy.agg:

# map every column whose name contains '_Var' to 'min'
d = dict.fromkeys(df.filter(regex='_Var').columns, 'min')
# 'max' for XXX, plus the unpacked 'min' entries
df = df.groupby(['ID','Date'], as_index=False).agg({'XXX': 'max', **d})
print(df)

Output:

  ID        Date  XXX  123_Var  456_Var  789_Var
0  A  07/16/2019    9      135      551      313
1  A  07/18/2019    8      259      176      674
2  B  07/16/2019    9      690      185      699
3  B  07/18/2019    1      580      210      847
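
For context, the attempt in the question fails because a Python list is unhashable and so cannot be used as a dictionary key. Each column needs its own entry in the dict passed to agg. The following is a minimal, self-contained sketch of the same approach, using the sample data from the question. The names var_cols, agg_spec and result are illustrative only, and the anchored pattern ^\d+_Var$ is an assumption based on the "{some int}_Var" naming described in the question (the answer above uses the looser '_Var').

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["A", "A", "A", "A", "B", "B", "B"],
    "Date": ["07/16/2019", "07/16/2019", "07/18/2019", "07/18/2019",
             "07/16/2019", "07/16/2019", "07/18/2019"],
    "XXX": [1, 9, 8, 5, 9, 8, 1],
    "123_Var": [987, 135, 376, 259, 690, 819, 580],
    "456_Var": [551, 748, 938, 176, 304, 185, 210],
    "789_Var": [313, 403, 825, 674, 948, 699, 847],
    "123_P": [22, 92, 14, 52, 56, 33, 51],
    "456_P": [12, 40, 69, 75, 14, 81, 64],
    "789_P": [94, 41, 96, 72, 78, 83, 87],
})

# Assumed naming scheme: one or more digits followed by "_Var".
var_cols = df.filter(regex=r"^\d+_Var$").columns

# One 'min' entry per matching column, plus 'max' for XXX.
agg_spec = {"XXX": "max", **dict.fromkeys(var_cols, "min")}

# Columns not named in agg_spec (the *_P columns) are dropped from the result.
result = df.groupby(["ID", "Date"], as_index=False).agg(agg_spec)
print(result)
```

Anchoring the pattern is only a precaution: it avoids picking up an unrelated column that happens to contain "_Var" somewhere in its name.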

huangapple
  • Published on January 6, 2020, 18:15:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59610236.html