How to aggregate multiple columns - Pandas

Question

I have this df:

  ID  Date        XXX  123_Var  456_Var  789_Var  123_P  456_P  789_P
  A   07/16/2019    1      987      551      313     22     12     94
  A   07/16/2019    9      135      748      403     92     40     41
  A   07/18/2019    8      376      938      825     14     69     96
  A   07/18/2019    5      259      176      674     52     75     72
  B   07/16/2019    9      690      304      948     56     14     78
  B   07/16/2019    8      819      185      699     33     81     83
  B   07/18/2019    1      580      210      847     51     64     87

I want to group the df by ID and Date, aggregate the XXX column by its maximum value, and aggregate the 123_Var, 456_Var, and 789_Var columns by their minimum values.

Note: the df contains many of these columns. Their names all have the form {some int}_Var.
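The naming pattern in the note can be matched with an anchored regular expression. This is only a sketch: it assumes the columns of interest are exactly one or more digits followed by `_Var`, and the `sample` frame and the `var_cols` name are made up for illustration.

```python
import pandas as pd

# Hypothetical one-row frame with the same column layout as the question.
sample = pd.DataFrame(
    [["A", "07/16/2019", 1, 987, 551, 313, 22, 12, 94]],
    columns=["ID", "Date", "XXX", "123_Var", "456_Var", "789_Var",
             "123_P", "456_P", "789_P"],
)

# DataFrame.filter(regex=...) keeps labels where re.search finds a match,
# so ^ and $ are needed to require that the whole name fits the pattern.
var_cols = sample.filter(regex=r"^\d+_Var$").columns.tolist()
print(var_cols)  # ['123_Var', '456_Var', '789_Var']
```

An unanchored pattern such as `_Var` would also pick up a column like `123_Var_old`, which may or may not be wanted.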

This is the current code I've started to write:

  df = (df.groupby(['ID','Date'], as_index=False)
          .agg({'XXX':'max', list(df.filter(regex='_Var')): 'min'}))
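For context, the attempt above most likely stops before pandas is even involved: a Python list is unhashable, so it cannot serve as a dictionary key. A minimal sketch of that failure, using made-up names (`spec`, `message`) and a shortened column list:

```python
# Dict keys must be hashable; a list is mutable and therefore is not.
try:
    spec = {"XXX": "max", ["123_Var", "456_Var"]: "min"}
except TypeError as exc:
    # The exact wording of the message varies between Python versions.
    message = str(exc)
    print(message)
```

The aggregation dict therefore needs one key per column name rather than a single key holding all of them.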

Expected result:

  ID  Date        XXX  123_Var  456_Var  789_Var
  A   07/16/2019    9      135      551      313
  A   07/18/2019    8      259      176      674
  B   07/16/2019    9      690      185      699
  B   07/18/2019    1      580      210      847

Answer 1

Score: 2

Build the dictionary dynamically with dict.fromkeys, merge it with the {'XXX':'max'} dict, and pass the result to GroupBy.agg:

  # Map every column whose name contains '_Var' to 'min'
  d = dict.fromkeys(df.filter(regex='_Var').columns, 'min')
  # Merge with the XXX rule and aggregate each (ID, Date) group
  df = df.groupby(['ID','Date'], as_index=False).agg({'XXX':'max', **d})
  print(df)

Output:

     ID  Date        XXX  123_Var  456_Var  789_Var
  0  A   07/16/2019    9      135      551      313
  1  A   07/18/2019    8      259      176      674
  2  B   07/16/2019    9      690      185      699
  3  B   07/18/2019    1      580      210      847
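For anyone who wants to run the approach end to end, here is a self-contained sketch that rebuilds the question's data and applies the same idea. The names `agg_spec` and `result` are illustrative, and the anchored pattern `^\d+_Var$` is an assumption that is stricter than the `_Var` pattern used in the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    [
        ["A", "07/16/2019", 1, 987, 551, 313, 22, 12, 94],
        ["A", "07/16/2019", 9, 135, 748, 403, 92, 40, 41],
        ["A", "07/18/2019", 8, 376, 938, 825, 14, 69, 96],
        ["A", "07/18/2019", 5, 259, 176, 674, 52, 75, 72],
        ["B", "07/16/2019", 9, 690, 304, 948, 56, 14, 78],
        ["B", "07/16/2019", 8, 819, 185, 699, 33, 81, 83],
        ["B", "07/18/2019", 1, 580, 210, 847, 51, 64, 87],
    ],
    columns=["ID", "Date", "XXX", "123_Var", "456_Var", "789_Var",
             "123_P", "456_P", "789_P"],
)

# One 'min' entry per {int}_Var column, then add the 'max' rule for XXX.
agg_spec = dict.fromkeys(df.filter(regex=r"^\d+_Var$").columns, "min")
agg_spec = {"XXX": "max", **agg_spec}

# Columns absent from agg_spec (the *_P columns) are dropped from the result.
result = df.groupby(["ID", "Date"], as_index=False).agg(agg_spec)
print(result)
```

Because the dict is built from the column names at run time, the same code keeps working when more `{some int}_Var` columns are added to the frame.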

huangapple
  • Posted on January 6, 2020, 18:15:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59610236.html