英文:
How to aggregate multiple columns - Pandas
问题
我有这个数据框:
ID Date XXX 123_Var 456_Var 789_Var 123_P 456_P 789_P
A 07/16/2019 1 987 551 313 22 12 94
A 07/16/2019 9 135 748 403 92 40 41
A 07/18/2019 8 376 938 825 14 69 96
A 07/18/2019 5 259 176 674 52 75 72
B 07/16/2019 9 690 304 948 56 14 78
B 07/16/2019 8 819 185 699 33 81 83
B 07/18/2019 1 580 210 847 51 64 87
我想要按ID
和Date
对数据框进行分组,通过最大值聚合XXX
列,并通过最小值聚合123_Var
、456_Var
、789_Var
列。
这是我已经开始编写的当前代码:
df = (df.groupby(['ID','Date'], as_index=False)
.agg({'XXX':'max', list(df.filter(regex='_Var')): 'min'}))
期望的结果:
ID Date XXX 123_Var 456_Var 789_Var
A 07/16/2019 9 135 551 313
A 07/18/2019 8 259 176 674
B 07/16/2019 9 690 185 699
B 07/18/2019 1 580 210 847
英文:
I have this df:
ID Date XXX 123_Var 456_Var 789_Var 123_P 456_P 789_P
A 07/16/2019 1 987 551 313 22 12 94
A 07/16/2019 9 135 748 403 92 40 41
A 07/18/2019 8 376 938 825 14 69 96
A 07/18/2019 5 259 176 674 52 75 72
B 07/16/2019 9 690 304 948 56 14 78
B 07/16/2019 8 819 185 699 33 81 83
B 07/18/2019 1 580 210 847 51 64 87
I want to group the df by ID
and Date
, aggregate the XXX
column by the maximum value, and aggregate 123_Var
, 456_Var
, 789_Var
columns by the minimum value.
*** Note: The df contains many of these columns. The shape is: {some int}_Var
.**
This is the current code I've started to write:
df = (df.groupby(['ID','Date'], as_index=False)
.agg({'XXX':'max', list(df.filter(regex='_Var')): 'min'}))
Expected result:
ID Date XXX 123_Var 456_Var 789_Var
A 07/16/2019 9 135 551 313
A 07/18/2019 8 259 176 674
B 07/16/2019 9 690 185 699
B 07/18/2019 1 580 210 847
答案1
得分: 2
使用dict.fromkeys
创建动态字典,然后与{'XXX':'max'}
字典合并,并传递给GroupBy.agg
:
d = dict.fromkeys(df.filter(regex='_Var').columns, 'min')
df = df.groupby(['ID','Date'], as_index=False).agg({'XXX':'max', **d})
print (df)
输出结果如下:
ID Date XXX 123_Var 456_Var 789_Var
0 A 07/16/2019 9 135 551 313
1 A 07/18/2019 8 259 176 674
2 B 07/16/2019 9 690 185 699
3 B 07/18/2019 1 580 210 847
英文:
Create dictionary dynamic with dict.fromkeys
and then merge it with {'XXX':'max'}
dict and pass to GroupBy.agg
:
d = dict.fromkeys(df.filter(regex='_Var').columns, 'min')
df = df.groupby(['ID','Date'], as_index=False).agg({**{'XXX':'max'}, **d})
print (df)
ID Date XXX 123_Var 456_Var 789_Var
0 A 07/16/2019 9 135 551 313
1 A 07/18/2019 8 259 176 674
2 B 07/16/2019 9 690 185 699
3 B 07/18/2019 1 580 210 847
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论