January 6, 2020 18:15:20

How to aggregate multiple columns - Pandas

Question

I have this df:

ID        Date   XXX  123_Var  456_Var  789_Var  123_P  456_P  789_P
 A  07/16/2019     1      987      551      313     22     12     94
 A  07/16/2019     9      135      748      403     92     40     41
 A  07/18/2019     8      376      938      825     14     69     96
 A  07/18/2019     5      259      176      674     52     75     72
 B  07/16/2019     9      690      304      948     56     14     78
 B  07/16/2019     8      819      185      699     33     81     83
 B  07/18/2019     1      580      210      847     51     64     87

I want to group the df by ID and Date, aggregate the XXX column by its maximum value, and aggregate the 123_Var, 456_Var, and 789_Var columns by their minimum values.

Note: the df contains many of these columns, all named in the form {some int}_Var.
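Since every target column follows the {some int}_Var pattern, an anchored regular expression can select exactly those columns. A bare substring pattern like '_Var' would also match a name such as 'old_Var_copy'; that column name and the empty frame below are made-up assumptions used only to show the difference, not part of the question's data.

```python
import pandas as pd

# Hypothetical frame: three {int}_Var columns plus two distractor columns.
df = pd.DataFrame(columns=['ID', 'Date', 'XXX', '123_Var', '456_Var', '789_Var',
                           '123_P', 'old_Var_copy'])

# Substring match: any column whose name contains "_Var" anywhere.
loose = list(df.filter(regex='_Var').columns)

# Anchored match: one or more digits followed by "_Var", and nothing else.
strict = list(df.filter(regex=r'^\d+_Var$').columns)

print(loose)   # ['123_Var', '456_Var', '789_Var', 'old_Var_copy']
print(strict)  # ['123_Var', '456_Var', '789_Var']
```

DataFrame.filter matches the regex against the column labels with re.search, which is why the ^ and $ anchors are needed to rule out partial matches.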

This is the current code I've started to write:

df = (df.groupby(['ID','Date'], as_index=False)
        .agg({'XXX':'max', list(df.filter(regex='_Var')): 'min'}))

Expected result:

ID        Date   XXX  123_Var  456_Var  789_Var
 A  07/16/2019     9      135      551      313
 A  07/18/2019     8      259      176      674
 B  07/16/2019     9      690      185      699
 B  07/18/2019     1      580      210      847
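The attempt in the question fails before agg is ever called. Iterating a DataFrame yields its column labels, so list(df.filter(regex='_Var')) is a plain list of names, and a list cannot be used as a dict key because it is unhashable. The sketch below rebuilds only the column headers of the sample frame (the rows are not needed) to show what that expression evaluates to and where the error comes from.

```python
import pandas as pd

# Column headers from the question; no rows are needed to reproduce the error.
df = pd.DataFrame(columns=['ID', 'Date', 'XXX', '123_Var', '456_Var', '789_Var',
                           '123_P', '456_P', '789_P'])

# Iterating a DataFrame yields column labels, so this is a list of names.
var_cols = list(df.filter(regex='_Var'))
print(var_cols)  # ['123_Var', '456_Var', '789_Var']

# Building the dict literal from the question raises while the dict is
# being constructed, before groupby/agg runs.
error_message = None
try:
    spec = {'XXX': 'max', var_cols: 'min'}
except TypeError as exc:
    error_message = str(exc)

# The message reports the unhashable list key; exact wording can vary
# slightly between Python versions.
print(error_message)
```

So the fix has to expand the list into one key per column rather than use the list itself as a key.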

Answer 1

Score: 2

Create the dictionary dynamically with dict.fromkeys, merge it with the {'XXX':'max'} dict, and pass the result to GroupBy.agg:

# one 'min' rule per column whose name contains '_Var'
d = dict.fromkeys(df.filter(regex='_Var').columns, 'min')
# merge with the 'max' rule for XXX and aggregate each (ID, Date) group
df = df.groupby(['ID','Date'], as_index=False).agg({'XXX':'max', **d})
print(df)
  ID        Date  XXX  123_Var  456_Var  789_Var
0  A  07/16/2019    9      135      551      313
1  A  07/18/2019    8      259      176      674
2  B  07/16/2019    9      690      185      699
3  B  07/18/2019    1      580      210      847
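A self-contained version of the answer, with the sample rows from the question typed in by hand, makes it easy to check the result and to see the merged spec dict that agg receives. Keeping Date as a plain string is an assumption (the question does not say whether it was parsed as a date); for these rows the group order is the same either way.

```python
import pandas as pd

# Sample data from the question; Date is kept as a string, as printed there.
df = pd.DataFrame({
    'ID':      ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'Date':    ['07/16/2019', '07/16/2019', '07/18/2019', '07/18/2019',
                '07/16/2019', '07/16/2019', '07/18/2019'],
    'XXX':     [1, 9, 8, 5, 9, 8, 1],
    '123_Var': [987, 135, 376, 259, 690, 819, 580],
    '456_Var': [551, 748, 938, 176, 304, 185, 210],
    '789_Var': [313, 403, 825, 674, 948, 699, 847],
    '123_P':   [22, 92, 14, 52, 56, 33, 51],
    '456_P':   [12, 40, 69, 75, 14, 81, 64],
    '789_P':   [94, 41, 96, 72, 78, 83, 87],
})

# One 'min' rule per *_Var column, generated from the column names.
d = dict.fromkeys(df.filter(regex='_Var').columns, 'min')

# Merged spec: the literal XXX rule first, then the generated rules.
spec = {'XXX': 'max', **d}
print(spec)
# {'XXX': 'max', '123_Var': 'min', '456_Var': 'min', '789_Var': 'min'}

# Columns not named in spec (the *_P columns) are left out of the result.
out = df.groupby(['ID', 'Date'], as_index=False).agg(spec)
print(out)
```

Because the spec is built from df.filter(...).columns, adding more {int}_Var columns to the frame requires no change to the aggregation code.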
