Posted: August 10, 2023, 20:08:44

Pandas DataFrame.groupby().agg() issue

Question

I'm trying to use pandas groupby().agg() but I have some issues.

Date                Year  Month  Week  Hour  A    B    C       D    E        F    G   ..
mercoledì 5 aprile  2023  4      14    5     6    6    144,79  0    868,74   6    36
mercoledì 5 aprile  2023  4      14    6     214  214  144,79  0    30985,0  6    214
mercoledì 5 aprile  2023  4      14    6     6    6    144,79  0    868,74   6    36
mercoledì 5 aprile  2023  4      14    7     220  220  180,26  0    39657,2  220  48
mercoledì 5 aprile  2023  4      14    7     100  100  180,26  146  18026    100  10
mercoledì 5 aprile  2023  4      14    8     220  220  225,2   0    49544    220  48
mercoledì 5 aprile  2023  4      14    8     57   57   2,2     146  129,38   6    57

I have to sum some columns (which works with agg({'column': 'sum'})), compute a weighted average on others, with the weights taken from another column, and take the mean() of the remaining columns.

df = df.groupby(['Date', 'Hour']).agg({'A': 'sum',
                                       'B': 'sum',
                                       'C': weighted average?,
                                       'D': 'sum',
                                       'E': 'mean'}).reset_index()

I want to compute the weighted average on C, with the values in column B as the weights.

Then, for the columns not listed in .agg() (F, G, and so on; I have many of them), I want to apply .mean(), so that all columns are kept in the end.

Can you help me?
Thank you

I tried the code above, but I don't know how to write the weighted-average part properly.

Answer 1

Score: 1

Something like this:

import numpy as np
import pandas as pd

data = {
    "Date": ["mercoledì 5 aprile 2023"] * 6,
    "Year": [2023, 2023, 2023, 2023, 2023, 2023],
    "Month": [4, 4, 4, 4, 4, 4],
    "Week": [14, 14, 14, 14, 14, 14],
    "Hour": [5, 6, 6, 7, 7, 8],
    "A": [6, 214, 6, 220, 100, 220],
    "B": [6, 214, 6, 220, 100, 57],
    "C": [144.79, 144.79, 144.79, 180.26, 180.26, 2.2],
    "D": [0, 0, 0, 0, 146, 146],
    "E": [868.74, 30985.0, 868.74, 39657.2, 18026.0, 129.38],
    "F": [6, 214, 36, 48, 10, 57],
    "G": [36, 214, 36, 48, 10, 57],
}
df = pd.DataFrame(data)
print(df)

# Sum A, B and D, take the mean of E, and compute the weighted mean of C
# using B as weights.
result = df.groupby(['Date', 'Hour']).agg({
    'A': 'sum',
    'B': 'sum',
    # x holds the C values of one group; look up the matching rows of B by
    # index so the weights come from that group, not from the whole frame.
    'C': lambda x: np.average(x, weights=df.loc[x.index, 'B']),
    'D': 'sum',
    'E': 'mean'
})
print(result.reset_index())

                      Date  Hour    A    B       C    D         E
0  mercoledì 5 aprile 2023     5    6    6  144.79    0    868.74
1  mercoledì 5 aprile 2023     6  220  220  144.79    0  15926.87
2  mercoledì 5 aprile 2023     7  320  320  180.26  146  28841.60
3  mercoledì 5 aprile 2023     8  220   57    2.20  146    129.38

Answer 2

Score: 0

You cannot compute a weighted average with agg directly as this requires two columns*.

One way would be to pre-/post-process the computation. The weighted average is equal to sum(C*B)/sum(B):

out = (df.eval('C = C*B')
         .groupby(['Date', 'Hour'])
         .agg({'A': 'sum',
               'B': 'sum',
               'C': 'sum',
               'D': 'sum',
               'E': 'mean'})
         .eval('C = C/B')
         .reset_index()
     )

NB: If you also need a different aggregation of B or C, work on copies of those columns.
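As a minimal sketch of that note, assuming you want both the plain mean and the weighted mean of C: the product goes into a helper column so that C itself stays untouched. The frame below is made up for illustration, and the names CB, C_mean and C_wavg are hypothetical, not from the question.

```python
import pandas as pd

# Small made-up frame; the values are illustrative only.
df = pd.DataFrame({
    "Date": ["d1", "d1", "d1"],
    "Hour": [7, 7, 8],
    "B": [1, 3, 2],
    "C": [10.0, 20.0, 5.0],
})

out = (
    df.assign(CB=df["C"] * df["B"])              # helper copy: C * B
      .groupby(["Date", "Hour"], as_index=False)
      .agg(B=("B", "sum"),
           C_mean=("C", "mean"),                 # plain mean of the original C
           CB=("CB", "sum"))
      .assign(C_wavg=lambda d: d["CB"] / d["B"]) # sum(C*B) / sum(B)
      .drop(columns="CB")                        # helper no longer needed
)
print(out)
```

This uses named aggregation, so each output column states its source column and function explicitly.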

To handle all columns you can use a dictionary:

d = {c: 'mean' for c in df.columns.difference(['Date', 'Hour'])}
for c in ['A', 'B', 'C', 'D']:
    d[c] = 'sum'
out = (df.eval('C = C*B')
         .groupby(['Date', 'Hour'], as_index=False)
         .agg(d)
         .eval('C = C/B')
     )

* You can, however, compute the weighted average with groupby.apply, but this should be done as a separate operation.
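The groupby.apply route mentioned in the footnote could look roughly like this. It is a sketch on a made-up frame, not the answerer's code: the weighted mean is computed on its own with numpy.average and then joined to the ordinary aggregations.

```python
import numpy as np
import pandas as pd

# Small made-up frame; the values are illustrative only.
df = pd.DataFrame({
    "Date": ["d1", "d1", "d1"],
    "Hour": [7, 7, 8],
    "A": [1, 2, 3],
    "B": [1, 3, 2],
    "C": [10.0, 20.0, 5.0],
})

keys = ["Date", "Hour"]

# Separate operation: weighted mean of C per group, with B as weights.
# Selecting only B and C keeps the grouping columns out of the applied frame.
wavg = (
    df.groupby(keys)[["B", "C"]]
      .apply(lambda g: np.average(g["C"], weights=g["B"]))
      .rename("C")
)

# Ordinary aggregations, then attach the weighted mean on the group index.
sums = df.groupby(keys).agg({"A": "sum", "B": "sum"})
out = sums.join(wavg).reset_index()
print(out)
```

apply calls the Python function once per group, so on large frames the pre-/post-processing approach above is likely to be faster.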
