2023-02-26 20:10:10

Calculation problem after pandas grouping

Question

In pandas, how can I group by date and, within each date group, multiply columns A and B row by row, sum the products, and then divide by the sum of column B for that group?

I have tried:

(df.groupby('date')['A','B']
    .transform(lambda x: (x['A'] * x['B']).sum())
    .div(df.groupby('date')['B'].agg('sum')))

and:

(df.groupby('date')
    .transform(lambda x: (x['A'] * x['B']).sum())
    .div(df.groupby('date')['B'].agg('sum')))

Both raised:

> KeyError: 'A'

Answer 1

Score: 1

You should use .apply() instead of .transform(), as follows:

df.groupby('date').apply(lambda x: (x['A'] * x['B']).sum() / x['B'].sum())

With .apply(), the lambda's argument x is a DataFrame holding one group, so its columns can be selected by name.

With .transform(), x is instead a Series holding only one column at a time, so x['A'] is treated as a row-label lookup and raises the KeyError.
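
To see where the KeyError comes from, the sketch below hand-builds the kind of object each method passes to the lambda, using the same sample data as the example later in this answer. It only simulates the inputs (one group as a DataFrame, one column of that group as a Series) and does not call .transform() itself; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['01/01/1999', '02/01/1999', '03/01/1999', '03/01/1999'],
    'A': [2, 4, 7, 4],
    'B': [5, 7, 9, 6],
})

# One group as a DataFrame: roughly what .apply() hands to the function.
group = df[df['date'] == '03/01/1999']
weighted = (group['A'] * group['B']).sum() / group['B'].sum()
print(weighted)

# One column of that group as a Series: roughly what .transform() hands over.
column = group['A']
try:
    column['A']  # looks for a row labelled 'A', not a column named 'A'
    caught = False
except KeyError as err:
    caught = True
    print('KeyError:', err)
```

The Series is indexed by row labels (here 2 and 3), so the string 'A' is not a valid key.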

From this answer:
<blockquote>
<h3>Two major differences between apply and transform</h3>

There are two major differences between the transform and apply groupby methods.

Input:
- apply implicitly passes all the columns for each group as a DataFrame to the custom function.
- while transform passes each column for each group individually as a Series to the custom function.
Output:
- The custom function passed to apply can return a scalar, or a Series or DataFrame (or numpy array or even list).
- The custom function passed to transform must return a sequence (a one dimensional Series, array or list) the same length as the group.

So, transform works on just one Series at a time and apply works on the entire DataFrame at once.
</blockquote>
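
The difference in output shape can be illustrated with a small sketch on the same sample data. It assumes the goal is sometimes to keep one value per original row, which is where transform fits; the `result` column name is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['01/01/1999', '02/01/1999', '03/01/1999', '03/01/1999'],
    'A': [2, 4, 7, 4],
    'B': [5, 7, 9, 6],
})

grouped_b = df.groupby('date')['B']
per_group = grouped_b.sum()            # reduced: one value per date
per_row = grouped_b.transform('sum')   # aligned: one value per original row

# Row-aligned weighted average: both pieces keep the original index,
# so they can be divided and stored as a new column.
products = df['A'] * df['B']
df['result'] = products.groupby(df['date']).transform('sum') / per_row

print(len(per_group), len(per_row))
print(df)
```

Here every row of a date carries that date's weighted average, whereas the reduced form has a single row per date.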

Example

Sample data:

import pandas as pd
data = {
    'date': ['01/01/1999', '02/01/1999', '03/01/1999', '03/01/1999'],
    'A': [2, 4, 7, 4],
    'B': [5, 7, 9, 6]
}
df = pd.DataFrame(data)
print(df)

         date  A  B
0  01/01/1999  2  5
1  02/01/1999  4  7
2  03/01/1999  7  9
3  03/01/1999  4  6

The calculation for '03/01/1999' would be:

((7 * 9) + (4 * 6)) / (9 + 6) # = 5.8

Calculation for each date group using .apply():

df_ab_calc = df.groupby('date').apply(lambda x: (x['A'] * x['B']).sum() / x['B'].sum())
print(df_ab_calc)

date
01/01/1999    2.0
02/01/1999    4.0
03/01/1999    5.8
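
As a possible alternative that is not part of the answer above, the same weighted average can be sketched without a Python-level lambda per group, or with numpy.average and explicit weights. Selecting only A and B before .apply() is a precaution here, since the function does not need the grouping column; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': ['01/01/1999', '02/01/1999', '03/01/1999', '03/01/1999'],
    'A': [2, 4, 7, 4],
    'B': [5, 7, 9, 6],
})

# Vectorized form: per-date sum of A*B divided by per-date sum of B.
numerator = (df['A'] * df['B']).groupby(df['date']).sum()
denominator = df.groupby('date')['B'].sum()
weighted = numerator / denominator

# Same quantity via numpy.average, using B as the weights within each group.
via_average = df.groupby('date')[['A', 'B']].apply(
    lambda g: np.average(g['A'], weights=g['B'])
)

print(weighted)
print(via_average)
```

Both results are Series indexed by date; calling .reset_index(name='result') on either turns it into a two-column DataFrame.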
