英文:
Calculation problem after pandas grouping
问题
在pandas中,要实现按日期分组,在每个日期组中,将列A和B的每一行相乘,然后相加,再除以日期组中所有B列的总和,可以尝试以下方式:
(df.groupby('date')
.apply(lambda group: (group['A'] * group['B']).sum() / group['B'].sum())
.reset_index(name='result'))
这将返回一个包含日期、结果的DataFrame。
英文:
In pandas, how to achieve, grouped by date, in each date group, each row of column A and B multiplied and then summed, and then divided by the sum of all B columns in the date group.
I have tried:
(df.groupby('date')['A','B']
.transform(lambda x: (x['A'] * x['B']).sum())
.div(df.groupby('date')['B'].agg('sum')))
and:
(df.groupby('date')
.transform(lambda x: (x['A'] * x['B']).sum())
.div(df.groupby('date')['B'].agg('sum')))
both showed:
> KeyError: 'A'
答案1
得分: 1
你应该使用 .apply() 而不是 .transform(),如下所示:
df.groupby('date').apply(lambda x: (x['A'] * x['B']).sum() / x['B'].sum())
通过使用 .apply(),lambda 函数中的 x 变量将被识别为代表每个分组的 DataFrame 对象,可以分别索引各个列。
.transform() 方法只将 x 视为代表一次仅一个列的 Series 对象。
来自这个答案:
`apply` 和 `transform` 之间的两个主要区别
transform和applygroupby 方法之间有两个主要区别。
- 输入:
apply隐式地将每个组的所有列作为DataFrame传递给自定义函数。- 而
transform将每个组的每列分别作为Series传递给自定义函数。- 输出:
- 传递给**
apply的自定义函数可以返回标量,或Series或DataFrame(甚至是numpy数组或列表)。- 传递给**
transform的自定义函数必须返回一个序列**(长度与组相同的一维Series、数组或列表)。因此,
transform仅逐一处理一列Series,而apply一次处理整个DataFrame。
示例
示例数据:
import pandas as pd
data = {
'date': ['01/01/1999', '02/01/1999', '03/01/1999', '03/01/1999'],
'A': [2, 4, 7, 4],
'B': [5, 7, 9, 6]
}
df = pd.DataFrame(data)
print(df)
date A B
0 01/01/1999 2 5
1 02/01/1999 4 7
2 03/01/1999 7 9
3 03/01/1999 4 6
对于 03/01/1999 的计算将是:
((7 * 9) + (4 * 6)) / (9 + 6) # = 5.8
使用 .apply() 计算每个日期组的结果:
df_ab_calc = df.groupby('date').apply(lambda x: (x['A'] * x['B']).sum() / x['B'].sum())
print(df_ab_calc)
date
01/01/1999 2.0
02/01/1999 4.0
03/01/1999 5.8
英文:
You should use .apply() instead of .transform(), as follows:
df.groupby('date').apply(lambda x: (x['A'] * x['B']).sum() / x['B'].sum())
By using .apply(), the x variables in the lambda function will be recognised as DataFrame objects representing each group, which can each be indexed for individual columns.
The .transform() method only treats x as a Series object representing only one column at a time.
From this answer:
<blockquote>
<h3>Two major differences between apply and transform</h3>
There are two major differences between the transform and apply groupby methods.
- Input:
applyimplicitly passes all the columns for each group as a DataFrame to the custom function.- while
transformpasses each column for each group individually as a Series to the custom function.
- Output:
- The custom function passed to
applycan return a scalar, or a Series or DataFrame (or numpy array or even list). - The custom function passed to
transformmust return a sequence (a one dimensional Series, array or list) the same length as the group.
- The custom function passed to
So, transform works on just one Series at a time and apply works on the entire DataFrame at once.
</blockquote>
Example
Sample data:
import pandas as pd
data = {
'date': ['01/01/1999', '02/01/1999', '03/01/1999', '03/01/1999'],
'A': [2, 4, 7, 4],
'B': [5, 7, 9, 6]
}
df = pd.DataFrame(data)
print(df)
date A B
0 01/01/1999 2 5
1 02/01/1999 4 7
2 03/01/1999 7 9
3 03/01/1999 4 6
The calculation for '03/01/1999' would be:
((7 * 9) + (4 * 6)) / (9 + 6) # = 5.8
Calculation for each date group using .apply():
df_ab_calc = df.groupby('date').apply(lambda x: (x['A'] * x['B']).sum() / x['B'].sum())
print(df_ab_calc)
date
01/01/1999 2.0
02/01/1999 4.0
03/01/1999 5.8
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。


评论