在使用 pandas 进行分组后出现的计算问题。

huangapple go评论68阅读模式
英文:

Calculation problem after pandas grouping

问题

在pandas中,要实现按日期分组,在每个日期组中,将列A和B的每一行相乘,然后相加,再除以日期组中所有B列的总和,可以尝试以下方式:

(df.groupby('date')
    .apply(lambda group: (group['A'] * group['B']).sum() / group['B'].sum())
    .reset_index(name='result'))

这将返回一个包含日期、结果的DataFrame。

英文:

In pandas, how to achieve, grouped by date, in each date group, each row of column A and B multiplied and then summed, and then divided by the sum of all B columns in the date group.

I have tried:

(df.groupby('date')['A','B']
    .transform(lambda x: (x['A'] * x['B']).sum())
    .div(df.groupby('date')['B'].agg('sum')))

and:

(df.groupby('date')
    .transform(lambda x: (x['A'] * x['B']).sum())
    .div(df.groupby('date')['B'].agg('sum')))

both showed:

> KeyError: 'A'

答案1

得分: 1

你应该使用 .apply() 而不是 .transform(),如下所示:

df.groupby('date').apply(lambda x: (x['A'] * x['B']).sum() / x['B'].sum())

通过使用 .apply()lambda 函数中的 x 变量将被识别为代表每个分组的 DataFrame 对象,可以分别索引各个列。

.transform() 方法只将 x 视为代表一次仅一个列的 Series 对象。

来自这个答案

`apply` 和 `transform` 之间的两个主要区别

transformapply groupby 方法之间有两个主要区别。

  • 输入:
    • apply 隐式地将每个组的所有列作为DataFrame传递给自定义函数。
    • transform 将每个组的每列分别作为Series传递给自定义函数。
  • 输出:
    • 传递给**apply的自定义函数可以返回标量,或Series或DataFrame(甚至是numpy数组或列表)。
    • 传递给**transform的自定义函数必须返回一个序列**(长度与组相同的一维Series、数组或列表)。

因此,transform仅逐一处理一列Series,而apply一次处理整个DataFrame。

示例

示例数据:

import pandas as pd

data = {
    'date': ['01/01/1999', '02/01/1999', '03/01/1999', '03/01/1999'],
    'A': [2, 4, 7, 4],
    'B': [5, 7, 9, 6]
}

df = pd.DataFrame(data)
print(df)
         date  A  B
0  01/01/1999  2  5
1  02/01/1999  4  7
2  03/01/1999  7  9
3  03/01/1999  4  6

对于 03/01/1999 的计算将是:

((7 * 9) + (4 * 6)) / (9 + 6) # = 5.8

使用 .apply() 计算每个日期组的结果:

df_ab_calc = df.groupby('date').apply(lambda x: (x['A'] * x['B']).sum() / x['B'].sum())
print(df_ab_calc)
date
01/01/1999    2.0
02/01/1999    4.0
03/01/1999    5.8
英文:

You should use .apply() instead of .transform(), as follows:

df.groupby('date').apply(lambda x: (x['A'] * x['B']).sum() / x['B'].sum())

By using .apply(), the x variables in the lambda function will be recognised as DataFrame objects representing each group, which can each be indexed for individual columns.

The .transform() method only treats x as a Series object representing only one column at a time.

From this answer:
<blockquote>
<h3>Two major differences between apply and transform</h3>

There are two major differences between the transform and apply groupby methods.

  • Input:
    • apply implicitly passes all the columns for each group as a DataFrame to the custom function.
    • while transform passes each column for each group individually as a Series to the custom function.
  • Output:
    • The custom function passed to apply can return a scalar, or a Series or DataFrame (or numpy array or even list).
    • The custom function passed to transform must return a sequence (a one dimensional Series, array or list) the same length as the group.

So, transform works on just one Series at a time and apply works on the entire DataFrame at once.
</blockquote>

Example

Sample data:

import pandas as pd

data = {
    &#39;date&#39;: [&#39;01/01/1999&#39;, &#39;02/01/1999&#39;, &#39;03/01/1999&#39;, &#39;03/01/1999&#39;],
    &#39;A&#39;: [2, 4, 7, 4],
    &#39;B&#39;: [5, 7, 9, 6]
}

df = pd.DataFrame(data)
print(df)
         date  A  B
0  01/01/1999  2  5
1  02/01/1999  4  7
2  03/01/1999  7  9
3  03/01/1999  4  6

The calculation for &#39;03/01/1999&#39; would be:

((7 * 9) + (4 * 6)) / (9 + 6) # = 5.8

Calculation for each date group using .apply():

df_ab_calc = df.groupby(&#39;date&#39;).apply(lambda x: (x[&#39;A&#39;] * x[&#39;B&#39;]).sum() / x[&#39;B&#39;].sum())
print(df_ab_calc)
date
01/01/1999    2.0
02/01/1999    4.0
03/01/1999    5.8

huangapple
  • 本文由 发表于 2023年2月26日 20:10:10
  • 转载请务必保留本文链接:https://go.coder-hub.com/75571871.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定