在使用 pandas 进行分组后出现的计算问题。

huangapple go评论105阅读模式
英文:

Calculation problem after pandas grouping

问题

在pandas中,要实现按日期分组,在每个日期组中,将列A和B的每一行相乘,然后相加,再除以日期组中所有B列的总和,可以尝试以下方式:

  1. (df.groupby('date')
  2. .apply(lambda group: (group['A'] * group['B']).sum() / group['B'].sum())
  3. .reset_index(name='result'))

这将返回一个包含日期、结果的DataFrame。

英文:

In pandas, how to achieve, grouped by date, in each date group, each row of column A and B multiplied and then summed, and then divided by the sum of all B columns in the date group.

I have tried:

  1. (df.groupby('date')['A','B']
  2. .transform(lambda x: (x['A'] * x['B']).sum())
  3. .div(df.groupby('date')['B'].agg('sum')))

and:

  1. (df.groupby('date')
  2. .transform(lambda x: (x['A'] * x['B']).sum())
  3. .div(df.groupby('date')['B'].agg('sum')))

both showed:

> KeyError: 'A'

答案1

得分: 1

你应该使用 .apply() 而不是 .transform(),如下所示:

  1. df.groupby('date').apply(lambda x: (x['A'] * x['B']).sum() / x['B'].sum())

通过使用 .apply()lambda 函数中的 x 变量将被识别为代表每个分组的 DataFrame 对象,可以分别索引各个列。

.transform() 方法只将 x 视为代表一次仅一个列的 Series 对象。

来自这个答案

`apply` 和 `transform` 之间的两个主要区别

transformapply groupby 方法之间有两个主要区别。

  • 输入:
    • apply 隐式地将每个组的所有列作为DataFrame传递给自定义函数。
    • transform 将每个组的每列分别作为Series传递给自定义函数。
  • 输出:
    • 传递给**apply的自定义函数可以返回标量,或Series或DataFrame(甚至是numpy数组或列表)。
    • 传递给**transform的自定义函数必须返回一个序列**(长度与组相同的一维Series、数组或列表)。

因此,transform仅逐一处理一列Series,而apply一次处理整个DataFrame。

示例

示例数据:

  1. import pandas as pd
  2. data = {
  3. 'date': ['01/01/1999', '02/01/1999', '03/01/1999', '03/01/1999'],
  4. 'A': [2, 4, 7, 4],
  5. 'B': [5, 7, 9, 6]
  6. }
  7. df = pd.DataFrame(data)
  8. print(df)
  1. date A B
  2. 0 01/01/1999 2 5
  3. 1 02/01/1999 4 7
  4. 2 03/01/1999 7 9
  5. 3 03/01/1999 4 6

对于 03/01/1999 的计算将是:

  1. ((7 * 9) + (4 * 6)) / (9 + 6) # = 5.8

使用 .apply() 计算每个日期组的结果:

  1. df_ab_calc = df.groupby('date').apply(lambda x: (x['A'] * x['B']).sum() / x['B'].sum())
  2. print(df_ab_calc)
  1. date
  2. 01/01/1999 2.0
  3. 02/01/1999 4.0
  4. 03/01/1999 5.8
英文:

You should use .apply() instead of .transform(), as follows:

  1. df.groupby('date').apply(lambda x: (x['A'] * x['B']).sum() / x['B'].sum())

By using .apply(), the x variables in the lambda function will be recognised as DataFrame objects representing each group, which can each be indexed for individual columns.

The .transform() method only treats x as a Series object representing only one column at a time.

From this answer:
<blockquote>
<h3>Two major differences between apply and transform</h3>

There are two major differences between the transform and apply groupby methods.

  • Input:
    • apply implicitly passes all the columns for each group as a DataFrame to the custom function.
    • while transform passes each column for each group individually as a Series to the custom function.
  • Output:
    • The custom function passed to apply can return a scalar, or a Series or DataFrame (or numpy array or even list).
    • The custom function passed to transform must return a sequence (a one dimensional Series, array or list) the same length as the group.

So, transform works on just one Series at a time and apply works on the entire DataFrame at once.
</blockquote>

Example

Sample data:

  1. import pandas as pd
  2. data = {
  3. &#39;date&#39;: [&#39;01/01/1999&#39;, &#39;02/01/1999&#39;, &#39;03/01/1999&#39;, &#39;03/01/1999&#39;],
  4. &#39;A&#39;: [2, 4, 7, 4],
  5. &#39;B&#39;: [5, 7, 9, 6]
  6. }
  7. df = pd.DataFrame(data)
  8. print(df)
  1. date A B
  2. 0 01/01/1999 2 5
  3. 1 02/01/1999 4 7
  4. 2 03/01/1999 7 9
  5. 3 03/01/1999 4 6

The calculation for &#39;03/01/1999&#39; would be:

  1. ((7 * 9) + (4 * 6)) / (9 + 6) # = 5.8

Calculation for each date group using .apply():

  1. df_ab_calc = df.groupby(&#39;date&#39;).apply(lambda x: (x[&#39;A&#39;] * x[&#39;B&#39;]).sum() / x[&#39;B&#39;].sum())
  2. print(df_ab_calc)
  1. date
  2. 01/01/1999 2.0
  3. 02/01/1999 4.0
  4. 03/01/1999 5.8

huangapple
  • 本文由 发表于 2023年2月26日 20:10:10
  • 转载请务必保留本文链接:https://go.coder-hub.com/75571871.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定