Add a column in pandas based on sum of the subgroup values in another column

Question

Here is a simplified version of my DataFrame (the real one has far more than 3 people):

import pandas as pd

df = pd.DataFrame({'Person': ['John', 'David', 'Mary', 'John', 'David', 'Mary'],
                   'Sales': [10, 15, 20, 11, 12, 18]})
  Person  Sales
0   John     10
1  David     15
2   Mary     20
3   John     11
4  David     12
5   Mary     18

I would like to add a "Total" column to this DataFrame that holds each person's total sales, repeated on every row for that person:

  Person  Sales  Total
0   John     10     21
1  David     15     27
2   Mary     20     38
3   John     11     21
4  David     12     27
5   Mary     18     38

What would be the easiest way to achieve this?

I have tried

df.groupby('Person').sum()

but the output has one row per person, so its shape does not match the shape of df:

        Sales
Person       
David      27
John       21
Mary       38

Answer 1

Score: 2

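The easiest way to achieve this is to use the pandas groupby and sum functions and then map the per-person totals back onto the rows. Assigning the grouped sum directly does not work: its index is the Person values rather than the row index of df, so index alignment would fill the new column with NaN.

df['Total'] = df['Person'].map(df.groupby('Person')['Sales'].sum())

This adds a column to the DataFrame with the total sales per person on every row.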
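
As a minimal sketch of why the index matters here, the snippet below uses the sample data from the question. The names `totals` and `misaligned` are introduced only for illustration. It compares assigning the grouped sum directly with looking each person up in the grouped sum via `Series.map`.

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['John', 'David', 'Mary', 'John', 'David', 'Mary'],
    'Sales': [10, 15, 20, 11, 12, 18],
})

# One total per person; the result is indexed by Person, not by row number.
totals = df.groupby('Person')['Sales'].sum()

# Direct assignment aligns on index labels (0..5 vs. names), so nothing matches
# and every value in the new column is NaN.
misaligned = df.assign(Total=totals)

# Looking each person up in `totals` yields one value per original row.
df['Total'] = df['Person'].map(totals)
```

The mapped column has the same length and index as `df`, which is what makes the assignment line up.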

Answer 2

Score: 2

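What you want is the transform method, which applies a function to each group and returns a result aligned with the original rows. Passing the string 'sum' rather than the built-in sum avoids a deprecation warning in recent pandas versions:

df['Total'] = df.groupby('Person')['Sales'].transform('sum')

This gives the expected output: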

  Person  Sales  Total
0   John     10     21
1  David     15     27
2   Mary     20     38
3   John     11     21
4  David     12     27
5   Mary     18     38
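
To make the difference from a plain aggregation concrete, here is a small sketch on the question's sample data. The names `per_group` and `per_row` are illustrative only. It contrasts the shape of `sum()` with the shape of `transform('sum')` on the same grouped column.

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['John', 'David', 'Mary', 'John', 'David', 'Mary'],
    'Sales': [10, 15, 20, 11, 12, 18],
})

grouped = df.groupby('Person')['Sales']

# Aggregation: one value per group, indexed by Person.
per_group = grouped.sum()

# Transformation: the group total broadcast back to every original row,
# keeping the index of df.
per_row = grouped.transform('sum')

df['Total'] = per_row
```

Because `per_row` keeps the index of `df`, it can be assigned as a column without any extra alignment step.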

Answer 3

Score: 0

The 'Person' column in your DataFrame contains repeated values, so a groupby aggregation returns one row per person and cannot be assigned directly as a new column.

I would suggest building a new DataFrame from the per-person sales sums. The code below does that:

newDf = pd.DataFrame(df.groupby('Person')['Sales'].sum()).reset_index()

This creates a new DataFrame with 'Person' and 'Sales' as columns.
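
If the totals are still wanted on the original rows, one possible follow-up (not part of the answer above, just a sketch) is to join the summary DataFrame back onto `df`. The name `result` and the renaming of the summed column to 'Total' are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['John', 'David', 'Mary', 'John', 'David', 'Mary'],
    'Sales': [10, 15, 20, 11, 12, 18],
})

# Per-person totals as a regular DataFrame with 'Person' and 'Sales' columns.
newDf = pd.DataFrame(df.groupby('Person')['Sales'].sum()).reset_index()

# Rename the summed column so it does not clash with the per-row 'Sales',
# then left-join on 'Person' to keep every original row in its original order.
result = df.merge(newDf.rename(columns={'Sales': 'Total'}), on='Person', how='left')
```

This takes two steps where `transform` takes one, but it can be convenient when the summary table is needed on its own as well.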
