Add a column in pandas based on the sum of subgroup values in another column

Question

Here is a simplified version of my dataframe (the number of persons in my dataframe is way more than 3):

    import pandas as pd

    df = pd.DataFrame({'Person': ['John', 'David', 'Mary', 'John', 'David', 'Mary'],
                       'Sales': [10, 15, 20, 11, 12, 18]})

      Person  Sales
    0   John     10
    1  David     15
    2   Mary     20
    3   John     11
    4  David     12
    5   Mary     18

I would like to add a column "Total" to this dataframe that holds the total sales per person:

      Person  Sales  Total
    0   John     10     21
    1  David     15     27
    2   Mary     20     38
    3   John     11     21
    4  David     12     27
    5   Mary     18     38

What would be the easiest way to achieve this?

I have tried

    df.groupby('Person').sum()

but the output has one row per person, so its shape does not match the shape of df:

            Sales
    Person
    David      27
    John       21
    Mary       38

Answer 1

Score: 2

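One way to achieve this is to use the pandas groupby and sum functions and then map the per-person sums back onto the 'Person' column. Assigning the grouped sum directly would not work, because it is indexed by person name rather than by the row labels of df.

    df['Total'] = df['Person'].map(df.groupby('Person')['Sales'].sum())

This adds a column to the dataframe with the total sales per person.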
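A quick check of how index alignment affects this approach: the grouped sum is indexed by `Person`, so assigning it straight to `df` matches on index labels rather than on names. The sketch below assumes the sample data from the question; `totals`, `direct` and `mapped` are illustrative names only.

```python
import pandas as pd

df = pd.DataFrame({'Person': ['John', 'David', 'Mary', 'John', 'David', 'Mary'],
                   'Sales': [10, 15, 20, 11, 12, 18]})

# One sum per person; the index of this Series is the person name.
totals = df.groupby('Person')['Sales'].sum()

# Direct assignment aligns on index labels (0..5 vs. names), so nothing matches.
direct = df.assign(Total=totals)
print(direct['Total'].isna().all())

# Looking up each row's Person in the per-person sums gives one total per row.
mapped = df.assign(Total=df['Person'].map(totals))
print(mapped)
```

The first print shows whether every value in the directly assigned column is missing; the second shows the frame with the totals filled in per row.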

Answer 2

Score: 2

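What you want is the transform method, which applies a function to each group and returns a result with the same index as the original dataframe. Passing the string 'sum' rather than the built-in sum avoids a warning in recent pandas versions:

    df['Total'] = df.groupby('Person')['Sales'].transform('sum')

It gives the expected result:

      Person  Sales  Total
    0   John     10     21
    1  David     15     27
    2   Mary     20     38
    3   John     11     21
    4  David     12     27
    5   Mary     18     38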
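To see why `transform` fits here, this small sketch (using the question's sample data) compares a plain grouped sum with the transformed result. The names `grouped`, `agg` and `per_row` are illustrative, not from the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Person': ['John', 'David', 'Mary', 'John', 'David', 'Mary'],
                   'Sales': [10, 15, 20, 11, 12, 18]})

grouped = df.groupby('Person')['Sales']

agg = grouped.sum()                 # one value per person, indexed by Person
per_row = grouped.transform('sum')  # one value per original row, indexed like df

print(len(agg), len(per_row))
print(per_row.index.equals(df.index))

# Because the index matches df, the result can be assigned as a column directly.
df['Total'] = per_row
print(df)
```

Since `per_row` shares its index with `df`, no separate lookup or merge step is needed.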

Answer 3

Score: 0

The 'Person' column in your dataframe contains repeated values, so a plain groupby sum returns one row per person and cannot be assigned directly as a new column.

If a per-person summary is enough, I would suggest making a new dataframe based on the sales sum. The code below does that:

    newDf = pd.DataFrame(df.groupby('Person')['Sales'].sum()).reset_index()

This creates a new dataframe with 'Person' and 'Sales' as columns.
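If the totals are still wanted next to each original row, one option is to merge that summary frame back onto `df`. This is only a sketch, assuming the question's sample data and that renaming the summed column to `Total` is acceptable; `result` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'Person': ['John', 'David', 'Mary', 'John', 'David', 'Mary'],
                   'Sales': [10, 15, 20, 11, 12, 18]})

# Per-person summary, as in the answer above.
newDf = pd.DataFrame(df.groupby('Person')['Sales'].sum()).reset_index()

# Rename so the summed column does not clash with the per-row 'Sales' column.
newDf = newDf.rename(columns={'Sales': 'Total'})

# A left merge keeps every row of df, in its original order.
result = df.merge(newDf, on='Person', how='left')
print(result)
```

This takes two steps where `transform` takes one, but it can be handy when the summary frame is needed on its own as well.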

huangapple
  • Published on February 8, 2023, 16:11:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75382884.html