Add a column in pandas based on sum of the subgroup values in another column
Question
Here is a simplified version of my dataframe (the number of persons in my dataframe is way more than 3):
import pandas as pd

df = pd.DataFrame({'Person': ['John', 'David', 'Mary', 'John', 'David', 'Mary'],
                   'Sales': [10, 15, 20, 11, 12, 18]})
  Person  Sales
0   John     10
1  David     15
2   Mary     20
3   John     11
4  David     12
5   Mary     18
I would like to add a column "Total" to this dataframe that holds the sum of sales for each person:
  Person  Sales  Total
0   John     10     21
1  David     15     27
2   Mary     20     38
3   John     11     21
4  David     12     27
5   Mary     18     38
What would be the easiest way to achieve this?
I have tried
df.groupby('Person').sum()
but the shape of the output does not match the shape of df:
        Sales
Person
David      27
John       21
Mary       38
Answer 1
Score: 2
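The easiest way to achieve this is to use the pandas groupby and sum functions, then map the per-person totals back onto the rows. Assigning the result of sum() directly does not work, because it is indexed by Person rather than by the original row index, so the new column would contain only NaN.
df['Total'] = df['Person'].map(df.groupby('Person')['Sales'].sum())
This adds a column to the dataframe with the total sales per person.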
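With this kind of assignment it is worth checking index alignment. The following is a minimal sketch, assuming the sample data from the question; the names totals, direct and mapped are only illustrative. It compares assigning the grouped sums directly with looking them up per row via Series.map.

```python
import pandas as pd

df = pd.DataFrame({'Person': ['John', 'David', 'Mary', 'John', 'David', 'Mary'],
                   'Sales': [10, 15, 20, 11, 12, 18]})

# Per-person totals: a Series indexed by Person, not by row number.
totals = df.groupby('Person')['Sales'].sum()

# Direct assignment aligns on index labels. The row labels 0..5 never match
# 'John', 'David' or 'Mary', so every value in the new column is NaN.
direct = df.assign(Total=totals)

# Looking each person up in the totals Series keeps the original row index.
mapped = df.assign(Total=df['Person'].map(totals))

print(direct)
print(mapped)
```

Printing both frames side by side makes the difference visible: the first has an empty Total column, the second has one total per row.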
Answer 2
Score: 2
What you want is the transform method, which applies a function to each group and returns a result with the same index as the original dataframe:
df['Total'] = df.groupby('Person')['Sales'].transform('sum')
It gives the expected result:
  Person  Sales  Total
0   John     10     21
1  David     15     27
2   Mary     20     38
3   John     11     21
4  David     12     27
5   Mary     18     38
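For reference, here is a self-contained sketch of the transform approach, assuming the sample data from the question. It passes the aggregation by name as the string 'sum' and checks that the result lines up with the original rows; the variable name total is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Person': ['John', 'David', 'Mary', 'John', 'David', 'Mary'],
                   'Sales': [10, 15, 20, 11, 12, 18]})

# transform broadcasts each group's sum back to every row of that group,
# so the result has the same index and length as df.
total = df.groupby('Person')['Sales'].transform('sum')

# Because the index matches, plain column assignment works.
df['Total'] = total

print(total.index.equals(df.index))
print(df)
```

The same pattern should work with other aggregation names such as 'mean' or 'max' if a different per-group statistic is needed.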
Answer 3
Score: 0
The 'Person' column in your dataframe contains repeated values, so the result of groupby(...).sum() has one row per person and cannot be assigned directly as a new column of df.
If a per-person summary is enough, I would suggest making a new dataframe based on the sales sum.
The code below does that:
newDf = pd.DataFrame(df.groupby('Person')['Sales'].sum()).reset_index()
This creates a new dataframe with 'Person' and 'Sales' as columns.
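If the per-row "Total" column from the question is still needed, a summary dataframe like this one can be joined back onto the original. The following is a sketch under that assumption, using the sample data from the question; the names new_df, totals and merged are illustrative and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'Person': ['John', 'David', 'Mary', 'John', 'David', 'Mary'],
                   'Sales': [10, 15, 20, 11, 12, 18]})

# One row per person, with 'Person' restored as an ordinary column.
new_df = df.groupby('Person')['Sales'].sum().reset_index()

# Rename the summed column so it does not clash with the per-row 'Sales'.
totals = new_df.rename(columns={'Sales': 'Total'})

# A left join on 'Person' keeps every original row, in the original order.
merged = df.merge(totals, on='Person', how='left')

print(new_df)
print(merged)
```

This takes more steps than transform, but it can be convenient when the summary dataframe is also wanted on its own.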