February 8, 2023, 16:11:43

Add a column in pandas based on sum of the subgroup values in another column

Question

Here is a simplified version of my dataframe (the number of persons in my dataframe is way more than 3):

import pandas as pd

df = pd.DataFrame({'Person':['John','David','Mary','John','David','Mary'],
               'Sales':[10,15,20,11,12,18],
               })

  Person  Sales
0   John     10
1  David     15
2   Mary     20
3   John     11
4  David     12
5   Mary     18

I would like to add a column "Total" to this data frame, containing the total sales per person:

  Person  Sales  Total
0   John     10     21
1  David     15     27
2   Mary     20     38
3   John     11     21
4  David     12     27
5   Mary     18     38

What would be the easiest way to achieve this?

I have tried

df.groupby('Person').sum()

but the shape of the output is not congruent with the shape of df.

        Sales
Person       
David      27
John       21
Mary       38
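
A minimal sketch of why the shapes differ, assuming the sample data from the question: an aggregation such as sum collapses each group into a single row indexed by the group key, while df keeps one row per sale.

```python
import pandas as pd

df = pd.DataFrame({'Person': ['John', 'David', 'Mary', 'John', 'David', 'Mary'],
                   'Sales': [10, 15, 20, 11, 12, 18]})

# Aggregating collapses every group to one row, indexed by the group key.
totals = df.groupby('Person').sum()

print(df.shape)            # (6, 2): one row per sale
print(totals.shape)        # (3, 1): one row per person
print(list(totals.index))  # ['David', 'John', 'Mary']
```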

Answer 1

Score: 2

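The easiest way to achieve this is to use the pandas groupby and sum functions, then map the per-person totals back onto the Person column. Assigning the grouped sum directly does not work, because it is indexed by Person rather than by the row labels of df, so the new column would be filled with NaN.

df['Total'] = df['Person'].map(df.groupby('Person')['Sales'].sum())

This will add a column to the dataframe with the total sales per person.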
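
A hedged sketch of how index alignment affects this kind of assignment, assuming the sample data from the question. The grouped sum is a Series indexed by Person, so it has to be looked up per row rather than assigned as is. The names per_person and misaligned are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Person': ['John', 'David', 'Mary', 'John', 'David', 'Mary'],
                   'Sales': [10, 15, 20, 11, 12, 18]})

# Series of totals, indexed by Person ('David', 'John', 'Mary').
per_person = df.groupby('Person')['Sales'].sum()

# Direct assignment aligns on index labels; df's labels are 0..5,
# so no label matches and every value becomes NaN.
misaligned = df.assign(Total=per_person)
print(misaligned['Total'].isna().all())  # True

# Looking up each row's Person in the totals gives one value per row.
df['Total'] = df['Person'].map(per_person)
print(df['Total'].tolist())  # [21, 27, 38, 21, 27, 38]
```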

Answer 2

Score: 2

What you want is the transform method, which applies a function to each group and returns a result with the same index as the original data frame:

df['Total'] = df.groupby('Person')['Sales'].transform('sum')

It gives as expected:

  Person  Sales  Total
0   John     10     21
1  David     15     27
2   Mary     20     38
3   John     11     21
4  David     12     27
5   Mary     18     38
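
As a small self-contained check, assuming the sample data from the question, the sketch below confirms that the transformed result shares df's index, which is why it can be assigned as a column without any reshaping. It passes the string 'sum' so that pandas uses its own grouped sum.

```python
import pandas as pd

df = pd.DataFrame({'Person': ['John', 'David', 'Mary', 'John', 'David', 'Mary'],
                   'Sales': [10, 15, 20, 11, 12, 18]})

# transform broadcasts each group's sum back to every row of that group.
total = df.groupby('Person')['Sales'].transform('sum')

print(total.index.equals(df.index))  # True: same row labels as df
df['Total'] = total
print(df['Total'].tolist())          # [21, 27, 38, 21, 27, 38]
```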

Answer 3

Score: 0

Your 'Person' column contains repeated values, so the result of groupby('Person')['Sales'].sum() has one row per person and cannot be assigned directly as a new column of df.

I would suggest making a new dataframe based on the sales sum. The code below will help you with that:

newDf = pd.DataFrame(df.groupby('Person')['Sales'].sum()).reset_index()

This will create a new dataframe with 'Person' and 'Sales' as columns.
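
If the per-row "Total" column from the question is still wanted, one possible follow-up is to merge the aggregated dataframe back onto df. This is a sketch under the assumption that the sample data from the question is used. Renaming the summed column to 'Total' and the name merged are choices made here for illustration, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'Person': ['John', 'David', 'Mary', 'John', 'David', 'Mary'],
                   'Sales': [10, 15, 20, 11, 12, 18]})

# One row per person, with the summed column renamed to avoid a name clash.
newDf = (df.groupby('Person')['Sales'].sum()
           .reset_index()
           .rename(columns={'Sales': 'Total'}))

# A left merge keeps every row of df, in order, and attaches the matching total.
merged = df.merge(newDf, on='Person', how='left')
print(merged['Total'].tolist())  # [21, 27, 38, 21, 27, 38]
```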
