February 23, 2023, 22:02:42

Is there a way to calculate the running total across only a few columns (unique values only)?

Question

I am trying to calculate a running total across a few specific columns of my DataFrame, and I only want unique values to count toward it.

Here is an example DataFrame:

Name	Product	Date	Location	Type	Sales	Ship Fee %	Total Fee
Tom	Bananas	01-01-2021	NY	Fruit	120	0.01	1.2
Tom	Apples	01-01-2021	NY	Fruit	120	0.01	1.2
Tom	Bananas	02-01-2021	TX	Fruit	420	0.01	4.2
Tom	Bananas	02-01-2021	TX	Fruit	120	0.01	1.2
Mat	Bananas	02-01-2021	NY	Fruit	30	0.01	0.3

I want a Running Total column that groups by Name and Date only and shows the sum of the unique values in the Total Fee column. That would result in something like this:

Name	Product	Date	Location	Type	Sales	Ship Fee %	Total Fee	Running Total
Tom	Bananas	01-01-2021	NY	Fruit	120	0.01	1.2	1.2
Tom	Apples	01-01-2021	NY	Fruit	120	0.01	1.2	1.2
Tom	Bananas	02-01-2021	TX	Fruit	420	0.01	4.2	4.2
Tom	Bananas	02-01-2021	TX	Fruit	120	0.01	1.2	5.4
Mat	Bananas	02-01-2021	NY	Fruit	30	0.01	0.3	0.3

I am lost; I haven't been able to find anything that gives me this result.

Answer 1

Score: 2

I think this is what you are looking for:

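Option 1: group by "Name" and "Date", then take the cumulative sum over only the unique "Total Fee" values. Rows dropped as duplicates come back as NaN and are filled with their own "Total Fee".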

df['Running Total'] = df.drop_duplicates(['Name', 'Date', 'Total Fee']).groupby(['Name', 'Date'])['Total Fee'].cumsum()
df['Running Total'] = df['Running Total'].fillna(df['Total Fee'])
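
As a self-contained check of Option 1, the sketch below rebuilds only the three columns it needs from the question's sample rows and applies the same two steps. The intermediate name `deduped` is illustrative and not part of the original answer.

```python
import pandas as pd

# Only the columns Option 1 touches, copied from the question's sample rows.
df = pd.DataFrame({
    "Name": ["Tom", "Tom", "Tom", "Tom", "Mat"],
    "Date": ["01-01-2021", "01-01-2021", "02-01-2021", "02-01-2021", "02-01-2021"],
    "Total Fee": [1.2, 1.2, 4.2, 1.2, 0.3],
})

# Keep the first occurrence of each fee per (Name, Date), then accumulate per group.
deduped = df.drop_duplicates(["Name", "Date", "Total Fee"])
df["Running Total"] = deduped.groupby(["Name", "Date"])["Total Fee"].cumsum()

# Rows removed as duplicates are NaN after index alignment; fill them with their own fee.
df["Running Total"] = df["Running Total"].fillna(df["Total Fee"])

print(df)
```

The assignment relies on pandas aligning the shorter, deduplicated result back onto `df` by index, which is why the dropped rows appear as NaN before the fill.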

Option 2: group by "Name", "Product", and "Date", then take the cumulative sum. This gives the accumulated sum for each product on each day for each person.

df['Running Total'] = df.groupby(['Name', 'Product', 'Date'], as_index=False)['Total Fee'].cumsum()
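
A minimal runnable sketch of Option 2, using a seven-row frame that matches the answer's test data and is reduced to the columns the grouping needs. It omits `as_index=False` on the assumption that it makes no difference for a like-indexed operation such as `cumsum`.

```python
import pandas as pd

# Seven rows matching the answer's test data, reduced to the relevant columns.
df = pd.DataFrame({
    "Name": ["Tom", "Tom", "Tom", "Tom", "Mat", "Mat", "Mat"],
    "Product": ["Bananas", "Apples", "Bananas", "Bananas", "Bananas", "Bananas", "Apples"],
    "Date": ["01-01-2021", "01-01-2021", "02-01-2021", "02-01-2021",
             "02-01-2021", "02-01-2021", "03-01-2021"],
    "Total Fee": [1.2, 1.2, 4.2, 1.2, 0.3, 0.3, 1.6],
})

# Every row counts here, including repeated fees within the same group.
df["Running Total"] = df.groupby(["Name", "Product", "Date"])["Total Fee"].cumsum()

print(df)
```

Unlike Option 1, repeated fees are not skipped, so Mat's second 0.3 on 02-01-2021 accumulates instead of being treated as a duplicate.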

Testing and examples

Given this dataframe:

	Name	Product	Date	Location	Type	Sales	Ship Fee %	Total Fee
0	Tom	Bananas	01-01-2021	NY	Fruit	120	0.01	1.2
1	Tom	Apples	01-01-2021	NY	Fruit	120	0.01	1.2
2	Tom	Bananas	02-01-2021	TX	Fruit	420	0.01	4.2
3	Tom	Bananas	02-01-2021	TX	Fruit	120	0.01	1.2
4	Mat	Bananas	02-01-2021	NY	Fruit	30	0.01	0.3
5	Mat	Bananas	02-01-2021	NY	Fruit	50	0.01	0.3
6	Mat	Apples	03-01-2021	NY	Vegetable	80	0.02	1.6

Option 1 result:

	Name	Product	Date	Location	Type	Sales	Ship Fee %	Total Fee	Running Total
0	Tom	Bananas	01-01-2021	NY	Fruit	120	0.01	1.2	1.2
1	Tom	Apples	01-01-2021	NY	Fruit	120	0.01	1.2	1.2
2	Tom	Bananas	02-01-2021	TX	Fruit	420	0.01	4.2	4.2
3	Tom	Bananas	02-01-2021	TX	Fruit	120	0.01	1.2	5.4
4	Mat	Bananas	02-01-2021	NY	Fruit	30	0.01	0.3	0.3
5	Mat	Bananas	02-01-2021	NY	Fruit	50	0.01	0.3	0.3
6	Mat	Apples	03-01-2021	NY	Vegetable	80	0.02	1.6	1.6
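
One caveat with Option 1: the `fillna` step gives a dropped duplicate its own fee, which equals the running total only when nothing else in the group was accumulated before it, as is the case in the data above. If a repeated fee can appear later in a group, one possible variant zeroes out the repeats before accumulating, so the total carries forward. This is an assumption about the desired behavior, not part of the original answer, and the three-row frame and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical group in which the repeated fee (1.2) appears after another value.
df = pd.DataFrame({
    "Name": ["Tom", "Tom", "Tom"],
    "Date": ["02-01-2021", "02-01-2021", "02-01-2021"],
    "Total Fee": [1.2, 4.2, 1.2],
})

keys = ["Name", "Date"]

# True for every row whose fee was already seen in the same (Name, Date) group.
is_repeat = df.duplicated(keys + ["Total Fee"])

# Repeats contribute 0, so the cumulative sum carries the previous total forward.
unique_fee = df["Total Fee"].where(~is_repeat, 0)
df["Running Total"] = unique_fee.groupby([df[k] for k in keys]).cumsum()

print(df)
```

On this frame, Option 1 would leave 1.2 in the last row, while the variant keeps the group's running total there.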

Option 2 result:

	Name	Product	Date	Location	Type	Sales	Ship Fee %	Total Fee	Running Total
0	Tom	Bananas	01-01-2021	NY	Fruit	120	0.01	1.2	1.2
1	Tom	Apples	01-01-2021	NY	Fruit	120	0.01	1.2	1.2
2	Tom	Bananas	02-01-2021	TX	Fruit	420	0.01	4.2	4.2
3	Tom	Bananas	02-01-2021	TX	Fruit	120	0.01	1.2	5.4
4	Mat	Bananas	02-01-2021	NY	Fruit	30	0.01	0.3	0.3
5	Mat	Bananas	02-01-2021	NY	Fruit	50	0.01	0.3	0.6
6	Mat	Apples	03-01-2021	NY	Vegetable	80	0.02	1.6	1.6
