July 17, 2023 23:42:51

python dataframe: create a column, with dynamic calculation/percentage based on day and month

Question

I am struggling with the following.

This is my data frame. The last two columns are calculations that I am able to create in my query.

    Day  Month Result   Vol FirstCol Second Col
0    26      5   Good   123       1%         0%
1    26      5    Bad   716       0%         2%
2    26      5  Other    36       0%         0%
3    26      6   Good  4721      26%        11%
4    26      6    Bad  7148       0%        16%
5    26      6  Other  1387       0%         3%
6    27      5   Good   196       1%         0%
7    27      5    Bad   627       0%         1%
8    27      5  Other    60       0%         0%
9    27      6   Good  6188      34%        14%
10   27      6    Bad  6688       0%        15%
11   27      6  Other  1068       0%         2%
12   28      5   Good   339       2%         1%
13   28      5    Bad  1114       0%         3%
14   28      5  Other    72       0%         0%
15   28      6   Good  6524      36%        15%
16   28      6    Bad  6103       0%        14%
17   28      6  Other   820       0%         2%

The first column calculates each row's share of the total volume of all Good results:

df['FirstCol'] = np.where(df['Result'].isin(['Good']), df['Vol'] / df[df['Result'] == 'Good']['Vol'].sum(), 0)

The second column calculates each row's share of the total volume of all results:

df['SecondCol'] = df['Vol'] / df['Vol'].sum()
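
To make the two calculations above reproducible, here is a minimal sketch that rebuilds the sample frame from the table in this question and applies both formulas. The helper name `good` is introduced only for illustration, and the values are kept as fractions rather than formatted percentage strings.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table above (18 rows, original order).
df = pd.DataFrame({
    "Day":    [26] * 6 + [27] * 6 + [28] * 6,
    "Month":  [5, 5, 5, 6, 6, 6] * 3,
    "Result": ["Good", "Bad", "Other"] * 6,
    "Vol":    [123, 716, 36, 4721, 7148, 1387,
               196, 627, 60, 6188, 6688, 1068,
               339, 1114, 72, 6524, 6103, 820],
})

# Boolean mask for the "Good" rows (hypothetical helper name).
good = df["Result"] == "Good"

# Share of each "Good" row in the total "Good" volume; 0 for all other rows.
df["FirstCol"] = np.where(good, df["Vol"] / df.loc[good, "Vol"].sum(), 0)

# Share of each row in the overall volume.
df["SecondCol"] = df["Vol"] / df["Vol"].sum()

print(df)
```

Both columns sum to 1 over the whole frame, which is a quick way to check the result.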

For the other columns the code has to be more dynamic, which is what I am struggling with.
The third column should get the % based on each month, so the percentages for rows 0-8 should sum to 100%, and likewise for rows 9-17.
The fourth column should get the % based on each day and month, so the percentages for rows 0-2 should sum to 100%, likewise for rows 3-5, and so on. I want a dynamic query because I don't want to change the code each month.

Desired Output
    Day  Month Result   Vol FirstCol Second Col 0-17 Third Col 1-9 (Month) Fourth Col 1-3 (Day)
0    26      5   Good   123       1%              0%                    1%                  14%
1    26      5    Bad   716       0%              2%                    5%                  82%
2    26      5  Other    36       0%              0%                    0%                   4%
3    26      6   Good  4721      26%             11%                   31%                  36%
4    26      6    Bad  7148       0%             16%                   48%                  54%
5    26      6  Other  1387       0%              3%                    9%                  10%
6    27      5   Good   196       1%              0%                    1%                  22%
7    27      5    Bad   627       0%              1%                    4%                  71%
8    27      5  Other    60       0%              0%                    0%                   7%
9    27      6   Good  6188      34%             14%                   21%                  44%
10   27      6    Bad  6688       0%             15%                   23%                  48%
11   27      6  Other  1068       0%              2%                    4%                   8%
12   28      5   Good   339       2%              1%                    1%                  22%
13   28      5    Bad  1114       0%              3%                    4%                  73%
14   28      5  Other    72       0%              0%                    0%                   5%
15   28      6   Good  6524      36%             15%                   23%                  49%
16   28      6    Bad  6103       0%             14%                   21%                  45%
17   28      6  Other   820       0%              2%                    3%                   6%

Answer 1

Score: 1

If I understand you correctly, you want to group by multiple columns:

# sort the dataframe to have nicer output:
df = df.sort_values(by=['Month', 'Day'])
df['Third Col'] = df.groupby('Month')['Vol'].transform(lambda x: (x / x.sum()) * 100)
df['Fourth Col'] = df.groupby(['Day', 'Month'])['Vol'].transform(lambda x: (x / x.sum()) * 100)
print(df)

Prints:

    Day  Month Result   Vol FirstCol Second Col  Third Col  Fourth Col
0    26      5   Good   123       1%         0%   3.746573   14.057143
1    26      5    Bad   716       0%         2%  21.809321   81.828571
2    26      5  Other    36       0%         0%   1.096558    4.114286
6    27      5   Good   196       1%         0%   5.970149   22.197055
7    27      5    Bad   627       0%         1%  19.098386   71.007928
8    27      5  Other    60       0%         0%   1.827597    6.795017
12   28      5   Good   339       2%         1%  10.325921   22.229508
13   28      5    Bad  1114       0%         3%  33.932379   73.049180
14   28      5  Other    72       0%         0%   2.193116    4.721311
3    26      6   Good  4721      26%        11%  11.614633   35.614062
4    26      6    Bad  7148       0%        16%  17.585554   53.922752
5    26      6  Other  1387       0%         3%   3.412306   10.463186
9    27      6   Good  6188      34%        14%  15.223756   44.377510
10   27      6    Bad  6688       0%        15%  16.453859   47.963282
11   27      6  Other  1068       0%         2%   2.627500    7.659208
15   28      6   Good  6524      36%        15%  16.050385   48.516398
16   28      6    Bad  6103       0%        14%  15.014638   45.385588
17   28      6  Other   820       0%         2%   2.017369    6.098014
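
As a possible variation on the approach above, here is a self-contained sketch that divides by a group-wise sum from `transform("sum")` instead of using a lambda. It also includes two additions that are assumptions, not part of the answer. First, the third column in the question's desired output (1%, 5%, 0%, 31%, ...) matches shares computed over blocks of nine consecutive rows (rows 0-8 and 9-17) rather than over the `Month` column, so a positional grouping is shown in case that is what was meant. Second, the values are rounded into percentage strings for display. The names `block`, `shown` and `Third Col (blocks)` are hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (18 rows, original order).
df = pd.DataFrame({
    "Day":    [26] * 6 + [27] * 6 + [28] * 6,
    "Month":  [5, 5, 5, 6, 6, 6] * 3,
    "Result": ["Good", "Bad", "Other"] * 6,
    "Vol":    [123, 716, 36, 4721, 7148, 1387,
               196, 627, 60, 6188, 6688, 1068,
               339, 1114, 72, 6524, 6103, 820],
})

# transform("sum") broadcasts each group's total back onto its rows,
# so the division is a plain vectorized operation.
df["Third Col"] = df["Vol"] / df.groupby("Month")["Vol"].transform("sum") * 100
df["Fourth Col"] = (
    df["Vol"] / df.groupby(["Day", "Month"])["Vol"].transform("sum") * 100
)

# Assumption: if the intent is blocks of nine consecutive rows (as the row
# ranges in the question suggest), group by row position instead of Month.
block = np.arange(len(df)) // 9
df["Third Col (blocks)"] = (
    df["Vol"] / df.groupby(block)["Vol"].transform("sum") * 100
)

# Optional: whole-number percentage strings for display only.
cols = ["Third Col", "Fourth Col", "Third Col (blocks)"]
shown = df[cols].apply(lambda s: s.round().astype(int).astype(str) + "%")

print(df.join(shown, rsuffix=" (%)"))
```

Because the grouping keys are derived from the data, nothing needs to be changed when a new day or month is added. Keeping the numeric columns and formatting only a copy means the percentages stay usable for further calculations.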
