英文:
python dataframe: create a column, with dynamic calculation/percentage based on day and month
问题
这是我的数据框。最后两列是我可以在查询中创建的计算。
第一列用于计算所有“Good Result”的百分比
df['FirstCol'] = np.where(df['Result'].isin(['Good']), df['Vol'] / df[df['Result'] == 'Good']['Vol'].sum(), 0)
第二列用于计算所有“Result”的百分比
df['SecondCol'] = df['Vol'] / df['Vol'].sum()
对于其他列,代码需要更加动态,这是我正在努力解决的问题。
第三列应该基于每个月份获得百分比。因此,行0-8的百分比应为100%,行9-17也是如此。
第四列应该基于每天和月份获得百分比。因此,行0-2的百分比应为100%,行3-5以此类推。我想要一个动态查询,因为我不想每个月都要更改代码。
期望的输出如下:
Day Month Result Vol FirstCol SecondCol ThirdCol1-9 MonthlyCol1-3
0 26 5 Good 123 1% 0% 1% 14%
1 26 5 Bad 716 0% 2% 5% 82%
2 26 5 Other 36 0% 0% 0% 4%
3 26 6 Good 4721 26% 11% 31% 36%
4 26 6 Bad 7148 0% 16% 48% 54%
5 26 6 Other 1387 0% 3% 9% 10%
6 27 5 Good 196 1% 0% 1% 22%
7 27 5 Bad 627 0% 1% 4% 71%
8 27 5 Other 60 0% 0% 0% 7%
9 27 6 Good 6188 34% 14% 21% 44%
10 27 6 Bad 6688 0% 15% 23% 48%
11 27 6 Other 1068 0% 2% 4% 8%
12 28 5 Good 339 2% 1% 1% 22%
13 28 5 Bad 1114 0% 3% 4% 73%
14 28 5 Other 72 0% 0% 0% 5%
15 28 6 Good 6524 36% 15% 23% 49%
16 28 6 Bad 6103 0% 14% 21% 45%
17 28 6 Other 820 0% 2% 3% 6%
英文:
I am struggling with the following.
This is my data frame. The last two columns, are calculations which I am able to create in my query.
Day Month Result Vol FirstCol Second Col
0 26 5 Good 123 1% 0%
1 26 5 Bad 716 0% 2%
2 26 5 Other 36 0% 0%
3 26 6 Good 4721 26% 11%
4 26 6 Bad 7148 0% 16%
5 26 6 Other 1387 0% 3%
6 27 5 Good 196 1% 0%
7 27 5 Bad 627 0% 1%
8 27 5 Other 60 0% 0%
9 27 6 Good 6188 34% 14%
10 27 6 Bad 6688 0% 15%
11 27 6 Other 1068 0% 2%
12 28 5 Good 339 2% 1%
13 28 5 Bad 1114 0% 3%
14 28 5 Other 72 0% 0%
15 28 6 Good 6524 36% 15%
16 28 6 Bad 6103 0% 14%
17 28 6 Other 820 0% 2%
First column to calculate the percentage for all Good Result
df['FirstCol'] = np.where(df['Result'].isin(['Good']),df['Vol']/df[df['Result']=='Good']['Vol'].sum(),0)
Second column to calculate the percentage for all Result
df['SecondCol'] = df['Vol']/df['Vol'].sum()
For the other the code has to be more dynamic, which I am struggling with.
The Third Column should get the % based on each Month. So the percentage for row 0-8 should be 100% and the same for row 9-17.
The Forth Column should get the % based on each Day and Month. So the percentage for row 0-2 should be 100% and the same for row 3-5 and so on. I want to have a dynamic query. Because I don't want to change each month.
Desired Output
Day Month Result Vol FirstCol Second Col 0-17 Third Col 1-9 (Month) Forth Col 1-3 (Day)
0 26 5 Good 123 1% 0% 1% 14%
1 26 5 Bad 716 0% 2% 5% 82%
2 26 5 Other 36 0% 0% 0% 4%
3 26 6 Good 4721 26% 11% 31% 36%
4 26 6 Bad 7148 0% 16% 48% 54%
5 26 6 Other 1387 0% 3% 9% 10%
6 27 5 Good 196 1% 0% 1% 22%
7 27 5 Bad 627 0% 1% 4% 71%
8 27 5 Other 60 0% 0% 0% 7%
9 27 6 Good 6188 34% 14% 21% 44%
10 27 6 Bad 6688 0% 15% 23% 48%
11 27 6 Other 1068 0% 2% 4% 8%
12 28 5 Good 339 2% 1% 1% 22%
13 28 5 Bad 1114 0% 3% 4% 73%
14 28 5 Other 72 0% 0% 0% 5%
15 28 6 Good 6524 36% 15% 23% 49%
16 28 6 Bad 6103 0% 14% 21% 45%
17 28 6 Other 820 0% 2% 3% 6%
答案1
得分: 1
# 对数据框进行排序以获得更好的输出:
df = df.sort_values(by=['月份', '日期'])
df['第三列'] = df.groupby('月份')['容量'].transform(lambda x: (x / x.sum()) * 100)
df['第四列'] = df.groupby(['日期', '月份'])['容量'].transform(lambda x: (x / x.sum()) * 100)
print(df)
输出:
日期 月份 结果 容量 第一列 第二列 第三列 第四列
0 26 5 好 123 1% 0% 3.746573 14.057143
1 26 5 差 716 0% 2% 21.809321 81.828571
2 26 5 其他 36 0% 0% 1.096558 4.114286
6 27 5 好 196 1% 0% 5.970149 22.197055
7 27 5 差 627 0% 1% 19.098386 71.007928
8 27 5 其他 60 0% 0% 1.827597 6.795017
12 28 5 好 339 2% 1% 10.325921 22.229508
13 28 5 差 1114 0% 3% 33.932379 73.049180
14 28 5 其他 72 0% 0% 2.193116 4.721311
3 26 6 好 4721 26% 11% 11.614633 35.614062
4 26 6 差 7148 0% 16% 17.585554 53.922752
5 26 6 其他 1387 0% 3% 3.412306 10.463186
9 27 6 好 6188 34% 14% 15.223756 44.377510
10 27 6 差 6688 0% 15% 16.453859 47.963282
11 27 6 其他 1068 0% 2% 2.627500 7.659208
15 28 6 好 6524 36% 15% 16.050385 48.516398
16 28 6 差 6103 0% 14% 15.014638 45.385588
17 28 6 其他 820 0% 2% 2.017369 6.098014
英文:
If I understand you correctly, you want to groupby by multiple columns:
# sort the dataframe to have nicer output:
df = df.sort_values(by=['Month', 'Day'])
df['Third Col'] = df.groupby('Month')['Vol'].transform(lambda x: (x / x.sum()) *100)
df['Fourth Col'] = df.groupby(['Day', 'Month'])['Vol'].transform(lambda x: (x / x.sum())*100)
print(df)
Prints:
Day Month Result Vol FirstCol Second Col Third Col Fourth Col
0 26 5 Good 123 1% 0% 3.746573 14.057143
1 26 5 Bad 716 0% 2% 21.809321 81.828571
2 26 5 Other 36 0% 0% 1.096558 4.114286
6 27 5 Good 196 1% 0% 5.970149 22.197055
7 27 5 Bad 627 0% 1% 19.098386 71.007928
8 27 5 Other 60 0% 0% 1.827597 6.795017
12 28 5 Good 339 2% 1% 10.325921 22.229508
13 28 5 Bad 1114 0% 3% 33.932379 73.049180
14 28 5 Other 72 0% 0% 2.193116 4.721311
3 26 6 Good 4721 26% 11% 11.614633 35.614062
4 26 6 Bad 7148 0% 16% 17.585554 53.922752
5 26 6 Other 1387 0% 3% 3.412306 10.463186
9 27 6 Good 6188 34% 14% 15.223756 44.377510
10 27 6 Bad 6688 0% 15% 16.453859 47.963282
11 27 6 Other 1068 0% 2% 2.627500 7.659208
15 28 6 Good 6524 36% 15% 16.050385 48.516398
16 28 6 Bad 6103 0% 14% 15.014638 45.385588
17 28 6 Other 820 0% 2% 2.017369 6.098014
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论