Python DataFrame:创建一个列,根据日期和月份进行动态计算/百分比计算。

huangapple go评论122阅读模式
英文:

python dataframe: create a column, with dynamic calculation/percentage based on day and month

问题

这是我的数据框。最后两列是我可以在查询中创建的计算。

第一列用于计算所有“Good Result”的百分比

  1. df['FirstCol'] = np.where(df['Result'].isin(['Good']), df['Vol'] / df[df['Result'] == 'Good']['Vol'].sum(), 0)

第二列用于计算所有“Result”的百分比

  1. df['SecondCol'] = df['Vol'] / df['Vol'].sum()

对于其他列,代码需要更加动态,这是我正在努力解决的问题。

第三列应该基于每个月份获得百分比。因此,行0-8的百分比应为100%,行9-17也是如此。

第四列应该基于每天和月份获得百分比。因此,行0-2的百分比应为100%,行3-5以此类推。我想要一个动态查询,因为我不想每个月都要更改代码。

期望的输出如下:

  1. Day Month Result Vol FirstCol SecondCol ThirdCol1-9 MonthlyCol1-3
  2. 0 26 5 Good 123 1% 0% 1% 14%
  3. 1 26 5 Bad 716 0% 2% 5% 82%
  4. 2 26 5 Other 36 0% 0% 0% 4%
  5. 3 26 6 Good 4721 26% 11% 31% 36%
  6. 4 26 6 Bad 7148 0% 16% 48% 54%
  7. 5 26 6 Other 1387 0% 3% 9% 10%
  8. 6 27 5 Good 196 1% 0% 1% 22%
  9. 7 27 5 Bad 627 0% 1% 4% 71%
  10. 8 27 5 Other 60 0% 0% 0% 7%
  11. 9 27 6 Good 6188 34% 14% 21% 44%
  12. 10 27 6 Bad 6688 0% 15% 23% 48%
  13. 11 27 6 Other 1068 0% 2% 4% 8%
  14. 12 28 5 Good 339 2% 1% 1% 22%
  15. 13 28 5 Bad 1114 0% 3% 4% 73%
  16. 14 28 5 Other 72 0% 0% 0% 5%
  17. 15 28 6 Good 6524 36% 15% 23% 49%
  18. 16 28 6 Bad 6103 0% 14% 21% 45%
  19. 17 28 6 Other 820 0% 2% 3% 6%
英文:

I am struggling with the following.

This is my data frame. The last two columns, are calculations which I am able to create in my query.

  1. Day Month Result Vol FirstCol Second Col
  2. 0 26 5 Good 123 1% 0%
  3. 1 26 5 Bad 716 0% 2%
  4. 2 26 5 Other 36 0% 0%
  5. 3 26 6 Good 4721 26% 11%
  6. 4 26 6 Bad 7148 0% 16%
  7. 5 26 6 Other 1387 0% 3%
  8. 6 27 5 Good 196 1% 0%
  9. 7 27 5 Bad 627 0% 1%
  10. 8 27 5 Other 60 0% 0%
  11. 9 27 6 Good 6188 34% 14%
  12. 10 27 6 Bad 6688 0% 15%
  13. 11 27 6 Other 1068 0% 2%
  14. 12 28 5 Good 339 2% 1%
  15. 13 28 5 Bad 1114 0% 3%
  16. 14 28 5 Other 72 0% 0%
  17. 15 28 6 Good 6524 36% 15%
  18. 16 28 6 Bad 6103 0% 14%
  19. 17 28 6 Other 820 0% 2%

First column to calculate the percentage for all Good Result

  1. df['FirstCol'] = np.where(df['Result'].isin(['Good']),df['Vol']/df[df['Result']=='Good']['Vol'].sum(),0)

Second column to calculate the percentage for all Result

  1. df['SecondCol'] = df['Vol']/df['Vol'].sum()

For the other the code has to be more dynamic, which I am struggling with.
The Third Column should get the % based on each Month. So the percentage for row 0-8 should be 100% and the same for row 9-17.
The Forth Column should get the % based on each Day and Month. So the percentage for row 0-2 should be 100% and the same for row 3-5 and so on. I want to have a dynamic query. Because I don't want to change each month.

  1. Desired Output
  2. Day Month Result Vol FirstCol Second Col 0-17 Third Col 1-9 (Month) Forth Col 1-3 (Day)
  3. 0 26 5 Good 123 1% 0% 1% 14%
  4. 1 26 5 Bad 716 0% 2% 5% 82%
  5. 2 26 5 Other 36 0% 0% 0% 4%
  6. 3 26 6 Good 4721 26% 11% 31% 36%
  7. 4 26 6 Bad 7148 0% 16% 48% 54%
  8. 5 26 6 Other 1387 0% 3% 9% 10%
  9. 6 27 5 Good 196 1% 0% 1% 22%
  10. 7 27 5 Bad 627 0% 1% 4% 71%
  11. 8 27 5 Other 60 0% 0% 0% 7%
  12. 9 27 6 Good 6188 34% 14% 21% 44%
  13. 10 27 6 Bad 6688 0% 15% 23% 48%
  14. 11 27 6 Other 1068 0% 2% 4% 8%
  15. 12 28 5 Good 339 2% 1% 1% 22%
  16. 13 28 5 Bad 1114 0% 3% 4% 73%
  17. 14 28 5 Other 72 0% 0% 0% 5%
  18. 15 28 6 Good 6524 36% 15% 23% 49%
  19. 16 28 6 Bad 6103 0% 14% 21% 45%
  20. 17 28 6 Other 820 0% 2% 3% 6%

答案1

得分: 1

  1. # 对数据框进行排序以获得更好的输出:
  2. df = df.sort_values(by=['月份', '日期'])
  3. df['第三列'] = df.groupby('月份')['容量'].transform(lambda x: (x / x.sum()) * 100)
  4. df['第四列'] = df.groupby(['日期', '月份'])['容量'].transform(lambda x: (x / x.sum()) * 100)
  5. print(df)

输出:

  1. 日期 月份 结果 容量 第一列 第二列 第三列 第四列
  2. 0 26 5 123 1% 0% 3.746573 14.057143
  3. 1 26 5 716 0% 2% 21.809321 81.828571
  4. 2 26 5 其他 36 0% 0% 1.096558 4.114286
  5. 6 27 5 196 1% 0% 5.970149 22.197055
  6. 7 27 5 627 0% 1% 19.098386 71.007928
  7. 8 27 5 其他 60 0% 0% 1.827597 6.795017
  8. 12 28 5 339 2% 1% 10.325921 22.229508
  9. 13 28 5 1114 0% 3% 33.932379 73.049180
  10. 14 28 5 其他 72 0% 0% 2.193116 4.721311
  11. 3 26 6 4721 26% 11% 11.614633 35.614062
  12. 4 26 6 7148 0% 16% 17.585554 53.922752
  13. 5 26 6 其他 1387 0% 3% 3.412306 10.463186
  14. 9 27 6 6188 34% 14% 15.223756 44.377510
  15. 10 27 6 6688 0% 15% 16.453859 47.963282
  16. 11 27 6 其他 1068 0% 2% 2.627500 7.659208
  17. 15 28 6 6524 36% 15% 16.050385 48.516398
  18. 16 28 6 6103 0% 14% 15.014638 45.385588
  19. 17 28 6 其他 820 0% 2% 2.017369 6.098014
英文:

If I understand you correctly, you want to groupby by multiple columns:

  1. # sort the dataframe to have nicer output:
  2. df = df.sort_values(by=['Month', 'Day'])
  3. df['Third Col'] = df.groupby('Month')['Vol'].transform(lambda x: (x / x.sum()) *100)
  4. df['Fourth Col'] = df.groupby(['Day', 'Month'])['Vol'].transform(lambda x: (x / x.sum())*100)
  5. print(df)

Prints:

  1. Day Month Result Vol FirstCol Second Col Third Col Fourth Col
  2. 0 26 5 Good 123 1% 0% 3.746573 14.057143
  3. 1 26 5 Bad 716 0% 2% 21.809321 81.828571
  4. 2 26 5 Other 36 0% 0% 1.096558 4.114286
  5. 6 27 5 Good 196 1% 0% 5.970149 22.197055
  6. 7 27 5 Bad 627 0% 1% 19.098386 71.007928
  7. 8 27 5 Other 60 0% 0% 1.827597 6.795017
  8. 12 28 5 Good 339 2% 1% 10.325921 22.229508
  9. 13 28 5 Bad 1114 0% 3% 33.932379 73.049180
  10. 14 28 5 Other 72 0% 0% 2.193116 4.721311
  11. 3 26 6 Good 4721 26% 11% 11.614633 35.614062
  12. 4 26 6 Bad 7148 0% 16% 17.585554 53.922752
  13. 5 26 6 Other 1387 0% 3% 3.412306 10.463186
  14. 9 27 6 Good 6188 34% 14% 15.223756 44.377510
  15. 10 27 6 Bad 6688 0% 15% 16.453859 47.963282
  16. 11 27 6 Other 1068 0% 2% 2.627500 7.659208
  17. 15 28 6 Good 6524 36% 15% 16.050385 48.516398
  18. 16 28 6 Bad 6103 0% 14% 15.014638 45.385588
  19. 17 28 6 Other 820 0% 2% 2.017369 6.098014

huangapple
  • 本文由 发表于 2023年7月17日 23:42:51
  • 转载请务必保留本文链接:https://go.coder-hub.com/76706112.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定