Efficient way to divide a pandas column by a groupby DataFrame


Question

This is easier to explain with an example. Say I have a DataFrame with the columns year, cc_rating and number_x:

    import pandas as pd

    df = pd.DataFrame({"year":{"0":2005,"1":2005,"2":2005,"3":2006,"4":2006,"5":2006,"6":2007,"7":2007,"8":2007},"cc_rating":{"0":"2","1":"2a","2":"2b","3":"2","4":"2a","5":"2b","6":"2","7":"2a","8":"2b"},"number_x":{"0":9368,"1":21643,"2":107577,"3":10069,"4":21486,"5":110326,"6":10834,"7":21566,"8":111082}})
    df

       year cc_rating  number_x
    0  2005         2      9368
    1  2005        2a     21643
    2  2005        2b    107577
    3  2006         2     10069
    4  2006        2a     21486
    5  2006        2b    110326
    6  2007         2     10834
    7  2007        2a     21566
    8  2007        2b    111082

Problem

How can I get the % of number_x per year? Meaning: each row's number_x divided by the total number_x for that row's year, times 100.

Straight division won't work, because year can't be set as the index of the original df since it is not unique.

Right now I'm doing the following, but it's quite inefficient and I'm sure there's a better way.

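    # Sum number_x per year and name the result number_y, so the merge
    # adds exactly one new column to the example df above.
    totals = df.groupby('year')['number_x'].sum().rename('number_y')
    df = pd.merge(df, totals, left_on='year', right_index=True)
    df['%'] = round((df['number_x'] / df['number_y']) * 100, 2)
    df = df.drop('number_y', axis=1)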

Thanks!

Answer 1

Score: 0

A possible solution:

    (df.assign(
        perc=(100 * df.number_x.div(df.groupby('year').number_x.transform('sum')))
        .round(2)))

Output:

       year cc_rating  number_x   perc
    0  2005         2      9368   6.76
    1  2005        2a     21643  15.62
    2  2005        2b    107577  77.62
    3  2006         2     10069   7.10
    4  2006        2a     21486  15.14
    5  2006        2b    110326  77.76
    6  2007         2     10834   7.55
    7  2007        2a     21566  15.03
    8  2007        2b    111082  77.42
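
For anyone wondering why this avoids the merge: `transform('sum')` returns one value per original row, indexed like `df`, rather than one value per group, so the division lines up row by row. Below is a minimal self-contained sketch of that idea. It rebuilds the question's data with a default integer index (an assumption made for brevity, the question uses string labels), and the names `year_total`, `totals_by_year`, `perc_via_map` and `result` are only illustrative. It also shows a `map`-based lookup that should give the same numbers.

```python
import pandas as pd

# Same values as the question's example, written column by column.
df = pd.DataFrame({
    "year": [2005, 2005, 2005, 2006, 2006, 2006, 2007, 2007, 2007],
    "cc_rating": ["2", "2a", "2b"] * 3,
    "number_x": [9368, 21643, 107577, 10069, 21486, 110326,
                 10834, 21566, 111082],
})

# transform('sum') broadcasts each year's total back onto every row of
# that year, so the result has the same index and length as df.
year_total = df.groupby("year")["number_x"].transform("sum")
perc = (100 * df["number_x"] / year_total).round(2)

# Alternative: compute one total per year, then look it up per row with map().
totals_by_year = df.groupby("year")["number_x"].sum()
perc_via_map = (100 * df["number_x"] / df["year"].map(totals_by_year)).round(2)

result = df.assign(perc=perc)
print(result)
```

Both variants leave `df` itself unchanged, since `assign` returns a new DataFrame. Whether either is measurably faster than the merge on large data would need to be timed on the real dataset.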

huangapple
  • Posted on 2023-05-08 02:32:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76195648.html