Efficient way to divide a pandas column by a groupby DataFrame

Question


This is easier to explain with an example. Say I have a DataFrame with the columns year, cc_rating and number_x:

import pandas as pd

df = pd.DataFrame({
    "year": {"0": 2005, "1": 2005, "2": 2005, "3": 2006, "4": 2006, "5": 2006, "6": 2007, "7": 2007, "8": 2007},
    "cc_rating": {"0": "2", "1": "2a", "2": "2b", "3": "2", "4": "2a", "5": "2b", "6": "2", "7": "2a", "8": "2b"},
    "number_x": {"0": 9368, "1": 21643, "2": 107577, "3": 10069, "4": 21486, "5": 110326, "6": 10834, "7": 21566, "8": 111082},
})

df

   year cc_rating  number_x
0  2005         2      9368
1  2005        2a     21643
2  2005        2b    107577
3  2006         2     10069
4  2006        2a     21486
5  2006        2b    110326
6  2007         2     10834
7  2007        2a     21566
8  2007        2b    111082

Problem

How can I get the percentage of number_x per year? That is, each row's number_x divided by the total number_x for that row's year.

Straight division won't work, because year can't be set as the index of the original df: it is not unique.
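To illustrate why the straight division fails, here is a minimal sketch. The names `totals` and `naive` are illustrative and not from the original post, and the frame is rebuilt with a default integer index. pandas aligns the two Series on their index labels, and the row labels 0 to 8 never match the year labels 2005 to 2007, so every result is NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2005, 2005, 2005, 2006, 2006, 2006, 2007, 2007, 2007],
    "cc_rating": ["2", "2a", "2b"] * 3,
    "number_x": [9368, 21643, 107577, 10069, 21486, 110326, 10834, 21566, 111082],
})

# Per-year totals: a Series indexed by year (2005, 2006, 2007)
totals = df.groupby("year")["number_x"].sum()

# Division aligns on index labels: rows 0..8 on the left, years on the right.
# No label appears on both sides, so the result contains only NaN.
naive = df["number_x"] / totals
print(naive.isna().all())
```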

Right now I'm doing the following, but it's quite inefficient and I'm sure there's a better way.

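# Per-year totals of number_x, named number_y so the merged column does not clash
totals = df.groupby('year')['number_x'].sum().rename('number_y')
df = pd.merge(df, totals, left_on='year', right_index=True)
df['%'] = ((df['number_x'] / df['number_y']) * 100).round(2)
df = df.drop('number_y', axis=1)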

Thanks!

Answer 1

Score: 0


A possible solution:

(df.assign(
    perc = (100*df.number_x.div(df.groupby('year').number_x.transform('sum')))
    .round(2)))

Output:

   year cc_rating  number_x   perc
0  2005         2      9368   6.76
1  2005        2a     21643  15.62
2  2005        2b    107577  77.62
3  2006         2     10069   7.10
4  2006        2a     21486  15.14
5  2006        2b    110326  77.76
6  2007         2     10834   7.55
7  2007        2a     21566  15.03
8  2007        2b    111082  77.42
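As a rough cross-check of the approach above, the sketch below computes the same percentages in a second way, by looking up each row's year in the per-year totals with `Series.map`, and confirms that each year's shares add up to about 100. The names `by_transform`, `by_map` and `totals` are illustrative only, and the frame is rebuilt with a default integer index.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2005, 2005, 2005, 2006, 2006, 2006, 2007, 2007, 2007],
    "cc_rating": ["2", "2a", "2b"] * 3,
    "number_x": [9368, 21643, 107577, 10069, 21486, 110326, 10834, 21566, 111082],
})

# transform("sum") returns one total per original row, in the original order,
# so the division is row by row and needs no merge.
year_totals_per_row = df.groupby("year")["number_x"].transform("sum")
by_transform = (100 * df["number_x"] / year_totals_per_row).round(2)

# Alternative: build the per-year totals once, then map each row's year to its total.
totals = df.groupby("year")["number_x"].sum()
by_map = (100 * df["number_x"] / df["year"].map(totals)).round(2)

# Both routes should agree, and each year's shares should sum to roughly 100
# (rounding to two decimals can leave a small residue).
print((by_transform == by_map).all())
print(by_transform.groupby(df["year"]).sum())
```

Either form keeps the original row order and index. `transform` avoids the intermediate `totals` object, while `map` can be convenient when the per-year totals are needed elsewhere anyway.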

Posted on May 8, 2023, 02:32:27
Source: https://go.coder-hub.com/76195648-2.html
