May 8, 2023 02:32:27

Efficient way to divide a pandas col by a groupby df

Question

It's easier to explain with an example. Say I have a dataframe with the columns year, cc_rating and number_x:

import pandas as pd

df = pd.DataFrame({"year":{"0":2005,"1":2005,"2":2005,"3":2006,"4":2006,"5":2006,"6":2007,"7":2007,"8":2007},"cc_rating":{"0":"2","1":"2a","2":"2b","3":"2","4":"2a","5":"2b","6":"2","7":"2a","8":"2b"},"number_x":{"0":9368,"1":21643,"2":107577,"3":10069,"4":21486,"5":110326,"6":10834,"7":21566,"8":111082}})
df
   year cc_rating  number_x
0  2005         2      9368
1  2005        2a     21643
2  2005        2b    107577
3  2006         2     10069
4  2006        2a     21486
5  2006        2b    110326
6  2007         2     10834
7  2007        2a     21566
8  2007        2b    111082

Problem

How can I get the % of number_x per year, i.e. each row's number_x as a percentage of the total number_x for that year?

Straight division won't work, because year can't be set as the index of the original df: it is not unique.

Right now I'm doing the following, but it's quite inefficient and I'm sure there's a better way.

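# Sum only number_x and name the totals number_y, so the merge doesn't clash with the existing number_x column
totals = df.groupby('year')['number_x'].sum().rename('number_y')
df = pd.merge(df, totals, left_on='year', right_index=True)
df['%'] = round((df['number_x'] / df['number_y']) * 100, 2)
df = df.drop('number_y', axis=1)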

Thanks!
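For comparison with the merge above, here is a minimal sketch of a lighter variant (an editorial illustration, not the asker's or the answerer's code): compute one total per year, then use `Series.map` to look each row's year up in those totals. Because the totals Series is indexed by the unique years, `year` never has to be unique in `df` itself. It assumes the example data from the question, rebuilt here with a default integer index; the name `totals` is illustrative.

```python
import pandas as pd

# Example data from the question, rebuilt with a default integer index.
df = pd.DataFrame({
    "year": [2005, 2005, 2005, 2006, 2006, 2006, 2007, 2007, 2007],
    "cc_rating": ["2", "2a", "2b"] * 3,
    "number_x": [9368, 21643, 107577, 10069, 21486, 110326, 10834, 21566, 111082],
})

# One total per year; this Series is indexed by the (unique) year values.
totals = df.groupby("year")["number_x"].sum()

# map() replaces each row's year with that year's total, producing a Series
# aligned with df's own index, so plain element-wise division works.
df["%"] = (df["number_x"] / df["year"].map(totals) * 100).round(2)

print(df)
```

This avoids creating and then dropping a temporary `number_y` column; no join is needed because the lookup happens through the year values.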

Answer 1

Score: 0

A possible solution:

(df.assign(
    perc = (100*df.number_x.div(df.groupby('year').number_x.transform('sum')))
    .round(2)))

Output:

   year cc_rating  number_x   perc
0  2005         2      9368   6.76
1  2005        2a     21643  15.62
2  2005        2b    107577  77.62
3  2006         2     10069   7.10
4  2006        2a     21486  15.14
5  2006        2b    110326  77.76
6  2007         2     10834   7.55
7  2007        2a     21566  15.03
8  2007        2b    111082  77.42
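The key to this answer is `transform('sum')`: where `groupby('year').sum()` returns one row per year, `transform` returns a Series with the same index as `df`, with every row holding its year's total, so `div` can work element-wise. A minimal sketch that checks this on the example data (rebuilt with a default integer index; `year_totals` and `result` are illustrative names):

```python
import pandas as pd

# Example data from the question, rebuilt with a default integer index.
df = pd.DataFrame({
    "year": [2005, 2005, 2005, 2006, 2006, 2006, 2007, 2007, 2007],
    "cc_rating": ["2", "2a", "2b"] * 3,
    "number_x": [9368, 21643, 107577, 10069, 21486, 110326, 10834, 21566, 111082],
})

# transform('sum') broadcasts each year's total back onto every row of that
# year, so the result has exactly the same index as df.
year_totals = df.groupby("year")["number_x"].transform("sum")
assert year_totals.index.equals(df.index)
print(year_totals.tolist())

# Same computation as the answer above.
result = df.assign(perc=(100 * df["number_x"].div(year_totals)).round(2))

# Sanity check: each year's rounded percentages should add up to about 100.
print(result.groupby("year")["perc"].sum())
```

Because the values are rounded to two decimals before summing, a year's total can in general differ from 100 by a few hundredths.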
