Question

I have a dataframe like the one below:

category    sub category    price    revenue
A           A1              100      1000
A           A2              110      990
A           A3              120      890
B           B1              90       1200
B           B2              100      1100
B           B3              95       1050

I need to create a new column holding the weighted average price (weighted by revenue within a category) of all the other subcategories in the same category, i.e. excluding the row's own subcategory. For example, for the first row I need the weighted average price of A2 and A3 only, since A1 is the value in that row's sub category column. Can someone please help?

Answer 1

Score: 1

You can compute the weighted mean manually, subtracting each row's own values from the group totals:

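# df is the dataframe from the question
tmp = (df.set_index(['category', 'sub category'])
         .eval('prod=price*revenue')   # per-row price * revenue
       )
g = tmp.groupby(level=0)               # group by category
out = (g['prod'].transform('sum')      # category total of price * revenue
       .sub(tmp['prod'])               # minus the row's own contribution
       .div(g['revenue'].transform('sum').sub(tmp['revenue']))  # revenue of the other rows
       )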

Output:

category  sub category
A         A1              114.734043
          A2              109.417989
          A3              104.974874
B         B1               97.558140
          B2               92.333333
          B3               94.782609
dtype: float64
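
The result above is a Series indexed by category and sub category, while the question asks for a new column. A minimal sketch of the same leave-one-out idea, written directly against the original dataframe, might look like the following. The column name `wavg_price_others` is a hypothetical name chosen for illustration, and the dataframe is rebuilt from the sample data in the question. For the first row the value works out to (110*990 + 120*890) / (990 + 890), which matches the 114.734043 shown above.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "category": ["A", "A", "A", "B", "B", "B"],
    "sub category": ["A1", "A2", "A3", "B1", "B2", "B3"],
    "price": [100, 110, 120, 90, 100, 95],
    "revenue": [1000, 990, 890, 1200, 1100, 1050],
})

# Per-row price * revenue
prod = df["price"] * df["revenue"]

# Per-category totals, broadcast back onto every row
prod_total = prod.groupby(df["category"]).transform("sum")
rev_total = df.groupby("category")["revenue"].transform("sum")

# Subtract each row's own contribution, then divide.
# "wavg_price_others" is a hypothetical column name.
df["wavg_price_others"] = (prod_total - prod) / (rev_total - df["revenue"])

print(df)
```

One caveat with this approach: a category that contains a single subcategory has no other rows to average, so the division becomes 0/0 and that row ends up as NaN.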

