June 15, 2023 20:40:06

pandas add a ranking column based on another column

Question

I have the DataFrame:

df = pd.DataFrame({'feature':['a','b','c','d','e'],
                   'importance':[0.1, 0.5, 0.4, 0.2, 0.8]})
df
  feature  importance
0       a         0.1
1       b         0.5
2       c         0.4
3       d         0.2
4       e         0.8

I want to add a ranking column that assigns a rank to each feature by evaluating:

feature_rank = feature's importance / sum of all features' importance

So that for each feature:

a -> 0.1 / (0.1 + 0.5 + 0.4 + 0.2 + 0.8) = 0.05
b -> 0.5 / (0.1 + 0.5 + 0.4 + 0.2 + 0.8) = 0.25
c -> 0.4 / (0.1 + 0.5 + 0.4 + 0.2 + 0.8) = 0.2
d -> 0.2 / (0.1 + 0.5 + 0.4 + 0.2 + 0.8) = 0.1
e -> 0.8 / (0.1 + 0.5 + 0.4 + 0.2 + 0.8) = 0.4

Expected results:

The final df will therefore be:

  feature  importance    ranking
0       a         0.1      5
1       b         0.5      2
2       c         0.4      3
3       d         0.2      4
4       e         0.8      1

Answer 1

Score: 2

You can use rank after normalizing with the Series' sum:

df['ranking'] = (df['importance'].div(df['importance'].sum())
                 .rank(method='dense', ascending=False)
                 .astype(int) # optional
                )

Note that dividing by a strictly positive number does not change the rank, so if the sum is positive, you can simplify to:

df['ranking'] = df['importance'].rank(method='dense', ascending=False)

Output:

  feature  importance  ranking
0       a         0.1        5
1       b         0.5        2
2       c         0.4        3
3       d         0.2        4
4       e         0.8        1
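
As a quick check of the claim that normalizing does not change the ranking, the sketch below computes the normalized share next to both rankings and also shows how the method argument treats ties. The share column name and the tied example values are illustrative assumptions, not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({'feature': ['a', 'b', 'c', 'd', 'e'],
                   'importance': [0.1, 0.5, 0.4, 0.2, 0.8]})

# Normalized share of each feature ('share' is a hypothetical column name).
df['share'] = df['importance'] / df['importance'].sum()

# Rank the normalized values and the raw values; both give the same order.
rank_normalized = df['share'].rank(method='dense', ascending=False).astype(int)
rank_raw = df['importance'].rank(method='dense', ascending=False).astype(int)
same_ranking = rank_normalized.tolist() == rank_raw.tolist()

# With ties, the method argument decides how ranks are shared (made-up data).
tied = pd.Series([0.5, 0.5, 0.2])
dense_ranks = tied.rank(method='dense', ascending=False).astype(int).tolist()
min_ranks = tied.rank(method='min', ascending=False).astype(int).tolist()
first_ranks = tied.rank(method='first', ascending=False).astype(int).tolist()
```

With method='dense' tied values share a rank and the next rank follows without a gap, method='min' leaves a gap after the tie, and method='first' breaks ties by order of appearance.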

Answer 2

Score: 1

This may not be the most efficient approach, but it is another way of achieving the same result.

import pandas as pd
df = pd.DataFrame({'feature':['a','b','c','d','e'],
                   'importance':[0.1, 0.5, 0.4, 0.2, 0.8]})
df = df.sort_values(by='importance', ascending=False)
df["ranking"] = range(1, len(df) + 1)
df = df.sort_index()
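
One thing worth knowing about the sort-based approach is that it always hands out distinct consecutive ranks, so tied values end up with different ranks. A minimal sketch, assuming made-up data with a tie, compares it with rank(method='first') and rank(method='dense'); the kind='stable' argument is added here so that tied rows keep their original order.

```python
import pandas as pd

# Hypothetical data with a tie between 'b' and 'c'.
df = pd.DataFrame({'feature': ['a', 'b', 'c', 'd'],
                   'importance': [0.1, 0.5, 0.5, 0.2]})

# Sort-based ranking: a stable sort keeps tied rows in their original order.
ranked = df.sort_values(by='importance', ascending=False, kind='stable')
ranked['ranking'] = range(1, len(ranked) + 1)
ranked = ranked.sort_index()

# Same result using rank; ties are broken by order of appearance.
first = df['importance'].rank(method='first', ascending=False).astype(int)

# Dense ranking gives tied values the same rank instead.
dense = df['importance'].rank(method='dense', ascending=False).astype(int)
```

If tied features should share a rank, the rank-based form with method='dense' or method='min' is probably the better fit.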

Answer 3

Score: 1

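Another possible solution is to number the rows in sorted order and let assign align those ranks back to the original rows by index:

order = df.sort_values('importance', ascending=False).index
df.assign(ranking=pd.Series(range(1, len(df) + 1), index=order))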

Output:

  feature  importance  ranking
0       a         0.1        5
1       b         0.5        2
2       c         0.4        3
3       d         0.2        4
4       e         0.8        1
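
A caveat on index-based approaches: the index of the sorted frame tells which row sits at each rank position, not the rank of each row. The two happen to coincide for the example data but can differ in general. A minimal sketch with hypothetical data, using NumPy's argsort twice to invert the sort permutation and comparing against pandas' rank:

```python
import numpy as np
import pandas as pd

# Hypothetical data chosen so that sort order and rank differ.
df = pd.DataFrame({'feature': ['a', 'b', 'c'],
                   'importance': [0.2, 0.1, 0.3]})

# Row positions listed from most to least important.
order = np.argsort(-df['importance'].to_numpy(), kind='stable')

# Applying argsort again inverts the permutation: the rank of each row.
ranking = np.argsort(order) + 1

# Reference result from pandas' own rank method.
expected = df['importance'].rank(method='first', ascending=False).astype(int)

# The sorted index plus one, which is not the rank for this data.
sorted_index_plus_one = (df.sort_values('importance', ascending=False).index + 1).tolist()
```

Here ranking and expected agree with each other, while sorted_index_plus_one gives a different assignment, which is why the ranks need to be aligned back to the rows by index.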
