pandas add a ranking column based on another column

Question

I have the DataFrame:

df = pd.DataFrame({'feature':['a','b','c','d','e'],
                   'importance':[0.1, 0.5, 0.4, 0.2, 0.8]})

df
  feature  importance
0       a         0.1
1       b         0.5
2       c         0.4
3       d         0.2
4       e         0.8

I want to add a column named ranking that ranks each feature according to its share of the total importance:

feature_rank = feature's importance / sum of all features' importance

So for each feature:

a -> 0.1 /(0.1 + 0.5 + 0.4 + 0.2 + 0.8) = 0.05
b -> 0.5 /(0.1 + 0.5 + 0.4 + 0.2 + 0.8) = 0.25
c -> 0.4 /(0.1 + 0.5 + 0.4 + 0.2 + 0.8) = 0.2
d -> 0.2 /(0.1 + 0.5 + 0.4 + 0.2 + 0.8) = 0.1
e -> 0.8 /(0.1 + 0.5 + 0.4 + 0.2 + 0.8) = 0.4

Expected results:

Ranking the features by this value in descending order, the final df will therefore be:

  feature  importance    ranking
0       a         0.1      5
1       b         0.5      2
2       c         0.4      3
3       d         0.2      4
4       e         0.8      1

Answer 1

Score: 2

You can use rank after normalizing with the Series' sum:

df['ranking'] = (df['importance'].div(df['importance'].sum())
                 .rank(method='dense', ascending=False)
                 .astype(int) # optional
                )
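To make the intermediate step visible, the normalized share can be kept in its own column before ranking. This is only a minimal sketch: the column name `share` is an assumption introduced here for illustration and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'feature': ['a', 'b', 'c', 'd', 'e'],
                   'importance': [0.1, 0.5, 0.4, 0.2, 0.8]})

# 'share' is a hypothetical helper column: importance / sum of importances
df['share'] = df['importance'] / df['importance'].sum()

# Rank the shares so that the largest share gets rank 1
df['ranking'] = df['share'].rank(method='dense', ascending=False).astype(int)

print(df)
```

The `share` column should reproduce the values worked out in the question (0.05, 0.25, 0.2, 0.1, 0.4), up to floating-point rounding.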

Note that dividing every value by the same strictly positive number doesn't change their order, so if the sum is positive the ranks stay the same and you can simplify to:

df['ranking'] = df['importance'].rank(method='dense', ascending=False).astype(int)

Output:

  feature  importance  ranking
0       a         0.1        5
1       b         0.5        2
2       c         0.4        3
3       d         0.2        4
4       e         0.8        1
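The choice of `method` only matters when values tie. The sketch below uses hypothetical data (not from the question) with two equal importances to compare `'dense'` and `'min'`, and also checks that dividing by a positive sum leaves the ranks unchanged.

```python
import pandas as pd

# Hypothetical importances with a tie between the first two entries
s = pd.Series([0.5, 0.5, 0.2, 0.8])

# 'dense' leaves no gap after a tie; 'min' skips the ranks the tie used up
dense = s.rank(method='dense', ascending=False).astype(int)
minimum = s.rank(method='min', ascending=False).astype(int)

print(dense.tolist())    # [2, 2, 3, 1]
print(minimum.tolist())  # [2, 2, 4, 1]

# Dividing by a positive number preserves the order, hence the ranks
normalized = s.div(s.sum()).rank(method='dense', ascending=False).astype(int)
print(normalized.equals(dense))
```

With `'dense'` the ranks run 1, 2, 3 without holes, which is probably what a feature ranking wants; `'min'` behaves like competition ranking.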

Answer 2

Score: 1

This may not be the most efficient approach, but it is another way of achieving the same result.

import pandas as pd
df = pd.DataFrame({'feature':['a','b','c','d','e'],
                   'importance':[0.1, 0.5, 0.4, 0.2, 0.8]})
# Sort by importance, largest first, and number the rows 1..n
df = df.sort_values(by='importance', ascending=False)
df["ranking"] = range(1, len(df) + 1)
# Restore the original row order
df = df.sort_index()
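One difference from `rank(method='dense')` is worth keeping in mind: numbering sorted rows always hands out distinct positions, so tied values do not share a rank. A minimal sketch on hypothetical data with a tie (the names `by_sort` and `by_rank` are introduced here for illustration):

```python
import pandas as pd

# Hypothetical data: 'a' and 'b' have the same importance
df = pd.DataFrame({'feature': ['a', 'b', 'c'],
                   'importance': [0.5, 0.5, 0.1]})

# Sort-and-number approach: every row gets a distinct position
by_sort = df.sort_values(by='importance', ascending=False)
by_sort['ranking'] = range(1, len(by_sort) + 1)
by_sort = by_sort.sort_index()

# rank with method='dense': tied rows share a rank
by_rank = df['importance'].rank(method='dense', ascending=False).astype(int)

print(sorted(by_sort['ranking'].tolist()))  # [1, 2, 3]
print(by_rank.tolist())                     # [1, 1, 2]
```

Which of the tied rows ends up first in the sort-based version depends on the sort algorithm, so this approach is safest when the values are unique.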

Answer 3

Score: 1

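Another possible solution, assuming the default RangeIndex. The index of the sorted frame lists the rows in rank order, so argsort() is applied to turn it into the rank of each row:

df.assign(ranking = df.sort_values('importance', ascending=False).index.argsort() + 1)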

Output:

  feature  importance  ranking
0       a         0.1        5
1       b         0.5        2
2       c         0.4        3
3       d         0.2        4
4       e         0.8        1
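The index of the sorted frame says which row holds rank 1, rank 2, and so on; the rank of each row is the inverse of that permutation. On the question's data the permutation happens to be its own inverse, so the two coincide. A minimal sketch on hypothetical data (default RangeIndex assumed) where they differ:

```python
import pandas as pd

# Hypothetical data where the sort order is not its own inverse
df = pd.DataFrame({'feature': ['a', 'b', 'c'],
                   'importance': [0.2, 0.1, 0.3]})

# Row labels in descending order of importance: which row is 1st, 2nd, 3rd
order = df.sort_values('importance', ascending=False).index
print(order.tolist())  # [2, 0, 1]

# Inverting the permutation gives the rank of each row
ranking = order.argsort() + 1
print(ranking.tolist())  # [2, 3, 1]

# Cross-check against Series.rank
expected = df['importance'].rank(method='dense', ascending=False).astype(int)
print((ranking == expected.to_numpy()).all())
```

Here `order + 1` would give 3, 1, 2, which is not the rank of rows a, b, c; the `argsort()` step is what maps positions back to rows.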

huangapple
  • Posted on June 15, 2023, 20:40:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76482580.html