pandas add a ranking column based on another column
Question
I have the following DataFrame:
import pandas as pd
df = pd.DataFrame({'feature': ['a', 'b', 'c', 'd', 'e'],
                   'importance': [0.1, 0.5, 0.4, 0.2, 0.8]})
df
  feature  importance
0       a         0.1
1       b         0.5
2       c         0.4
3       d         0.2
4       e         0.8
I want to add a column named ranking that assigns a rank to each feature, based on its share of the total importance:
feature_rank = feature's importance / sum of all features' importances
For each feature, that gives:
a -> 0.1 / (0.1 + 0.5 + 0.4 + 0.2 + 0.8) = 0.05
b -> 0.5 / (0.1 + 0.5 + 0.4 + 0.2 + 0.8) = 0.25
c -> 0.4 / (0.1 + 0.5 + 0.4 + 0.2 + 0.8) = 0.2
d -> 0.2 / (0.1 + 0.5 + 0.4 + 0.2 + 0.8) = 0.1
e -> 0.8 / (0.1 + 0.5 + 0.4 + 0.2 + 0.8) = 0.4
Expected result:
The final df should therefore be:
  feature  importance  ranking
0       a         0.1        5
1       b         0.5        2
2       c         0.4        3
3       d         0.2        4
4       e         0.8        1
Answer 1
Score: 2
You can use rank after normalizing by the Series' sum:
df['ranking'] = (df['importance'].div(df['importance'].sum())
                 .rank(method='dense', ascending=False)
                 .astype(int)  # optional
                 )
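Note that dividing by a strictly positive number does not change the order of the values, so if the sum is positive you can simplify to the following (rank returns floats, so keep .astype(int) if you want integer ranks):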
df['ranking'] = df['importance'].rank(method='dense', ascending=False)
Output:
  feature  importance  ranking
0       a         0.1        5
1       b         0.5        2
2       c         0.4        3
3       d         0.2        4
4       e         0.8        1
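To make the claim about normalization concrete, here is a minimal sketch that ranks both the normalized shares and the raw values and compares them. The `tied` Series is hypothetical data, not from the question, added only to illustrate how `method='dense'` and `method='min'` differ when two values are equal.

```python
import pandas as pd

df = pd.DataFrame({'feature': ['a', 'b', 'c', 'd', 'e'],
                   'importance': [0.1, 0.5, 0.4, 0.2, 0.8]})

# Share of the total importance, as described in the question.
share = df['importance'] / df['importance'].sum()

# Ranking the shares and ranking the raw values should agree,
# because the sum is strictly positive.
rank_from_share = share.rank(method='dense', ascending=False).astype(int)
rank_from_raw = df['importance'].rank(method='dense', ascending=False).astype(int)

# Hypothetical data with a tie, to compare tie-handling methods.
tied = pd.Series([0.5, 0.5, 0.2])
dense = tied.rank(method='dense', ascending=False).astype(int).tolist()
minimum = tied.rank(method='min', ascending=False).astype(int).tolist()

print(rank_from_share.tolist(), rank_from_raw.tolist())
print(dense, minimum)
```

With 'dense', tied values share a rank and the next rank follows without a gap; with 'min', the rank after a tie skips ahead. Which one fits depends on how you want ties reported.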
Answer 2
Score: 1
This may not be the most efficient approach, but it is another way to get the same result.
import pandas as pd
df = pd.DataFrame({'feature': ['a', 'b', 'c', 'd', 'e'],
                   'importance': [0.1, 0.5, 0.4, 0.2, 0.8]})
# Sort by importance (largest first) and number the rows 1..n
df = df.sort_values(by='importance', ascending=False)
df["ranking"] = range(1, len(df) + 1)
# Restore the original row order
df = df.sort_index()
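As a quick check of the sort-then-number idea, the sketch below runs it on the question's data and compares the result with `Series.rank`. The non-default index labels (10, 20, ...) and the variable names `ranked` and `expected` are assumptions made for illustration; they show that `sort_index` restores the original order for any sortable, unique index.

```python
import pandas as pd

df = pd.DataFrame({'feature': ['a', 'b', 'c', 'd', 'e'],
                   'importance': [0.1, 0.5, 0.4, 0.2, 0.8]},
                  index=[10, 20, 30, 40, 50])  # hypothetical index labels

# Sort descending, number the rows 1..n, then restore the original order.
ranked = df.sort_values(by='importance', ascending=False)
ranked['ranking'] = range(1, len(ranked) + 1)
ranked = ranked.sort_index()

# Reference result from rank(); the data has no ties, so the two agree.
expected = df['importance'].rank(method='dense', ascending=False).astype(int)

print(ranked)
print((ranked['ranking'] == expected).all())
```

Unlike `rank(method='dense')`, numbering sorted rows always yields distinct ranks, so tied values would receive different ranks with this approach.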
Answer 3
Score: 1
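Another possible solution, assuming the default 0..n-1 index. The index of the sorted frame lists which row holds each rank, so it is inverted with argsort to get the rank of each row (for this particular data the two happen to coincide):
df.assign(ranking = df.sort_values('importance', ascending=False).index.argsort() + 1)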
Output:
  feature  importance  ranking
0       a         0.1        5
1       b         0.5        2
2       c         0.4        3
3       d         0.2        4
4       e         0.8        1
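One caveat worth checking with index-based one-liners: the index of the sorted frame tells you which row lands at each rank, which is not in general the rank of each row. The sketch below uses hypothetical three-row data (not from the question) where the two differ, and inverts the permutation with `Index.argsort`. It assumes a default 0..n-1 index.

```python
import pandas as pd

# Hypothetical data chosen so that the sort order is not its own inverse.
df = pd.DataFrame({'feature': ['x', 'y', 'z'],
                   'importance': [0.2, 0.1, 0.3]})

sorted_index = df.sort_values('importance', ascending=False).index

# Row number (1-based) found at rank 1, 2, 3.
rows_by_rank = (sorted_index + 1).tolist()

# Rank (1-based) of row 0, 1, 2: the inverse permutation.
rank_of_row = (sorted_index.argsort() + 1).tolist()

# Reference result from Series.rank for comparison.
reference = df['importance'].rank(ascending=False).astype(int).tolist()

print(rows_by_rank, rank_of_row, reference)
```

In the question's data the descending order only swaps the first and last rows, so the permutation equals its own inverse and both computations give the same column.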