Applying within-group rankings to a Pandas DataFrame
Question
The problem statement has people choosing 3 items and ranking them 1 to 3. My Pandas DataFrame contains one row for each item selected, where the first column is the person's name and the second column is the ranking the person has given to the selection.
Something like:
df = pd.DataFrame.from_dict({
'Name':['Alice','Alice','Alice','Bob','Bob','Bob','Charlie','Charlie','Charlie'],
'Item':[...],
'Rank':[1,2,3,3,1,2,None,None,None]
})
The issue is that some people did not assign rankings to their items, for example Charlie in the DataFrame above. For these people, I want to fill in their rankings with a valid 'random' ranking. In other words, just give each of their items a unique value from 1 to 3. Also, some people selected only 2 items and also forgot to rank them, so I need to be able to handle a variable number of selected items.
I was attempting to do a cumsum, first filling each null value with 1 and then running the cumsum within each group of a groupby on the names.
Something like (although I think this is far from correct):
df.groupby('Name')['Rank'].cumsum()
Also, I understand this may be easy to do by iterating over the rows. However, since this is Pandas, I am looking for a more efficient solution.
Answer 1
Score: 0
You can use groupby(...).cumcount(), which numbers the rows within each group starting from 0:
df['Rank'] = df['Rank'].fillna(df.groupby('Name').cumcount().add(1))
print(df)
# Output
Name Rank
0 Alice 1.0
1 Alice 2.0
2 Alice 3.0
3 Bob 3.0
4 Bob 1.0
5 Bob 2.0
6 Charlie 1.0
7 Charlie 2.0
8 Charlie 3.0
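As a self-contained check of the approach above, here is a minimal sketch that also covers someone who picked only two items. The item labels and the extra person 'Dave' are made up for illustration, since the question elides the Item column.

```python
import pandas as pd

# Hypothetical data: the item labels and 'Dave' are illustrative only,
# they do not come from the question.
df = pd.DataFrame({
    'Name': ['Alice', 'Alice', 'Alice', 'Bob', 'Bob', 'Bob',
             'Charlie', 'Charlie', 'Charlie', 'Dave', 'Dave'],
    'Item': list('abcdefghijk'),
    'Rank': [1, 2, 3, 3, 1, 2, None, None, None, None, None],
})

# cumcount() numbers rows 0..n-1 within each group; add(1) shifts that to 1..n.
position = df.groupby('Name').cumcount().add(1)

# fillna aligns on the index, so only the missing ranks are replaced.
# Once no NaN is left, the column can be cast back to integers.
df['Rank'] = df['Rank'].fillna(position).astype(int)
print(df)
```

Because the numbering restarts for every name, a group of two rows gets ranks 1 and 2 and a group of three gets 1 to 3, without any per-row iteration.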
To get a random ranking, shuffle the rows with sample before grouping:
df['Rank'] = df['Rank'].fillna(df.sample(frac=1).groupby('Name').cumcount().add(1))
print(df)
# Output
Name Rank
0 Alice 1.0
1 Alice 2.0
2 Alice 3.0
3 Bob 3.0
4 Bob 1.0
5 Bob 2.0
6 Charlie 3.0
7 Charlie 1.0
8 Charlie 2.0
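The shuffled variant can be sketched the same way. This is a minimal example under assumptions: the item labels and 'Dave' are hypothetical, and random_state is fixed only so the run is reproducible. The final check confirms that every person ends up with each rank from 1 to n exactly once.

```python
import pandas as pd

# Hypothetical data: the item labels and 'Dave' are illustrative only.
df = pd.DataFrame({
    'Name': ['Alice', 'Alice', 'Alice', 'Bob', 'Bob', 'Bob',
             'Charlie', 'Charlie', 'Charlie', 'Dave', 'Dave'],
    'Item': list('abcdefghijk'),
    'Rank': [1, 2, 3, 3, 1, 2, None, None, None, None, None],
})

# Shuffle all rows, then number them within each group in shuffled order.
# The original index is preserved, so the result still lines up with df.
shuffled_position = (
    df.sample(frac=1, random_state=0)  # random_state is an assumption, for reproducibility
      .groupby('Name')
      .cumcount()
      .add(1)
)

# Existing ranks are kept; only the missing ones take the shuffled position.
df['Rank'] = df['Rank'].fillna(shuffled_position).astype(int)

# Each person's ranks should be exactly 1..n with no repeats.
is_valid = df.groupby('Name')['Rank'].apply(
    lambda s: sorted(s) == list(range(1, len(s) + 1))
)
print(df)
print(is_valid.all())
```

Note that this relies on each person having either all ranks filled in or none, as in the question. If someone ranked only some of their items, the filled values could collide with the ranks already given.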