May 30, 2023 03:03:50


Applying within-group rankings to a Pandas DataFrame

Question

The problem statement has people choosing 3 items and ranking them from 1 to 3. My Pandas DataFrame contains one row for each selected item, with columns for the person's name, the item, and the rank the person gave that item.

Something like:

import pandas as pd

df = pd.DataFrame.from_dict({
  'Name':['Alice','Alice','Alice','Bob','Bob','Bob','Charlie','Charlie','Charlie'], 
  'Item':[...], 
  'Rank':[1,2,3,3,1,2,None,None,None]
})

The issue is that some people did not assign rankings to their items, for example Charlie in the DataFrame above. For these people, I want to fill in their rankings with a valid "random" ranking, that is, give each of their items a unique value from 1 to n, where n is the number of items they selected. Some people only selected 2 items and also forgot to rank them, so the solution needs to handle a variable number of selected items.
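To make the requirement concrete, here is a minimal sketch of the data and of what "valid" means. The item names and the extra person Dave (2 items, both unranked) are invented for illustration, since the question elides the `Item` column, and `ranks_are_valid` is a hypothetical helper, not a pandas function. A person's ranking is valid when their ranks are exactly 1..n for the n items they chose.

```python
import pandas as pd

# Hypothetical sample data: item names and the person 'Dave' are invented
# for illustration (the question elides the 'Item' column). Dave picked
# only 2 items and ranked neither of them.
df = pd.DataFrame({
    'Name': ['Alice', 'Alice', 'Alice', 'Bob', 'Bob', 'Bob',
             'Charlie', 'Charlie', 'Charlie', 'Dave', 'Dave'],
    'Item': ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a', 'b'],
    'Rank': [1, 2, 3, 3, 1, 2, None, None, None, None, None],
})


def ranks_are_valid(frame: pd.DataFrame) -> bool:
    """Hypothetical helper: True if every person's ranks are exactly 1..n,
    where n is the number of items that person selected."""
    def is_permutation(ranks: pd.Series) -> bool:
        return sorted(ranks.tolist()) == list(range(1, len(ranks) + 1))

    return bool(frame.groupby('Name')['Rank'].apply(is_permutation).all())


print(ranks_are_valid(df))  # False: Charlie and Dave still have NaN ranks
```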

I was attempting to use a cumsum: first fill each null value with 1, then run the cumsum within each group of a groupby on the names.

Something like (although I think this is far from correct):

df['Rank'].fillna(1).groupby(df['Name']).cumsum()
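The cumsum idea can actually be made to work with one extra step, sketched below under the assumption that each person ranked either all of their items or none of them. Filling nulls with 1 and taking a per-person running total gives 1, 2, 3, ... for a fully unranked person, but meaningless totals for a ranked person (Alice would get 1, 3, 6). Feeding that running total back through `fillna` on the original column means it is only used where a rank was missing.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Alice', 'Alice', 'Bob', 'Bob', 'Bob',
             'Charlie', 'Charlie', 'Charlie'],
    'Rank': [1, 2, 3, 3, 1, 2, None, None, None],
})

# Step 1: treat every missing rank as 1, then take a running total per person.
# Unranked people get 1, 2, 3, ...; ranked people get totals we will discard.
running = df['Rank'].fillna(1).groupby(df['Name']).cumsum()

# Step 2: keep the original ranks and take the running total only where
# the rank was missing. Assumption: nobody ranked just some of their items
# (e.g. ranks [2, NaN, NaN] would become [2, 3, 4], which is invalid).
df['Rank'] = df['Rank'].fillna(running)
print(df['Rank'].tolist())  # [1.0, 2.0, 3.0, 3.0, 1.0, 2.0, 1.0, 2.0, 3.0]
```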

Also, I understand this may be easy to do by iterating over the rows. However, since this is Pandas, I am looking for a vectorized solution.

Answer 1

Score: 0

You can use groupby.cumcount:

df['Rank'] = df['Rank'].fillna(df.groupby('Name').cumcount().add(1))
print(df)
# Output
      Name  Rank
0    Alice   1.0
1    Alice   2.0
2    Alice   3.0
3      Bob   3.0
4      Bob   1.0
5      Bob   2.0
6  Charlie   1.0
7  Charlie   2.0
8  Charlie   3.0
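Why this handles a variable number of items: `cumcount` numbers the rows of each group 0, 1, 2, ... in their current order, so adding 1 yields 1..n for a group of any size n, and `fillna` only replaces the missing entries. The sketch below uses invented data in which Dave selected only 2 items.

```python
import pandas as pd

# Hypothetical data: 'Dave' is invented and selected only 2 items, unranked.
df = pd.DataFrame({
    'Name': ['Alice', 'Alice', 'Alice', 'Dave', 'Dave'],
    'Rank': [2, 3, 1, None, None],
})

# Position of each row within its person's group, starting at 1.
position = df.groupby('Name').cumcount().add(1)
print(position.tolist())  # [1, 2, 3, 1, 2]

# fillna aligns on the index and only touches NaN entries,
# so Alice's own ranks are left unchanged.
df['Rank'] = df['Rank'].fillna(position)
print(df['Rank'].tolist())  # [2.0, 3.0, 1.0, 1.0, 2.0]
```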

To assign a random ranking instead, shuffle the rows with sample first:

df['Rank'] = df['Rank'].fillna(df.sample(frac=1).groupby('Name').cumcount().add(1))
print(df)
# Output
      Name  Rank
0    Alice   1.0
1    Alice   2.0
2    Alice   3.0
3      Bob   3.0
4      Bob   1.0
5      Bob   2.0
6  Charlie   3.0
7  Charlie   1.0
8  Charlie   2.0
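The shuffled version works because `sample(frac=1)` returns every row in random order with its original index labels, and `fillna` aligns the shuffled positions back to the original rows by label. Below is a sketch with a fixed `random_state` so runs are repeatable, plus a check that each person ends up with each rank from 1 to n exactly once. It assumes the index labels are unique (alignment would be ambiguous otherwise), and the person Dave is invented for illustration.

```python
import pandas as pd

# Hypothetical data: 'Dave' is invented and selected only 2 items, unranked.
df = pd.DataFrame({
    'Name': ['Alice', 'Alice', 'Alice', 'Bob', 'Bob', 'Bob',
             'Charlie', 'Charlie', 'Charlie', 'Dave', 'Dave'],
    'Rank': [1, 2, 3, 3, 1, 2, None, None, None, None, None],
})

# Shuffle the rows (random_state fixes the seed), number each person's rows
# in shuffled order, then let fillna align the numbers back by index label.
shuffled_positions = (
    df.sample(frac=1, random_state=42)
      .groupby('Name')
      .cumcount()
      .add(1)
)
df['Rank'] = df['Rank'].fillna(shuffled_positions)

# Every person should now hold each rank from 1 to n exactly once.
for name, ranks in df.groupby('Name')['Rank']:
    assert sorted(ranks) == list(range(1, len(ranks) + 1)), name
print(df)
```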
