June 9, 2023 08:30:17

How to label each group with df.groupby() in Python pandas?

Question

Note: this question is related to an existing question here. However, my question provides a more concrete example and has broader impact.

Consider a pandas DataFrame like the following:

   Questions  cnt similarity
0       ABC    1  [1, 2, 3]
1       abc    2  [1, 2, 3]
2       cba    3  [2, 3, 1]
3      abcd    4  [4, 5, 6]
4      dcsa    5  [2, 3, 1]
5      adcd    6  [4, 5, 6]
6      abcd    7  [1, 2, 3]
7       cba    8  [7, 8, 9]

I need to add another column, cat, based on the similarity column: rows with the same similarity value should be put in the same group (groups are numbered from 1 in order of first appearance). The expected output is below. Any input is appreciated. Note that the original dataset has 1M rows. Thank you.

  Questions  cnt similarity  cat
0       ABC    1  [1, 2, 3]    1
1       abc    2  [1, 2, 3]    1
2       cba    3  [2, 3, 1]    2
3      abcd    4  [4, 5, 6]    3
4      dcsa    5  [2, 3, 1]    2
5      adcd    6  [4, 5, 6]    3
6      abcd    7  [1, 2, 3]    1
7       cba    8  [7, 8, 9]    4
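A plain-Python reference can make the requirement concrete. The sketch below assumes the similarity cells hold Python lists (the printed output suggests this but does not confirm it). It converts each list to a tuple so it can be used as a dictionary key and hands out labels 1, 2, 3, ... in order of first appearance; the names labels and cat_values are illustrative only. A Python loop like this is probably too slow to be the final answer for 1M rows, but it gives a result to check faster approaches against.

```python
import pandas as pd

# Sample data from the question; assumes "similarity" holds Python lists.
df = pd.DataFrame({
    "Questions": ["ABC", "abc", "cba", "abcd", "dcsa", "adcd", "abcd", "cba"],
    "cnt": [1, 2, 3, 4, 5, 6, 7, 8],
    "similarity": [[1, 2, 3], [1, 2, 3], [2, 3, 1], [4, 5, 6],
                   [2, 3, 1], [4, 5, 6], [1, 2, 3], [7, 8, 9]],
})

labels = {}      # hypothetical helper: tuple(similarity) -> group label
cat_values = []  # label for each row, in row order
for sim in df["similarity"]:
    key = tuple(sim)  # lists are unhashable; tuples can be dict keys
    if key not in labels:
        labels[key] = len(labels) + 1  # next label, starting at 1
    cat_values.append(labels[key])

df["cat"] = cat_values
print(df)
```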

Answer 1

Score: 3

IIUC, you can use pd.factorize:

df["cat"] = pd.factorize(df["similarity"].astype(str))[0] + 1

Output:

print(df)

  Questions  cnt similarity  cat
0       ABC    1  [1, 2, 3]    1
1       abc    2  [1, 2, 3]    1
2       cba    3  [2, 3, 1]    2
3      abcd    4  [4, 5, 6]    3
4      dcsa    5  [2, 3, 1]    2
5      adcd    6  [4, 5, 6]    3
6      abcd    7  [1, 2, 3]    1
7       cba    8  [7, 8, 9]    4
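pd.factorize returns a pair (codes, uniques), where codes are 0-based integers assigned in order of first appearance; that is why the answer takes [0] and adds 1. The astype(str) step is presumably there because lists are unhashable. A possible variation, assuming the cells are Python lists, is to map each list to a tuple instead of a string, which avoids depending on how each list is printed. Whether this is faster on 1M rows is not established here and is worth timing on the real data.

```python
import pandas as pd

# Only the column that matters; assumes the cells are Python lists.
df = pd.DataFrame({
    "similarity": [[1, 2, 3], [1, 2, 3], [2, 3, 1], [4, 5, 6],
                   [2, 3, 1], [4, 5, 6], [1, 2, 3], [7, 8, 9]],
})

# Tuples are hashable, so factorize can work on them directly.
# Codes are 0-based and follow first-appearance order; +1 makes them 1-based.
codes, uniques = pd.factorize(df["similarity"].map(tuple))
df["cat"] = codes + 1
print(df)
```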

Answer 2

Score: 2

One way is to use groupby.ngroup():

df['cat'] = df.groupby('similarity').ngroup() + 1

  Questions  cnt similarity  cat
0       ABC    1  [1, 2, 3]    1
1       abc    2  [1, 2, 3]    1
2       cba    3  [2, 3, 1]    2
3      abcd    4  [4, 5, 6]    3
4      dcsa    5  [2, 3, 1]    2
5      adcd    6  [4, 5, 6]    3
6      abcd    7  [1, 2, 3]    1
7       cba    8  [7, 8, 9]    4
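One caveat, assuming the similarity cells are Python lists: lists are unhashable, so grouping on the column directly is likely to raise TypeError: unhashable type: 'list'. Grouping on a tuple version of the column avoids that. Also, groupby sorts group keys by default, so ngroup() numbers groups in sorted-key order; that happens to coincide with first-appearance order in this sample, but sort=False is what ties the numbering to first appearance in general. A sketch combining both adjustments:

```python
import pandas as pd

# Only the column that matters; assumes the cells are Python lists.
df = pd.DataFrame({
    "similarity": [[1, 2, 3], [1, 2, 3], [2, 3, 1], [4, 5, 6],
                   [2, 3, 1], [4, 5, 6], [1, 2, 3], [7, 8, 9]],
})

# Group on a hashable tuple key; sort=False keeps groups in order of
# first appearance, so ngroup() numbers them 0, 1, 2, ... in that order.
key = df["similarity"].map(tuple)
df["cat"] = df.groupby(key, sort=False).ngroup() + 1
print(df)
```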
