February 6, 2023, 07:25:34

DataFrame groupby on each item within a column of lists

Question

I have a dataframe (df):

| A   | B     | C                       |
| --- | ----- | ----------------------- |
| CA  | Jon   | [sales, engineering]    |
| NY  | Sarah | [engineering, IT]       |
| VA  | Vox   | [services, engineering] |

I am trying to group by each item in the lists in column C (sales, engineering, IT, etc.).

I tried:

df.groupby('C')

but got an "unhashable type: 'list'" error, which is expected. I came across another post that recommended converting column C to tuples, which are hashable, but I need to group by each item, not by each combination.
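
To illustrate why the tuple workaround does not give per-item groups, here is a minimal sketch. It assumes the DataFrame is built as below to match the table above; the name by_combination is only illustrative.

```python
import pandas as pd

# DataFrame reconstructed from the table in the question (an assumption
# about how the original df was built).
df = pd.DataFrame({
    "A": ["CA", "NY", "VA"],
    "B": ["Jon", "Sarah", "Vox"],
    "C": [
        ["sales", "engineering"],
        ["engineering", "IT"],
        ["services", "engineering"],
    ],
})

# Tuples are hashable, so grouping works, but every distinct
# combination of items becomes its own group.
by_combination = df.groupby(df["C"].apply(tuple)).size()
print(by_combination)
```

Each of the three rows holds a different combination, so this yields three groups of one row each rather than a count per item.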

My goal is to count, for each item that appears in the column C lists, the number of rows in df that contain it. So:

sales: 1
engineering: 3
IT: 1
services: 1

While there is probably a simpler way to obtain this than using groupby, I am still curious if groupby can be used in this case.

Answer 1

Score: 1

You can use explode and value_counts:

out = df.explode("C").value_counts("C")

Output:

print(out)
C          
engineering    3
IT             1
sales          1
services       1
dtype: int64
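
On whether groupby can be used here: once the lists are exploded, column C holds plain strings, so it can serve as a grouping key. The following is a minimal sketch, assuming the DataFrame is built as below to match the table in the question; the names exploded, counts, and members are only illustrative.

```python
import pandas as pd

# DataFrame reconstructed from the table in the question (an assumption
# about how the original df was built).
df = pd.DataFrame({
    "A": ["CA", "NY", "VA"],
    "B": ["Jon", "Sarah", "Vox"],
    "C": [
        ["sales", "engineering"],
        ["engineering", "IT"],
        ["services", "engineering"],
    ],
})

# explode gives every list item its own row, repeating A and B.
exploded = df.explode("C")

# Column C is now scalar, so it can be used as a groupby key.
counts = exploded.groupby("C").size()
print(counts)

# groupby also allows other aggregations, e.g. collecting the
# names in column B for each item.
members = exploded.groupby("C")["B"].agg(list)
print(members)
```

The counts match the value_counts result above, although groupby sorts by the group key rather than by frequency. The second aggregation shows what groupby adds over a plain count.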
