DataFrame groupby on each item within a column of lists
Question
I have a dataframe (`df`):
| A | B | C |
| --- | ----- | ----------------------- |
| CA | Jon | [sales, engineering] |
| NY | Sarah | [engineering, IT] |
| VA | Vox | [services, engineering] |
I am trying to group by each item in the `C` column's lists (sales, engineering, IT, etc.).
Tried:

```python
df.groupby('C')
```
but got a "list not hashable" error, which is expected. I came across another post that recommended converting the `C` column to tuples, which are hashable, but I need to group by each individual item, not by the whole combination.
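To illustrate why the tuple workaround does not give per-item groups: converting each list to a tuple makes `C` hashable, but `groupby` then uses each whole tuple as one key. The sketch below rebuilds the example `df` by hand (the construction is an assumption based on the table above). Because every row in this example has a different combination, each group ends up with exactly one row.

```python
import pandas as pd

# Rebuild the example frame from the question by hand
df = pd.DataFrame({
    "A": ["CA", "NY", "VA"],
    "B": ["Jon", "Sarah", "Vox"],
    "C": [["sales", "engineering"], ["engineering", "IT"], ["services", "engineering"]],
})

# Tuples are hashable, so grouping on them no longer raises...
combo_counts = df.groupby(df["C"].apply(tuple)).size()

# ...but each key is a whole combination such as ("sales", "engineering"),
# so "engineering" never becomes a group of its own
print(combo_counts)
```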
My goal is to count, for each item that appears in the `C` lists, how many rows of `df` contain it. So:
```
sales: 1
engineering: 3
IT: 1
services: 1
```
While there is probably a simpler way to obtain this than using `groupby`, I am still curious if `groupby` can be used in this case.
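As a reference point for the "simpler way" mentioned above, a plain-Python sketch with `collections.Counter` over the flattened lists produces the same per-item counts without any grouping. It assumes, as in the example, that no item repeats within a single row's list; a repeated item would be counted once per occurrence.

```python
from collections import Counter

import pandas as pd

# Example frame rebuilt by hand from the question's table
df = pd.DataFrame({
    "A": ["CA", "NY", "VA"],
    "B": ["Jon", "Sarah", "Vox"],
    "C": [["sales", "engineering"], ["engineering", "IT"], ["services", "engineering"]],
})

# Flatten every list in column C and count each item's occurrences
item_counts = Counter(item for items in df["C"] for item in items)
print(dict(item_counts))  # {'sales': 1, 'engineering': 3, 'IT': 1, 'services': 1}
```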
Answer 1
Score: 1
You can use `explode` and `value_counts`:
```python
# explode gives each list item its own row; value_counts then counts each item in C
out = df.explode("C").value_counts("C")
```
Output:

```
print(out)

C
engineering    3
IT             1
sales          1
services       1
dtype: int64
```
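On the `groupby` part of the question: after `explode`, each list item sits in its own row and `C` holds plain strings, so an ordinary `groupby` works. Below is a minimal, self-contained sketch (the example `df` is rebuilt by hand from the question's table) that should give the same counts as `value_counts`:

```python
import pandas as pd

# Example frame rebuilt by hand from the question's table
df = pd.DataFrame({
    "A": ["CA", "NY", "VA"],
    "B": ["Jon", "Sarah", "Vox"],
    "C": [["sales", "engineering"], ["engineering", "IT"], ["services", "engineering"]],
})

# One row per list item; the original row labels are repeated
exploded = df.explode("C")

# C now contains hashable strings, so groupby works; size() counts rows per item
counts = exploded.groupby("C").size()
print(counts)
```

Unlike `value_counts`, `groupby(...).size()` sorts by group key rather than by count, so `IT` comes before `engineering` here. To list the most frequent items first, chain `.sort_values(ascending=False)`.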