DataFrame groupby on each item within a column of lists

Question

I have a dataframe (df):

| A   | B     | C                       |
| --- | ----- | ----------------------- |
| CA  | Jon   | [sales, engineering]    |
| NY  | Sarah | [engineering, IT]       |
| VA  | Vox   | [services, engineering] |

I am trying to group by each item in the C column list (sales, engineering, IT, etc.).

Tried:

```python
df.groupby('C')
```

but got an error because lists are not hashable, which is expected. I came across another post that recommended converting the C column to tuples, which are hashable, but I need to group by each individual item, not by the combination.
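
To see why the tuple workaround does not meet the goal, here is a minimal sketch that rebuilds the example frame from the table above and groups on tuples. Tuples are hashable, so the call no longer fails, but each group key is a row's whole combination of items, so every group here has size 1.

```python
import pandas as pd

# Example frame rebuilt from the table in the question
df = pd.DataFrame(
    {
        "A": ["CA", "NY", "VA"],
        "B": ["Jon", "Sarah", "Vox"],
        "C": [
            ["sales", "engineering"],
            ["engineering", "IT"],
            ["services", "engineering"],
        ],
    }
)

# Converting each list to a tuple makes the key hashable, but the key is
# still the whole combination of items in that row, not a single item.
by_combination = df.groupby(df["C"].map(tuple)).size()
print(by_combination)
```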

My goal is to count, for each item that appears in the C column lists, how many rows of df contain it. So:

```
sales: 1
engineering: 3
IT: 1
services: 1
```
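
For reference, these counts can also be computed without any pandas grouping. A minimal sketch, assuming the `C` column holds plain Python lists as in the table above, flattens the lists with `itertools.chain` and tallies the items with `collections.Counter`:

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Example frame rebuilt from the table in the question
df = pd.DataFrame(
    {
        "A": ["CA", "NY", "VA"],
        "B": ["Jon", "Sarah", "Vox"],
        "C": [
            ["sales", "engineering"],
            ["engineering", "IT"],
            ["services", "engineering"],
        ],
    }
)

# Flatten the per-row lists into one stream of items, then count each item.
counts = Counter(chain.from_iterable(df["C"]))
print(counts)
```

Note that an item listed twice in the same row would be counted twice here.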

While there is probably a simpler way to get this than `groupby`, I am still curious whether `groupby` can be used in this case.
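
`groupby` does work once each list item sits in its own row. In this minimal sketch, `DataFrame.explode` turns every list element into a separate row (repeating the `A` and `B` values), after which `C` holds hashable strings and `groupby("C").size()` gives the per-item row counts. The second aggregation, collecting the names in `B` per item, is only an illustration of what grouping allows beyond counting.

```python
import pandas as pd

# Example frame rebuilt from the table in the question
df = pd.DataFrame(
    {
        "A": ["CA", "NY", "VA"],
        "B": ["Jon", "Sarah", "Vox"],
        "C": [
            ["sales", "engineering"],
            ["engineering", "IT"],
            ["services", "engineering"],
        ],
    }
)

# One row per (original row, list item) pair; C now contains scalars.
exploded = df.explode("C")

# Scalars are hashable, so grouping on C succeeds.
counts = exploded.groupby("C").size()
print(counts)

# Grouping also gives access to the other columns, e.g. who is in each item.
names = exploded.groupby("C")["B"].agg(list)
print(names)
```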

Answer 1

Score: 1

You can use `explode` and `value_counts`:

```python
out = df.explode("C").value_counts("C")
```

Output:

```python
print(out)
```

```
C
engineering    3
IT             1
sales          1
services       1
dtype: int64
```
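
To see what the one-liner is counting, it can help to look at the intermediate frame. In this sketch (the frame is rebuilt from the question's table), `explode` keeps the original index, so row labels 0, 1 and 2 each appear twice, once per list item. Exploding only the column with `Series.explode` gives the same counts without carrying `A` and `B` along.

```python
import pandas as pd

# Example frame rebuilt from the table in the question
df = pd.DataFrame(
    {
        "A": ["CA", "NY", "VA"],
        "B": ["Jon", "Sarah", "Vox"],
        "C": [
            ["sales", "engineering"],
            ["engineering", "IT"],
            ["services", "engineering"],
        ],
    }
)

# Six rows: each original index label is repeated once per item in its list.
exploded = df.explode("C")
print(exploded)

# Same counts, computed on the C column alone.
item_counts = df["C"].explode().value_counts()
print(item_counts)
```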

huangapple
  • This article was published on February 6, 2023 at 07:25:34.
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75356215.html