DataFrame groupby on each item within a column of lists

Question

I have a dataframe (df):

| A   | B     | C                       |
| --- | ----- | ----------------------- |
| CA  | Jon   | [sales, engineering]    |
| NY  | Sarah | [engineering, IT]       |
| VA  | Vox   | [services, engineering] |
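For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is a minimal sketch that assumes column C holds real Python lists of strings (not strings that merely look like lists), which the question implies but does not state outright.

```python
import pandas as pd

# Sample data mirroring the table in the question.
# Assumption: each cell in column C is an actual Python list of strings.
df = pd.DataFrame(
    {
        "A": ["CA", "NY", "VA"],
        "B": ["Jon", "Sarah", "Vox"],
        "C": [
            ["sales", "engineering"],
            ["engineering", "IT"],
            ["services", "engineering"],
        ],
    }
)

print(df)
```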

I am trying to group by each individual item in the lists in column C (sales, engineering, IT, etc.).

I tried:

    df.groupby('C')

but got an "unhashable type: 'list'" error, which is expected. I came across another post that recommended converting column C to tuples, which are hashable, but I need to group by each item, not by each combination of items.
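To illustrate why the tuple conversion does not help here, the sketch below groups on a tuple version of column C. The helper column name `C_tuple` is made up for this example, and the sample frame is rebuilt so the snippet runs on its own. Each distinct combination becomes its own group, so no per-item counts come out of it.

```python
import pandas as pd

# Sample frame from the question, rebuilt so this snippet is self-contained.
df = pd.DataFrame(
    {
        "A": ["CA", "NY", "VA"],
        "B": ["Jon", "Sarah", "Vox"],
        "C": [
            ["sales", "engineering"],
            ["engineering", "IT"],
            ["services", "engineering"],
        ],
    }
)

# Tuples are hashable, so groupby no longer raises.
# "C_tuple" is a hypothetical helper column used only for this illustration.
by_combo = df.assign(C_tuple=df["C"].apply(tuple)).groupby("C_tuple").size()

# One group per whole combination, not one group per item.
print(by_combo)
```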

My goal is to count, for each item that appears in the column C lists, the number of rows in df that contain it. So:

    sales: 1
    engineering: 3
    IT: 1
    services: 1

While there is probably a simpler way to obtain this than using groupby, I am still curious if groupby can be used in this case.

Answer 1

Score: 1

You can use explode followed by value_counts. explode gives each list element its own row, and value_counts then counts the rows per item:

    out = df.explode("C").value_counts("C")

Output:

    print(out)
    C
    engineering    3
    IT             1
    sales          1
    services       1
    dtype: int64
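Since the question also asks whether groupby itself can be used, here is a minimal sketch of one way to do it: explode first, then group on the now-scalar column. This is an illustration built on the answer above, not part of the original answer. The names `exploded`, `counts` and `members` are chosen for this example only.

```python
import pandas as pd

# Sample frame from the question, rebuilt so this snippet is self-contained.
df = pd.DataFrame(
    {
        "A": ["CA", "NY", "VA"],
        "B": ["Jon", "Sarah", "Vox"],
        "C": [
            ["sales", "engineering"],
            ["engineering", "IT"],
            ["services", "engineering"],
        ],
    }
)

# After explode, column C holds one string per row, so it is hashable
# and can be used as a groupby key.
exploded = df.explode("C")

# Number of rows per item.
counts = exploded.groupby("C").size()
print(counts)

# Because this is a real groupby, other aggregations work too,
# for example collecting the names in column B for each item.
members = exploded.groupby("C")["B"].agg(list)
print(members)
```

Note that groupby sorts its keys by default, so the order of the result may differ from the value_counts output, which is sorted by count.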

huangapple
  • Posted on February 6, 2023, 07:25:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75356215.html