Creating a counter from columns in a dataset in Python

Question

I asked this question before but did not phrase it well, so I will try again. I have a dataset that I work with mainly through pandas and numpy. It has a column of employee IDs (which also includes the managers' own IDs) and a column with each employee's manager ID:

    emp_id  mgr_id
    1       3
    2       3
    3       5
    4       3
    5       7
    6       7
    7       10
    8       3
    9       7
    10      -

My data looks similar to this. For every employee, whether or not they are a manager, I want to show how many people report to them, so I want a counter column like this:

    emp_id  mgr_id  emp_count
    1       3       0
    2       3       0
    3       5       4
    4       3       0
    5       7       1
    6       7       0
    7       10      3
    8       3       0
    9       7       0
    10      -       1

My code looks like this:

    df['emp_count'] = df.groupby('mgr_id').transform('count')

When I run this, I get the following error:

> ValueError: Cannot set a DataFrame with multiple columns to the single column emp_count

I am not sure what I am doing wrong; I am fairly new to this.
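A likely cause of the error, assuming the real DataFrame has more columns than the two shown: `groupby(...).transform('count')` returns one column per non-grouping column, so the result is a DataFrame and cannot be assigned to a single column. The sketch below reproduces this on the sample data; the `dept` column is hypothetical and only there to trigger the multi-column result.

```python
import pandas as pd

df = pd.DataFrame({
    "emp_id": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    # "-" marks the employee with no manager, so mgr_id is kept as strings
    "mgr_id": ["3", "3", "5", "3", "7", "7", "10", "3", "7", "-"],
    # hypothetical extra column, not part of the question's data
    "dept": ["A"] * 10,
})

# Without selecting a column, transform returns a DataFrame
# with one column for each non-grouping column.
all_cols = df.groupby("mgr_id").transform("count")
print(type(all_cols).__name__, list(all_cols.columns))

# Selecting one column first returns a Series, which can be assigned.
one_col = df.groupby("mgr_id")["emp_id"].transform("count")
print(type(one_col).__name__)
print(one_col.tolist())
```

Selecting a column removes the error, but the values are still not the wanted ones: each row gets the size of its own manager's group (how many people share that manager), not the number of people who report to that employee.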

Answer 1

Score: 2

Let's do:

    # Count the unique employees reporting to each manager
    counts = df['emp_id'].groupby(df['mgr_id'].astype(str)).nunique()
    # Map the counts onto the emp_id column; employees with no reports get 0
    df['emp_count'] = df['emp_id'].astype(str).map(counts).fillna(0).astype(int)

Result:

       emp_id mgr_id  emp_count
    0       1      3          0
    1       2      3          0
    2       3      5          4
    3       4      3          0
    4       5      7          1
    5       6      7          0
    6       7     10          3
    7       8      3          0
    8       9      7          0
    9      10      -          1
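For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It assumes `mgr_id` is stored as strings because of the `-` placeholder; the DataFrame construction is an assumption about how the question's data is laid out.

```python
import pandas as pd

# Sample data from the question; "-" marks the employee with no manager.
df = pd.DataFrame({
    "emp_id": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "mgr_id": ["3", "3", "5", "3", "7", "7", "10", "3", "7", "-"],
})

# Number of distinct employees reporting to each manager, indexed by mgr_id.
counts = df["emp_id"].groupby(df["mgr_id"]).nunique()

# Look up each employee's own ID in that index; IDs that never appear
# as a manager come back as NaN and are replaced with 0.
df["emp_count"] = df["emp_id"].astype(str).map(counts).fillna(0).astype(int)

print(df)
```

The key step is that the counts are indexed by manager ID but looked up by employee ID, which is what turns "rows per manager" into "reports per employee".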

Answer 2

Score: 0

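Another approach, though not as elegant as Shubham's answer:

    # Compare IDs as strings, because mgr_id contains '-'
    df['emp_id'] = df['emp_id'].astype(str)
    df['mgr_id'] = df['mgr_id'].astype(str)
    # Count reports per manager, then rename mgr_id so it can be joined on emp_id
    df1 = df.groupby('mgr_id').count().reset_index().rename(columns={'emp_id': 'count', 'mgr_id': 'emp_id'})
    # A left join keeps every employee; those with no reports get 0
    df2 = df.merge(df1, how='left', on='emp_id')
    df2['count'] = df2['count'].fillna(0).astype(int)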

Result:

      emp_id mgr_id  count
    0      1      3      0
    1      2      3      0
    2      3      5      4
    3      4      3      0
    4      5      7      1
    5      6      7      0
    6      7     10      3
    7      8      3      0
    8      9      7      0
    9     10      -      1
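A runnable variant of the merge approach, as a sketch: it swaps `count()` for `size()`, which is a substitution and not what the answer above uses. `size()` gives one number per group regardless of how many other columns the DataFrame has. The names `reports` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "emp_id": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "mgr_id": ["3", "3", "5", "3", "7", "7", "10", "3", "7", "-"],
})
# Join keys must have the same dtype, so compare IDs as strings.
df["emp_id"] = df["emp_id"].astype(str)

# One row per manager: the manager's ID (renamed to emp_id for the join)
# and the number of rows, i.e. direct reports, in that group.
reports = (
    df.groupby("mgr_id")
      .size()
      .reset_index(name="count")
      .rename(columns={"mgr_id": "emp_id"})
)

# Left join keeps every employee; those who manage nobody get NaN, then 0.
result = df.merge(reports, how="left", on="emp_id")
result["count"] = result["count"].fillna(0).astype(int)

print(result)
```

The `-` row in `reports` has no matching `emp_id`, so the left join simply drops it.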
