How to Get Proper Count of Values in Python

Question

I asked this question earlier but miscommunicated what I am having trouble with. I have a dataset in Python that I am working on with NumPy and pandas, and I am trying to get a count of reports by job type. There are 100+ titles, so I will shorten it for an example:

  ID  Job_Title      reports
  1   Sales Manager  2
  1   Sales Manager  2
  2   Tech Support   0
  3   Tech Support   1
  3   Tech Support   1
  4   Sales Lead     4
  4   Sales Lead     4
  5   Sales Manager  5
  6   Tech Support   2

I would like to get an accurate count of the reports by position, something like this:

  Job_Title      reports
  Sales Manager  7
  Sales Lead     4
  Tech Support   3

So far what I have is this:

  df.groupby('Job_Title')['reports'].count().sort_values(ascending=False)

And this is what I am getting:

  Job_Title      reports
  Tech Support   4
  Sales Manager  3
  Sales Lead     2
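
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample rows above as a DataFrame (the construction itself is an assumption, since the question does not show how `df` is loaded). It suggests why the numbers look off: `count()` returns the number of non-null rows per group, so the values in `reports` are never added up, and the repeated `ID` rows inflate the row count.

```python
import pandas as pd

# Sample rows copied from the question; the real dataset has 100+ titles.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 4, 4, 5, 6],
    "Job_Title": [
        "Sales Manager", "Sales Manager", "Tech Support",
        "Tech Support", "Tech Support", "Sales Lead",
        "Sales Lead", "Sales Manager", "Tech Support",
    ],
    "reports": [2, 2, 0, 1, 1, 4, 4, 5, 2],
})

# count() tallies non-null rows in each group; it does not sum the values.
row_counts = (
    df.groupby("Job_Title")["reports"].count().sort_values(ascending=False)
)
print(row_counts)

# Rows per (ID, Job_Title) pair: anything above 1 is a repeated record.
pair_sizes = df.groupby(["ID", "Job_Title"]).size()
repeated = pair_sizes[pair_sizes > 1]
print(repeated)
```

Here IDs 1, 3 and 4 each appear twice, which is why a plain sum over the raw rows would also overshoot the expected totals.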

Answer 1

Score: 1

To get the expected result, drop the duplicate rows by ID and Job_Title first, then group by Job_Title and sum the reports values:

  >>> (df.drop_duplicates(['ID', 'Job_Title'])
  ...    .groupby('Job_Title', as_index=False)['reports'].sum()
  ...    .sort_values(by='reports', ascending=False, ignore_index=True))
         Job_Title  reports
  0  Sales Manager        7
  1     Sales Lead        4
  2   Tech Support        3
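
The same steps can be wrapped in a small helper so they are easy to reuse and check. This is only a sketch: the function name `reports_by_title` is hypothetical, and the DataFrame is rebuilt from the question's sample rows so the snippet runs on its own.

```python
import pandas as pd


def reports_by_title(df: pd.DataFrame) -> pd.DataFrame:
    """Sum reports per Job_Title, counting each (ID, Job_Title) pair once."""
    # Keep one row per person and title so repeated records are not re-added.
    deduped = df.drop_duplicates(subset=["ID", "Job_Title"])
    totals = deduped.groupby("Job_Title", as_index=False)["reports"].sum()
    # Largest totals first, with a fresh 0..n-1 index.
    return totals.sort_values(by="reports", ascending=False, ignore_index=True)


# Sample rows copied from the question.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 4, 4, 5, 6],
    "Job_Title": [
        "Sales Manager", "Sales Manager", "Tech Support",
        "Tech Support", "Tech Support", "Sales Lead",
        "Sales Lead", "Sales Manager", "Tech Support",
    ],
    "reports": [2, 2, 0, 1, 1, 4, 4, 5, 2],
})

result = reports_by_title(df)
print(result)
```

Deduplicating on both `ID` and `Job_Title` (rather than on `ID` alone) keeps a person who legitimately appears under two different titles, which may or may not match the real dataset.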

Answer 2

Score: 1

Try this:

  df_new = (df.groupby('Job_Title', as_index=False)['reports']
              .apply(lambda g: sum(set(g)))
              .sort_values('reports', ascending=False, ignore_index=True))
  print(df_new)

Output:

         Job_Title  reports
  0  Sales Manager        7
  1     Sales Lead        4
  2   Tech Support        3
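
One thing worth checking before relying on `sum(set(g))`: a set keeps only distinct report values, not distinct IDs, so two different people with the same title and the same number of reports would be counted once. The sketch below illustrates this with one extra row (ID 7 is hypothetical and not part of the question's data) and compares it with deduplicating on ID and Job_Title.

```python
import pandas as pd

# Question's sample rows plus ONE hypothetical row (ID 7) that shares
# both its title and its report count with ID 6.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 4, 4, 5, 6, 7],
    "Job_Title": [
        "Sales Manager", "Sales Manager", "Tech Support",
        "Tech Support", "Tech Support", "Sales Lead",
        "Sales Lead", "Sales Manager", "Tech Support", "Tech Support",
    ],
    "reports": [2, 2, 0, 1, 1, 4, 4, 5, 2, 2],
})

# Set-based sum: collapses equal values within a title, whoever they belong to.
by_value = df.groupby("Job_Title")["reports"].agg(lambda s: sum(set(s)))

# ID-based sum: collapses only repeated (ID, Job_Title) records.
by_id = (
    df.drop_duplicates(["ID", "Job_Title"])
      .groupby("Job_Title")["reports"]
      .sum()
)

print(by_value["Tech Support"])  # IDs 6 and 7 contribute a single 2
print(by_id["Tech Support"])     # IDs 6 and 7 each contribute 2
```

On the question's original nine rows both approaches agree, so the difference only matters if such ties can occur in the full dataset.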

huangapple
  • Published on March 9, 2023, 20:49:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75684859.html