How to Get Proper Count of Values in Python
Question
I tried asking this question earlier and miscommunicated what I am having trouble with. I have a dataset in Python that I am working on with NumPy and pandas, and I am trying to get a count of reports by job type. There are 100+ titles, so I will shorten it for an example:
ID  Job_Title      reports
1   Sales Manager  2
1   Sales Manager  2
2   Tech Support   0
3   Tech Support   1
3   Tech Support   1
4   Sales Lead     4
4   Sales Lead     4
5   Sales Manager  5
6   Tech Support   2
I would like to get an accurate count of the reports by position, counting each ID only once. Something like this:
Job_Title      reports
Sales Manager  7
Sales Lead     4
Tech Support   3
So far what I have is this:
df.groupby('Job_Title')['reports'].count().sort_values(ascending=False)
And this is what I am getting:
Job_Title      reports
Tech Support   4
Sales Manager  3
Sales Lead     2
Answer 1
Score: 1
To get the expected result, you have to drop duplicate rows by ID and Job_Title before grouping by Job_Title and summing the reports values:
>>> (df.drop_duplicates(['ID', 'Job_Title'])
.groupby('Job_Title', as_index=False)['reports'].sum()
.sort_values(by='reports', ascending=False, ignore_index=True))
       Job_Title  reports
0  Sales Manager        7
1     Sales Lead        4
2   Tech Support        3
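For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample rows in the question, and the names row_counts and report_totals are only illustrative. It also shows why count() returns 4, 3 and 2: it counts the rows in each group instead of adding up the reports values.

```python
import pandas as pd

# Sample rows copied from the question (the real dataset has 100+ titles).
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 4, 4, 5, 6],
    "Job_Title": [
        "Sales Manager", "Sales Manager", "Tech Support",
        "Tech Support", "Tech Support", "Sales Lead",
        "Sales Lead", "Sales Manager", "Tech Support",
    ],
    "reports": [2, 2, 0, 1, 1, 4, 4, 5, 2],
})

# count() counts rows per title, so repeated IDs inflate the numbers.
row_counts = df.groupby("Job_Title")["reports"].count()

# Keep one row per (ID, Job_Title) pair, then add up the reports per title.
report_totals = (
    df.drop_duplicates(["ID", "Job_Title"])
      .groupby("Job_Title")["reports"]
      .sum()
      .sort_values(ascending=False)
)

print(row_counts)
print(report_totals)
```

This version returns a Series indexed by Job_Title; calling reset_index() on it should give a two-column DataFrame like the one shown above.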
Answer 2
Score: 1
Try this:
df_new = (df.groupby('Job_Title', as_index=False)['reports']
            .apply(lambda g: sum(set(g)))
            .sort_values('reports', ascending=False, ignore_index=True))
print(df_new)
Output:
       Job_Title  reports
0  Sales Manager        7
1     Sales Lead        4
2   Tech Support        3
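One thing to be aware of with the set-based approach: it removes duplicate values, not duplicate IDs. If two different IDs share a title and happen to have the same number of reports, they collapse into a single value. The sketch below uses made-up rows (not taken from the question) to show the difference from dropping duplicates on ID and Job_Title; the names set_total and id_total are illustrative.

```python
import pandas as pd

# Hypothetical data: IDs 1 and 2 are different people with 3 reports each,
# and ID 1 appears twice, as in the question's duplicated rows.
df = pd.DataFrame({
    "ID": [1, 1, 2],
    "Job_Title": ["Sales Manager", "Sales Manager", "Sales Manager"],
    "reports": [3, 3, 3],
})

# Set-based sum: the set keeps only the distinct value 3.
set_total = df.groupby("Job_Title")["reports"].agg(lambda g: sum(set(g)))

# ID-based de-duplication: one row per (ID, Job_Title), then sum.
id_total = (
    df.drop_duplicates(["ID", "Job_Title"])
      .groupby("Job_Title")["reports"]
      .sum()
)

print(set_total["Sales Manager"])  # 3
print(id_total["Sales Manager"])   # 6
```

On the sample data in the question the two approaches agree, because no two IDs with the same title have the same reports value.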