How to Get Proper Count of Values in Python

Question

I asked this question earlier but miscommunicated what I was having trouble with. I have a dataset in Python that I am working with using NumPy and pandas, and I am trying to get a count of reports by job type. There are 100+ job titles, so I will shorten it for an example:

 ID       Job_Title        reports  
 1        Sales Manager       2
 1        Sales Manager       2
 2        Tech Support        0
 3        Tech Support        1
 3        Tech Support        1
 4        Sales Lead          4
 4        Sales Lead          4
 5        Sales Manager       5
 6        Tech Support        2

I would like to get an accurate count of the reports by position, something like this:

 Job_Title      reports    
Sales Manager     7
Sales Lead        4
Tech Support      3

So far what I have is this:

 df.groupby('Job_Title')['reports'].count().sort_values(ascending = False) 

And this is what I am getting:

Job_Title         reports
Tech Support         4
Sales Manager        3
Sales Lead           2
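
For readers who want to reproduce the numbers, here is a minimal sketch that rebuilds the sample table (the values are copied from the question; the variable names `row_counts` and `raw_sums` are only illustrative). It shows that `count()` returns the number of rows in each group rather than the total of `reports`, which is why the result above is 4, 3, and 2. A plain `sum()` is not enough either, because rows with a repeated ID are added more than once.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 4, 4, 5, 6],
    "Job_Title": ["Sales Manager", "Sales Manager", "Tech Support",
                  "Tech Support", "Tech Support", "Sales Lead",
                  "Sales Lead", "Sales Manager", "Tech Support"],
    "reports": [2, 2, 0, 1, 1, 4, 4, 5, 2],
})

# count() returns the number of non-null rows per group,
# not the total of the values in the column.
row_counts = df.groupby("Job_Title")["reports"].count()

# sum() adds the values, but rows with a repeated ID are counted again.
raw_sums = df.groupby("Job_Title")["reports"].sum()

print(row_counts.sort_values(ascending=False))
print(raw_sums.sort_values(ascending=False))
```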

Answer 1

Score: 1

To get the expected result, drop duplicates by ID and Job_Title before grouping by Job_Title and summing the reports values:

>>> (df.drop_duplicates(['ID', 'Job_Title'])
       .groupby('Job_Title', as_index=False)['reports'].sum()
       .sort_values(by='reports', ascending=False, ignore_index=True))

       Job_Title  reports
0  Sales Manager        7
1     Sales Lead        4
2   Tech Support        3
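
The same approach as a self-contained script, as a rough sketch: the helper name `unique_report_totals` is hypothetical, and it assumes every row with the same ID carries the same `Job_Title` and `reports` value, as in the sample data.

```python
import pandas as pd


def unique_report_totals(df: pd.DataFrame) -> pd.DataFrame:
    """Sum reports per job title, counting each (ID, Job_Title) pair once."""
    # Keep one row per person so repeated rows are not summed twice.
    deduped = df.drop_duplicates(["ID", "Job_Title"])
    totals = deduped.groupby("Job_Title", as_index=False)["reports"].sum()
    # Largest totals first, with a fresh 0..n-1 index.
    return totals.sort_values(by="reports", ascending=False, ignore_index=True)


# Sample data copied from the table in the question.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 4, 4, 5, 6],
    "Job_Title": ["Sales Manager", "Sales Manager", "Tech Support",
                  "Tech Support", "Tech Support", "Sales Lead",
                  "Sales Lead", "Sales Manager", "Tech Support"],
    "reports": [2, 2, 0, 1, 1, 4, 4, 5, 2],
})

result = unique_report_totals(df)
print(result)
```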

Answer 2

Score: 1

Try this:

df_new = (df.groupby('Job_Title', as_index=False)['reports']
            .apply(lambda g: sum(set(g)))
            .sort_values('reports', ascending=False, ignore_index=True))
print(df_new)

Output:

       Job_Title  reports
0  Sales Manager        7
1     Sales Lead        4
2   Tech Support        3
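
One caveat worth checking against your real data: `sum(set(g))` removes duplicate values, not duplicate IDs. It matches the expected output here only because no two different IDs under the same title share a `reports` value. The sketch below uses hypothetical data (ID 7 is invented for illustration) in which two different people both have 4 reports, and compares the value-based approach with de-duplicating on `ID` first.

```python
import pandas as pd

# Hypothetical data: IDs 4 and 7 are different people who both have 4 reports.
df = pd.DataFrame({
    "ID": [4, 4, 7],
    "Job_Title": ["Sales Lead", "Sales Lead", "Sales Lead"],
    "reports": [4, 4, 4],
})

# De-duplicating by value collapses both people into a single 4.
by_value = df.groupby("Job_Title")["reports"].apply(lambda g: sum(set(g)))

# De-duplicating by ID keeps one row per person, so both are counted.
by_id = (df.drop_duplicates(["ID", "Job_Title"])
           .groupby("Job_Title")["reports"].sum())

print(by_value)
print(by_id)
```

If equal report counts can occur for different IDs under one title, de-duplicating on `ID` is likely the safer choice.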

huangapple
  • Posted on March 9, 2023 at 20:49:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75684859.html