March 9, 2023, 20:49:48

How to Get Proper Count of Values in Python

Question

I tried asking this question earlier but miscommunicated what I am having trouble with. I have a dataset in Python, which I am working on with NumPy and pandas, and I am trying to get a count of reports by job type. There are 100+ titles, so I will shorten it for an example:

 ID       Job_Title        reports  
 1        Sales Manager       2
 1        Sales Manager       2
 2        Tech Support        0
 3        Tech Support        1
 3        Tech Support        1
 4        Sales Lead          4
 4        Sales Lead          4
 5        Sales Manager       5
 6        Tech Support        2

I would like to get an accurate count of the reports by position, something like this:

 Job_Title      reports    
Sales Manager     7
Sales Lead        4
Tech Support      3

So far what I have is this:

 df.groupby('Job_Title')['reports'].count().sort_values(ascending=False)

And this is what I am getting:

Job_Title         reports
Tech Support         4
Sales Manager        3
Sales Lead           2

Answer 1

Score: 1

To get the expected result, drop duplicates by ID and Job_Title before grouping by Job_Title and summing the reports values:

>>> (df.drop_duplicates(['ID', 'Job_Title'])
       .groupby('Job_Title', as_index=False)['reports'].sum()
       .sort_values(by='reports', ascending=False, ignore_index=True))
       Job_Title  reports
0  Sales Manager        7
1     Sales Lead        4
2   Tech Support        3
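
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame construction is an assumption on my part: it simply rebuilds the sample rows from the question. It also shows why the original attempt returned 4, 3, 2: `count()` returns the number of non-null rows in each group, not the sum of the `reports` values.

```python
import pandas as pd

# Sample rows copied from the question (the construction itself is an assumption).
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 4, 4, 5, 6],
    "Job_Title": [
        "Sales Manager", "Sales Manager", "Tech Support",
        "Tech Support", "Tech Support", "Sales Lead",
        "Sales Lead", "Sales Manager", "Tech Support",
    ],
    "reports": [2, 2, 0, 1, 1, 4, 4, 5, 2],
})

# count() tallies rows per job title, duplicates included.
row_counts = df.groupby("Job_Title")["reports"].count()

# Keep one row per (ID, Job_Title), then sum the reports per job title.
totals = (
    df.drop_duplicates(["ID", "Job_Title"])
      .groupby("Job_Title")["reports"].sum()
      .sort_values(ascending=False)
)

print(row_counts.to_dict())
print(totals.to_dict())
```

The first dictionary holds the row counts from the question (4, 3, 2), and the second holds the deduplicated totals (7, 4, 3).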

Answer 2

Score: 1

Try this:

df_new = (df.groupby('Job_Title', as_index=False)['reports'].apply(lambda g: sum(set(g))).sort_values('reports', ascending=False, ignore_index=True))
print(df_new)

Output:

       Job_Title  reports
0  Sales Manager        7
1     Sales Lead        4
2   Tech Support        3
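
One caveat worth checking against your real data: `sum(set(g))` deduplicates by the reports value, not by ID. On the sample data this gives the same totals, but it can undercount if two different people with the same title happen to have the same number of reports. The sketch below illustrates this with a hypothetical extra employee (ID 7, not in the question) and assumes that one row per ID is what should be counted.

```python
import pandas as pd

# IDs 1 and 5 are from the question; ID 7 is a hypothetical extra
# Sales Manager who also has 2 reports.
df = pd.DataFrame({
    "ID": [1, 1, 5, 7],
    "Job_Title": ["Sales Manager"] * 4,
    "reports": [2, 2, 5, 2],
})

# Deduplicating by value collapses the two distinct employees with 2 reports.
by_set = df.groupby("Job_Title")["reports"].apply(lambda g: sum(set(g)))

# Deduplicating by (ID, Job_Title) keeps one row per employee.
by_id = (
    df.drop_duplicates(["ID", "Job_Title"])
      .groupby("Job_Title")["reports"].sum()
)

print(by_set["Sales Manager"])  # 2 + 5
print(by_id["Sales Manager"])   # 2 + 5 + 2
```

If the reports values are guaranteed to be unique within a job title, both approaches agree; otherwise deduplicating on ID is probably the safer choice.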
