April 10, 2023, 18:47:08

How to use groupby with filter in pandas?

Question

I have a table of students. How can I count the students who passed exactly one exam? An exam counts as passed with a score of 40 or more.

Example table

student exam   score
123     Math   42
123     IT     39
321     Math   12
321     IT     11
333     IT     66
333     Math   77

For this example:

count of students = 1  # answer
student 123 passed 1 exam
student 333 passed 2 exams
student 321 passed 0 exams

I used groupby() but can't figure out how to apply filter().

Answer 1

Score: 0

You can do this:

out = (df['score'].ge(40)             # Is the score greater than or equal to 40?
       .groupby(df['student']).sum()  # How many exams did each student pass?
       .eq(1).sum())                  # How many students passed exactly one exam?

print(out)

1

If you want the IDs of the students who passed exactly one exam:

out = (df['score'].ge(40)
       .groupby(df['student']).sum()
       .eq(1).loc[lambda s: s].index.tolist())

print(out)

[123]

If you want those students' rows, with their exams and scores, you can use groupby.filter:

out = df.groupby('student').filter(lambda g: g['score'].ge(40).sum() == 1)

   student  exam  score
0      123  Math     42
1      123    IT     39
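
The snippets above assume a DataFrame named df already exists. As a minimal end-to-end sketch, assuming the sample table from the question (the names passed_per_student, n_students, ids, and rows are illustrative and not part of the answer), the three variants could be run together like this:

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "student": [123, 123, 321, 321, 333, 333],
    "exam": ["Math", "IT", "Math", "IT", "IT", "Math"],
    "score": [42, 39, 12, 11, 66, 77],
})

# Boolean mask of passing rows, summed per student = number of exams passed.
passed_per_student = df["score"].ge(40).groupby(df["student"]).sum()

# Count of students with exactly one passed exam.
n_students = int(passed_per_student.eq(1).sum())

# IDs of those students.
ids = passed_per_student.index[passed_per_student.eq(1)].tolist()

# All rows (passed and failed exams) of those students, via groupby.filter.
rows = df.groupby("student").filter(lambda g: g["score"].ge(40).sum() == 1)

print(n_students)
print(ids)
print(rows)
```

Note that groupby.filter keeps or drops whole groups, so student 123's failed IT exam stays in the result alongside the passed Math exam.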

Answer 2

Score: 0

Code:

import pandas as pd

data = {
    "student": [123, 123, 321, 321, 333, 333],
    "exam": ["Math", "IT", "Math", "IT", "IT", "Math"],
    "score": [42, 39, 12, 11, 66, 77],
}

df = pd.DataFrame(data)

# Keep only passing rows (score >= 40), then count them per student
num_passed_exams = df[df['score'] >= 40].groupby('student').size()

print(num_passed_exams)

# Number of students with exactly one passing exam
res = len(num_passed_exams[num_passed_exams == 1])

print("Students who passed exactly one exam:", res)

Output:

student
123    1
333    2
dtype: int64
Students who passed exactly one exam: 1
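
One thing to keep in mind with this approach: filtering before grouping drops students with no passing exam (321 here) from the result entirely. That does not change the count of students with exactly one pass, but if zero counts are needed as well, one possible sketch is to reindex over all student IDs (the names passed and all_counts are illustrative):

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "student": [123, 123, 321, 321, 333, 333],
    "exam": ["Math", "IT", "Math", "IT", "IT", "Math"],
    "score": [42, 39, 12, 11, 66, 77],
})

# Passing rows only, counted per student; student 321 is absent here.
passed = df[df["score"] >= 40].groupby("student").size()

# Re-add every student ID, filling the missing ones with a count of 0.
all_counts = passed.reindex(df["student"].unique(), fill_value=0)

print(all_counts)
```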

Answer 3

Score: 0

I think this is what you want:

df_grouped = df.groupby("student")["score"].agg(lambda x: (x >= 40).sum())
df_grouped[df_grouped == 1]

If you just want the count, use len(df_grouped[df_grouped == 1]).
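
Since the question counts a score of 40 or more as a pass, the comparison operator matters at the boundary. A small sketch, using a hypothetical student 444 who scores exactly 40 (not part of the original data), shows how a strict and an inclusive comparison differ:

```python
import pandas as pd

# Hypothetical data: student 444 scores exactly 40 on one exam.
df = pd.DataFrame({
    "student": [444, 444],
    "exam": ["Math", "IT"],
    "score": [40, 10],
})

# Strict comparison: only scores above 40 are counted.
strict = df.groupby("student")["score"].agg(lambda x: (x > 40).sum())

# Inclusive comparison: scores of 40 or more are counted.
inclusive = df.groupby("student")["score"].agg(lambda x: (x >= 40).sum())

print(int(strict.loc[444]))     # 0: a score of exactly 40 is not counted
print(int(inclusive.loc[444]))  # 1: a score of exactly 40 counts as a pass
```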
