How to use groupby with filter in pandas?
Question
I have a table of students. How can I count the students who passed exactly one exam? An exam counts as passed if the score is 40 or more.
Example table:
```
student  exam  score
123      Math  42
123      IT    39
321      Math  12
321      IT    11
333      IT    66
333      Math  77
```
For this example, the expected count of students is 1:
- student 123 passed 1 exam
- student 333 passed 2 exams
- student 321 passed 0 exams
I used `groupby()`, but I can't figure out how to combine it with `filter()`.
Answer 1
You can do it like this:
```python
out = (df['score'].ge(40)                # is the score >= 40?
       .groupby(df['student']).sum()     # for each student, how many exams were passed?
       .eq(1).sum())                     # how many students passed exactly 1 exam?
print(out)
# 1
```
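To see what the chain computes before the final `.eq(1).sum()`, here is a minimal sketch that rebuilds the example table as `df` (that construction, and the names `mask`, `passed` and `count`, are assumptions for illustration) and prints the per-student pass counts. Because the boolean mask is grouped rather than a pre-filtered frame, student 321 is kept with a count of 0.

```python
import pandas as pd

# Example table from the question (assumed column names: student, exam, score)
df = pd.DataFrame({
    "student": [123, 123, 321, 321, 333, 333],
    "exam": ["Math", "IT", "Math", "IT", "IT", "Math"],
    "score": [42, 39, 12, 11, 66, 77],
})

# Boolean mask: True where the exam was passed (score >= 40)
mask = df["score"].ge(40)

# Summing booleans per student counts passed exams; every student is kept,
# including students with no passes
passed = mask.groupby(df["student"]).sum()
print(passed)

# Number of students with exactly one passed exam
count = int(passed.eq(1).sum())
print(count)
```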
If you want the IDs of the students who passed exactly one exam:
```python
out = (df['score'].ge(40)
       .groupby(df['student']).sum()
       .eq(1).loc[lambda s: s].index.tolist())
print(out)
# [123]
```
If you want the rows (exam and score) of those students, you can use `groupby.filter`:
```python
out = df.groupby('student').filter(lambda g: g['score'].ge(40).sum() == 1)
print(out)
#    student  exam  score
# 0      123  Math     42
# 1      123    IT     39
```
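`groupby.filter` returns the matching rows, not a count. To get the number the question asks for while still using `filter()`, one option is to count the distinct student IDs left after filtering with `nunique()`. A minimal sketch (rebuilding `df` from the example table and the names `rows` and `n_students` are assumptions for self-containment):

```python
import pandas as pd

# Example table from the question
df = pd.DataFrame({
    "student": [123, 123, 321, 321, 333, 333],
    "exam": ["Math", "IT", "Math", "IT", "IT", "Math"],
    "score": [42, 39, 12, 11, 66, 77],
})

# Keep only the groups (students) with exactly one score >= 40
rows = df.groupby("student").filter(lambda g: g["score"].ge(40).sum() == 1)

# filter() returns every row of each qualifying student, so count distinct IDs
n_students = rows["student"].nunique()
print(n_students)
```

This materializes the matching rows first, so for a plain count the boolean-sum snippet at the start of this answer does less work.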
Answer 2
Code:
```python
import pandas as pd

data = {
    "student": [123, 123, 321, 321, 333, 333],
    "exam": ["Math", "IT", "Math", "IT", "IT", "Math"],
    "score": [42, 39, 12, 11, 66, 77],
}
df = pd.DataFrame(data)

# Keep only passed exams (score >= 40), then count them per student
num_passed_exams = df[df['score'] >= 40].groupby('student').size()
print(num_passed_exams)

res = len(num_passed_exams[num_passed_exams == 1])
print("Students who passed exactly one exam:", res)
```
Output:
```
student
123    1
333    2
dtype: int64
Students who passed exactly one exam: 1
```
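Filtering with `df[df['score'] >= 40]` before grouping drops students who passed nothing, which is why 321 is missing from the output above. That does not change the count of students with exactly one pass, but if you also need the zero-pass students, one option is to reindex against all student IDs with `fill_value=0`. A minimal sketch (the name `zero_pass` is an assumption for illustration):

```python
import pandas as pd

# Example table from the question
df = pd.DataFrame({
    "student": [123, 123, 321, 321, 333, 333],
    "exam": ["Math", "IT", "Math", "IT", "IT", "Math"],
    "score": [42, 39, 12, 11, 66, 77],
})

# Count passed exams per student; students with no passes disappear here
num_passed_exams = df[df["score"] >= 40].groupby("student").size()

# Reindex over every student in the table; missing students get 0
all_students = df["student"].unique()
num_passed_exams = num_passed_exams.reindex(all_students, fill_value=0)
print(num_passed_exams)

# Students who passed no exam at all
zero_pass = num_passed_exams[num_passed_exams == 0].index.tolist()
print(zero_pass)
```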
Answer 3
I think this is what you want:
```python
df_grouped = df.groupby("student")["score"].agg(lambda x: (x >= 40).sum())
df_grouped[df_grouped == 1]
```
If you just want the count, use `len(df_grouped[df_grouped == 1])`.
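Since a pass is defined as 40 or more points, the comparison has to be `>=`; with `>`, a score of exactly 40 would not count. A minimal sketch with one hypothetical extra row (student 444 scoring exactly 40, which is not part of the question's data) makes the difference visible:

```python
import pandas as pd

df = pd.DataFrame({
    "student": [123, 123, 321, 321, 333, 333, 444],
    "exam": ["Math", "IT", "Math", "IT", "IT", "Math", "Math"],
    # the last row (student 444, score 40) is a hypothetical boundary case
    "score": [42, 39, 12, 11, 66, 77, 40],
})

# Strict comparison: a score of exactly 40 is not counted as a pass
strict = df.groupby("student")["score"].agg(lambda x: (x > 40).sum())
# Inclusive comparison: matches "40 or more points"
inclusive = df.groupby("student")["score"].agg(lambda x: (x >= 40).sum())

print(len(strict[strict == 1]))
print(len(inclusive[inclusive == 1]))
```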