How to use groupby with filter in pandas?

Question

I have a table of students. How can I count the students who passed exactly one exam? An exam counts as passed with a score of 40 or more.

Example table

student exam   score
123     Math   42
123     IT     39
321     Math   12
321     IT     11
333     IT     66
333     Math   77

For this example:

count of students = 1 #ans
student 123 passed 1 exam
student 333 passed 2 exams
student 321 passed 0 exams

I used groupby() but can't work out how to apply filter().

Answer 1

Score: 0

You can do this:

out = (df['score'].ge(40)             # Is the score greater than or equal to 40?
       .groupby(df['student']).sum()  # How many exams did each student pass (score >= 40)?
       .eq(1).sum())                  # How many students passed exactly one exam?
print(out)

1
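
To make the chain above easier to follow, here is a self-contained sketch that rebuilds the example table from the question and keeps each intermediate step in its own variable. The names `passed`, `passes_per_student` and `count_exactly_one` are only for illustration and are not part of the answer.

```python
import pandas as pd

# Example table from the question
df = pd.DataFrame({
    "student": [123, 123, 321, 321, 333, 333],
    "exam": ["Math", "IT", "Math", "IT", "IT", "Math"],
    "score": [42, 39, 12, 11, 66, 77],
})

# Step 1: boolean Series, True where the exam was passed (score >= 40)
passed = df["score"].ge(40)

# Step 2: summing booleans per student counts that student's passed exams
passes_per_student = passed.groupby(df["student"]).sum()

# Step 3: count the students whose pass count is exactly 1
count_exactly_one = int(passes_per_student.eq(1).sum())

print(passes_per_student.to_dict())
print(count_exactly_one)
```

Because the mask is computed before grouping, student 321 stays in `passes_per_student` with a count of 0 instead of disappearing.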

If you want the IDs of the students who passed exactly one exam:

out = (df['score'].ge(40)
       .groupby(df['student']).sum()
       .eq(1).loc[lambda s: s].index.tolist())
print(out)

[123]

If you want those students' rows, with exam and score, you can use groupby.filter:

out = df.groupby('student').filter(lambda g: g['score'].ge(40).sum() == 1)
print(out)

   student  exam  score
0      123  Math     42
1      123    IT     39
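
In case it helps to see what `groupby.filter` is doing: the function is called once per group and has to return a single True/False, and the whole group is kept or dropped accordingly. The sketch below repeats that call on the question's table and compares it with a `transform`-based row mask. The `transform` variant is an alternative I am adding for comparison, not something from the answer, and the names `by_filter`, `pass_count` and `by_mask` are illustrative.

```python
import pandas as pd

# Example table from the question
df = pd.DataFrame({
    "student": [123, 123, 321, 321, 333, 333],
    "exam": ["Math", "IT", "Math", "IT", "IT", "Math"],
    "score": [42, 39, 12, 11, 66, 77],
})

# groupby.filter: one True/False per student; all rows of a student
# are kept or dropped together.
by_filter = df.groupby("student").filter(lambda g: g["score"].ge(40).sum() == 1)

# Alternative (assumption, not from the answer): broadcast each student's
# pass count back onto every row, then use it as an ordinary row mask.
pass_count = df["score"].ge(40).groupby(df["student"]).transform("sum")
by_mask = df[pass_count.eq(1)]

print(by_filter)
print(by_mask)
```

Both approaches should select the same two rows for student 123. The mask version avoids calling a Python lambda per group, which may matter on large tables.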

Answer 2

Score: 0

Code:

import pandas as pd

data = {
    "student": [123, 123, 321, 321, 333, 333],
    "exam": ["Math", "IT", "Math", "IT", "IT", "Math"],
    "score": [42, 39, 12, 11, 66, 77],
}

df = pd.DataFrame(data)

# Keep only the passing rows (score >= 40), then count them per student.
# Students with no passing exam do not appear in the result.
num_passed_exams = df[df["score"] >= 40].groupby("student").size()

print(num_passed_exams)

# Number of students with exactly one passed exam
res = len(num_passed_exams[num_passed_exams == 1])

print("Students who passed exactly one exam:", res)

Output:

student
123    1
333    2
dtype: int64
Students who passed exactly one exam: 1
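
Note that student 321 is missing from the output above, because the rows are filtered before grouping. That does not affect the count of students with exactly one pass, but if you also want the zero-pass students listed, one possible sketch is to reindex against all student IDs. The names `all_students` and `num_passed_all` are assumptions for illustration only.

```python
import pandas as pd

# Example table from the question
df = pd.DataFrame({
    "student": [123, 123, 321, 321, 333, 333],
    "exam": ["Math", "IT", "Math", "IT", "IT", "Math"],
    "score": [42, 39, 12, 11, 66, 77],
})

# Same filter-then-group step as in the answer: 321 is absent here
num_passed_exams = df[df["score"] >= 40].groupby("student").size()

# Assumption for illustration: bring back students with no passed exam,
# filling their count with 0.
all_students = df["student"].unique()
num_passed_all = num_passed_exams.reindex(all_students, fill_value=0)

print(num_passed_all)
```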

Answer 3

Score: 0

I think this is what you want:

# Count, per student, the exams with a score of 40 or more
df_grouped = df.groupby("student")["score"].agg(lambda x: (x >= 40).sum())
df_grouped[df_grouped == 1]

If you only want the count, use len(df_grouped[df_grouped == 1]).
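
The pass mark in the question is inclusive (40 or more), so the comparison inside the lambda needs to be `>=` rather than `>`. The example table has no score of exactly 40, so both give the same answer there. A small sketch with hypothetical data (student 555 is made up, not from the question) shows where they differ:

```python
import pandas as pd

# Hypothetical data, not from the question: one score sits exactly on the pass mark
edge = pd.DataFrame({
    "student": [555, 555],
    "exam": ["Math", "IT"],
    "score": [40, 10],
})

# Strict comparison misses the score of exactly 40
strict = edge.groupby("student")["score"].agg(lambda x: (x > 40).sum())

# Inclusive comparison counts it as a pass, matching the question's rule
inclusive = edge.groupby("student")["score"].agg(lambda x: (x >= 40).sum())

print(strict.loc[555], inclusive.loc[555])
```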

huangapple
  • Posted on April 10, 2023 at 18:47:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75976420.html