英文:
Pandas change value if muliple conditions across multiple rows are met
问题
我正在使用Python和pandas来检查学生的学术历史记录。我想要计算学生获得的及格次数。历史记录可能如下所示。
df = pd.DataFrame({
'Course': ['A', 'B', 'C', 'D', 'E','F','G'],
'Result': ['Pass', 'Fail', 'Pass', 'Fail', 'Pass','Fail','Pass'],
'Superseded_by':['E','E','F','F','','','H']
})
print(df)
Course Result Superseded_by
0 A Pass E
1 B Fail E
2 C Pass F
3 D Fail F
4 E Pass
5 F Fail
6 G Pass H
问题在于较早的课程会被较新的课程所取代,所以我的代码不能同时计数两者。因此,我希望仅在以下情况下将课程的结果更改为'Superseded':
- 课程结果为Passed
- Superseded_by列中的课程存在于学生记录中
- Superseded_by课程也已通过
运行此代码的输出如下:
Course Result Superseded_by
0 A Superseded E
1 B Fail E
2 C Pass F
3 D Fail F
4 E Pass
5 F Fail
6 G Pass H
课程A通过,被E所取代,E也通过。所有其他课程未通过测试,因此保持不变。请注意,在这种情况下,我没有更改学生的正式记录,因此更改Result列是合适的。
在这里,我将提供最有效的方法来执行这个任务:
import pandas as pd
df = pd.DataFrame({
'Course': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
'Result': ['Pass', 'Fail', 'Pass', 'Fail', 'Pass', 'Fail', 'Pass'],
'Superseded_by': ['E', 'E', 'F', 'F', '', '', 'H']
})
# Create a set of courses that are both passed and have a superseding course
passed_superseded_courses = set(
df.loc[df['Result'] == 'Pass']
.loc[df['Superseded_by'].isin(df['Course'])]
.loc[df['Superseded_by'].map(df.set_index('Course')['Result']) == 'Pass']
['Course']
)
# Update the 'Result' column based on the conditions
df['Result'] = df.apply(lambda row: 'Superseded' if row['Course'] in passed_superseded_courses else row['Result'], axis=1)
print(df)
这将按照你的要求更新DataFrame,只有符合条件的课程会被更改为'Superseded'。
英文:
I am using python and pandas to check student's academic histories. I want to count the number of passes a student has got. The history might look something like this.
df = pd.DataFrame({
'Course': ['A', 'B', 'C', 'D', 'E','F','G'],
'Result': ['Pass', 'Fail', 'Pass', 'Fail', 'Pass','Fail','Pass'],
'Superseded_by':['E','E','F','F','','','H']
})
print(df)
Course Result Superseded_by
0 A Pass E
1 B Fail E
2 C Pass F
3 D Fail F
4 E Pass
5 F Fail
6 G Pass H
The problem is that older courses get superseded by newer ones and so my code mustn't count both. So I want to alter a Course's Result to 'Superseded' if and only if
- The Course Result is Passed
- The Course in the Superseded_by column exists in the student record
- The Superseded_by course is also Passed
The output of running this would look like this
Course Result Superceded_by
0 A Superceded E
1 B Fail E
2 C Pass F
3 D Fail F
4 E Pass
5 F Fail
6 G Pass H
Course A is Passed, is superseded by E, which is also Passed. All other courses fail the test so remain unchanged.
I have done multiple conditions before but not across different rows.
Note I am not changing a student's official record so changing the Result column is appropriate in this context.
What is the most efficient way I can do this?
答案1
得分: 2
代码
步骤1
创建映射器(字典)
m = dict(df[['Course', 'Result']].values)
m
{'A': 'Pass', 'B': 'Fail', 'C': 'Pass', 'D': 'Fail', 'E': 'Pass', 'F': 'Fail', 'G': 'Pass'}
步骤2
创建条件
- Result列 == 'Pass'
- 经过
m
映射后的Superseded_by列 == 'Pass'
条件 = 1 & 2
cond = df['Result'].eq('Pass') & df['Superseded_by'].map(m).eq('Pass')
步骤3
根据cond
对Result列进行布尔掩码操作
df.assign(Result=df['Result'].mask(cond, 'Superceded'))
输出:
Course Result Superseded_by
0 A Superceded E
1 B Fail E
2 C Pass F
3 D Fail F
4 E Pass
5 F Fail
6 G Pass H
英文:
Code
Step1
make mapper(dictionary)
m = dict(df[['Course', 'Result']].values)
m
{'A': 'Pass','B': 'Fail', 'C': 'Pass', 'D': 'Fail', 'E': 'Pass','F': 'Fail','G': 'Pass'}
Step2
make codition
- Result column == 'Pass'
- Superseded_by column after mapping by
m
== 'Pass'
condition = 1 & 2
<br>
cond = df['Result'].eq('Pass') & df['Superseded_by'].map(m).eq('Pass')
Step3
booleanmasking to Result column by cond
df.assign(Result=df['Result'].mask(cond, 'Superceded'))
output:
Course Result Superseded_by
0 A Superceded E
1 B Fail E
2 C Pass F
3 D Fail F
4 E Pass
5 F Fail
6 G Pass H
答案2
得分: 1
识别已通过的课程并使用布尔索引选择已通过课程且替代课程也已通过的行:
passed = df.loc[df['Result'].eq('Pass'), 'Course']
# ['A', 'C', 'E', 'G']
df.loc[df['Result'].eq('Pass')
& df['Superseded_by'].isin(passed),
'Result'] = 'Superseded'
输出:
Course Result Superseded_by
0 A Superseded E
1 B Fail E
2 C Pass F
3 D Fail F
4 E Pass
5 F Fail
6 G Pass H
英文:
Identify passed courses and use boolean indexing to select rows in which the course is passed and the superseding course is also passed:
passed = df.loc[df['Result'].eq('Pass'), 'Course']
# ['A', 'C', 'E', 'G']
df.loc[df['Result'].eq('Pass')
& df['Superseded_by'].isin(passed),
'Result'] = 'Superseded'
Output:
Course Result Superseded_by
0 A Superseded E
1 B Fail E
2 C Pass F
3 D Fail F
4 E Pass
5 F Fail
6 G Pass H
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论