Pandas 如果多行之间满足多个条件,则更改值

huangapple go评论89阅读模式
英文:

Pandas change value if muliple conditions across multiple rows are met

问题

我正在使用Python和pandas来检查学生的学术历史记录。我想要计算学生获得的及格次数。历史记录可能如下所示。

df = pd.DataFrame({
        'Course': ['A', 'B', 'C', 'D', 'E','F','G'],
        'Result': ['Pass', 'Fail', 'Pass', 'Fail', 'Pass','Fail','Pass'],
        'Superseded_by':['E','E','F','F','','','H']        
     })
print(df)

  Course Result Superseded_by
0      A   Pass             E
1      B   Fail             E
2      C   Pass             F
3      D   Fail             F
4      E   Pass              
5      F   Fail              
6      G   Pass             H             

问题在于较早的课程会被较新的课程所取代,所以我的代码不能同时计数两者。因此,我希望仅在以下情况下将课程的结果更改为'Superseded':

  • 课程结果为Passed
  • Superseded_by列中的课程存在于学生记录中
  • Superseded_by课程也已通过

运行此代码的输出如下:

  Course      Result Superseded_by
0      A  Superseded             E
1      B        Fail             E
2      C        Pass             F
3      D        Fail             F
4      E        Pass              
5      F        Fail              
6      G        Pass             H

课程A通过,被E所取代,E也通过。所有其他课程未通过测试,因此保持不变。请注意,在这种情况下,我没有更改学生的正式记录,因此更改Result列是合适的。

在这里,我将提供最有效的方法来执行这个任务:

import pandas as pd

df = pd.DataFrame({
    'Course': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'Result': ['Pass', 'Fail', 'Pass', 'Fail', 'Pass', 'Fail', 'Pass'],
    'Superseded_by': ['E', 'E', 'F', 'F', '', '', 'H']
})

# Create a set of courses that are both passed and have a superseding course
passed_superseded_courses = set(
    df.loc[df['Result'] == 'Pass']
      .loc[df['Superseded_by'].isin(df['Course'])]
      .loc[df['Superseded_by'].map(df.set_index('Course')['Result']) == 'Pass']
      ['Course']
)

# Update the 'Result' column based on the conditions
df['Result'] = df.apply(lambda row: 'Superseded' if row['Course'] in passed_superseded_courses else row['Result'], axis=1)

print(df)

这将按照你的要求更新DataFrame,只有符合条件的课程会被更改为'Superseded'。

英文:

I am using python and pandas to check student's academic histories. I want to count the number of passes a student has got. The history might look something like this.

df = pd.DataFrame({
        'Course': ['A', 'B', 'C', 'D', 'E','F','G'],
        'Result': ['Pass', 'Fail', 'Pass', 'Fail', 'Pass','Fail','Pass'],
        'Superseded_by':['E','E','F','F','','','H']        
     })
print(df)

  Course Result Superseded_by
0      A   Pass             E
1      B   Fail             E
2      C   Pass             F
3      D   Fail             F
4      E   Pass              
5      F   Fail              
6      G   Pass             H             

The problem is that older courses get superseded by newer ones and so my code mustn't count both. So I want to alter a Course's Result to 'Superseded' if and only if

  • The Course Result is Passed
  • The Course in the Superseded_by column exists in the student record
  • The Superseded_by course is also Passed

The output of running this would look like this

  Course      Result Superceded_by
0      A  Superceded             E
1      B        Fail             E
2      C        Pass             F
3      D        Fail             F
4      E        Pass              
5      F        Fail              
6      G        Pass             H

Course A is Passed, is superseded by E, which is also Passed. All other courses fail the test so remain unchanged.
I have done multiple conditions before but not across different rows.
Note I am not changing a student's official record so changing the Result column is appropriate in this context.
What is the most efficient way I can do this?

答案1

得分: 2

代码

步骤1

创建映射器(字典)

m = dict(df[['Course', 'Result']].values)

m

{'A': 'Pass', 'B': 'Fail', 'C': 'Pass', 'D': 'Fail', 'E': 'Pass', 'F': 'Fail', 'G': 'Pass'}

步骤2

创建条件

  1. Result列 == 'Pass'
  2. 经过m映射后的Superseded_by列 == 'Pass'

条件 = 1 & 2

cond = df['Result'].eq('Pass') & df['Superseded_by'].map(m).eq('Pass')

步骤3

根据cond对Result列进行布尔掩码操作

df.assign(Result=df['Result'].mask(cond, 'Superceded'))

输出:

  Course     Result Superseded_by
0      A  Superceded             E
1      B        Fail             E
2      C        Pass             F
3      D        Fail             F
4      E        Pass              
5      F        Fail              
6      G        Pass             H
英文:

Code

Step1

make mapper(dictionary)

m = dict(df[['Course', 'Result']].values)

m

{'A': 'Pass','B': 'Fail', 'C': 'Pass', 'D': 'Fail', 'E': 'Pass','F': 'Fail','G': 'Pass'}

Step2

make codition

  1. Result column == 'Pass'
  2. Superseded_by column after mapping by m == 'Pass'

condition = 1 & 2

<br>

cond = df[&#39;Result&#39;].eq(&#39;Pass&#39;) &amp; df[&#39;Superseded_by&#39;].map(m).eq(&#39;Pass&#39;)

Step3

booleanmasking to Result column by cond

df.assign(Result=df[&#39;Result&#39;].mask(cond, &#39;Superceded&#39;))

output:

Course	Result	Superseded_by
0	A	Superceded	E
1	B	Fail	    E
2	C	Pass	    F
3	D	Fail	    F
4	E	Pass	
5	F	Fail	
6	G	Pass	    H

答案2

得分: 1

识别已通过的课程并使用布尔索引选择已通过课程且替代课程也已通过的行:

passed = df.loc[df['Result'].eq('Pass'), 'Course']
# ['A', 'C', 'E', 'G']

df.loc[df['Result'].eq('Pass')
     & df['Superseded_by'].isin(passed),
     'Result'] = 'Superseded'

输出:

   Course       Result Superseded_by
0       A   Superseded             E
1       B         Fail             E
2       C         Pass             F
3       D         Fail             F
4       E         Pass              
5       F         Fail              
6       G         Pass             H
英文:

Identify passed courses and use boolean indexing to select rows in which the course is passed and the superseding course is also passed:

passed = df.loc[df[&#39;Result&#39;].eq(&#39;Pass&#39;), &#39;Course&#39;]
# [&#39;A&#39;, &#39;C&#39;, &#39;E&#39;, &#39;G&#39;]
          
df.loc[df[&#39;Result&#39;].eq(&#39;Pass&#39;)
     &amp; df[&#39;Superseded_by&#39;].isin(passed),
     &#39;Result&#39;] = &#39;Superseded&#39;

Output:

   Course       Result Superseded_by
0       A   Superseded             E
1       B         Fail             E
2       C         Pass             F
3       D         Fail             F
4       E         Pass              
5       F         Fail              
6       G         Pass             H

huangapple
  • 本文由 发表于 2023年8月10日 10:31:03
  • 转载请务必保留本文链接:https://go.coder-hub.com/76872302.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定