2023年8月10日 10:31:03go评论207阅读模式

英文:

Pandas change value if muliple conditions across multiple rows are met

问题

我正在使用Python和pandas来检查学生的学术历史记录。我想要计算学生获得的及格次数。历史记录可能如下所示。

df = pd.DataFrame({
        'Course': ['A', 'B', 'C', 'D', 'E','F','G'],
        'Result': ['Pass', 'Fail', 'Pass', 'Fail', 'Pass','Fail','Pass'],
        'Superseded_by':['E','E','F','F','','','H']        
     })
print(df)

  Course Result Superseded_by
0      A   Pass             E
1      B   Fail             E
2      C   Pass             F
3      D   Fail             F
4      E   Pass              
5      F   Fail              
6      G   Pass             H

问题在于较早的课程会被较新的课程所取代，所以我的代码不能同时计数两者。因此，我希望仅在以下情况下将课程的结果更改为'Superseded'：

课程结果为Passed
Superseded_by列中的课程存在于学生记录中
Superseded_by课程也已通过

运行此代码的输出如下：

  Course      Result Superseded_by
0      A  Superseded             E
1      B        Fail             E
2      C        Pass             F
3      D        Fail             F
4      E        Pass              
5      F        Fail              
6      G        Pass             H

课程A通过，被E所取代，E也通过。所有其他课程未通过测试，因此保持不变。请注意，在这种情况下，我没有更改学生的正式记录，因此更改Result列是合适的。

在这里，我将提供最有效的方法来执行这个任务：

import pandas as pd

df = pd.DataFrame({
    'Course': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'Result': ['Pass', 'Fail', 'Pass', 'Fail', 'Pass', 'Fail', 'Pass'],
    'Superseded_by': ['E', 'E', 'F', 'F', '', '', 'H']
})

# Create a set of courses that are both passed and have a superseding course
passed_superseded_courses = set(
    df.loc[df['Result'] == 'Pass']
      .loc[df['Superseded_by'].isin(df['Course'])]
      .loc[df['Superseded_by'].map(df.set_index('Course')['Result']) == 'Pass']
      ['Course']
)

# Update the 'Result' column based on the conditions
df['Result'] = df.apply(lambda row: 'Superseded' if row['Course'] in passed_superseded_courses else row['Result'], axis=1)

print(df)

这将按照你的要求更新DataFrame，只有符合条件的课程会被更改为'Superseded'。

英文:

I am using python and pandas to check student's academic histories. I want to count the number of passes a student has got. The history might look something like this.

df = pd.DataFrame({
        &#39;Course&#39;: [&#39;A&#39;, &#39;B&#39;, &#39;C&#39;, &#39;D&#39;, &#39;E&#39;,&#39;F&#39;,&#39;G&#39;],
        &#39;Result&#39;: [&#39;Pass&#39;, &#39;Fail&#39;, &#39;Pass&#39;, &#39;Fail&#39;, &#39;Pass&#39;,&#39;Fail&#39;,&#39;Pass&#39;],
        &#39;Superseded_by&#39;:[&#39;E&#39;,&#39;E&#39;,&#39;F&#39;,&#39;F&#39;,&#39;&#39;,&#39;&#39;,&#39;H&#39;]        
     })
print(df)

  Course Result Superseded_by
0      A   Pass             E
1      B   Fail             E
2      C   Pass             F
3      D   Fail             F
4      E   Pass              
5      F   Fail              
6      G   Pass             H

The problem is that older courses get superseded by newer ones and so my code mustn't count both. So I want to alter a Course's Result to 'Superseded' if and only if

The Course Result is Passed
The Course in the Superseded_by column exists in the student record
The Superseded_by course is also Passed

The output of running this would look like this

  Course      Result Superceded_by
0      A  Superceded             E
1      B        Fail             E
2      C        Pass             F
3      D        Fail             F
4      E        Pass              
5      F        Fail              
6      G        Pass             H

Course A is Passed, is superseded by E, which is also Passed. All other courses fail the test so remain unchanged.
I have done multiple conditions before but not across different rows.
Note I am not changing a student's official record so changing the Result column is appropriate in this context.
What is the most efficient way I can do this?

答案1

得分: 2

代码

步骤1

创建映射器（字典）

m = dict(df[['Course', 'Result']].values)

m

{'A': 'Pass', 'B': 'Fail', 'C': 'Pass', 'D': 'Fail', 'E': 'Pass', 'F': 'Fail', 'G': 'Pass'}

步骤2

创建条件

Result列 == 'Pass'
经过m映射后的Superseded_by列 == 'Pass'

条件 = 1 & 2

cond = df['Result'].eq('Pass') & df['Superseded_by'].map(m).eq('Pass')

步骤3

根据cond对Result列进行布尔掩码操作

df.assign(Result=df['Result'].mask(cond, 'Superceded'))

输出：

  Course     Result Superseded_by
0      A  Superceded             E
1      B        Fail             E
2      C        Pass             F
3      D        Fail             F
4      E        Pass              
5      F        Fail              
6      G        Pass             H

英文:

Code

Step1

make mapper(dictionary)

m = dict(df[[&#39;Course&#39;, &#39;Result&#39;]].values)

m

{&#39;A&#39;: &#39;Pass&#39;,&#39;B&#39;: &#39;Fail&#39;, &#39;C&#39;: &#39;Pass&#39;, &#39;D&#39;: &#39;Fail&#39;, &#39;E&#39;: &#39;Pass&#39;,&#39;F&#39;: &#39;Fail&#39;,&#39;G&#39;: &#39;Pass&#39;}

Step2

make codition

Result column == 'Pass'
Superseded_by column after mapping by m == 'Pass'

condition = 1 & 2

<br>

cond = df[&#39;Result&#39;].eq(&#39;Pass&#39;) &amp; df[&#39;Superseded_by&#39;].map(m).eq(&#39;Pass&#39;)

Step3

booleanmasking to Result column by cond

df.assign(Result=df[&#39;Result&#39;].mask(cond, &#39;Superceded&#39;))

output:

Course	Result	Superseded_by
0	A	Superceded	E
1	B	Fail	    E
2	C	Pass	    F
3	D	Fail	    F
4	E	Pass	
5	F	Fail	
6	G	Pass	    H

答案2

得分: 1

识别已通过的课程并使用布尔索引选择已通过课程且替代课程也已通过的行：

passed = df.loc[df['Result'].eq('Pass'), 'Course']
# ['A', 'C', 'E', 'G']

df.loc[df['Result'].eq('Pass')
     & df['Superseded_by'].isin(passed),
     'Result'] = 'Superseded'

输出：

   Course       Result Superseded_by
0       A   Superseded             E
1       B         Fail             E
2       C         Pass             F
3       D         Fail             F
4       E         Pass              
5       F         Fail              
6       G         Pass             H

英文:

Identify passed courses and use boolean indexing to select rows in which the course is passed and the superseding course is also passed:

passed = df.loc[df[&#39;Result&#39;].eq(&#39;Pass&#39;), &#39;Course&#39;]
# [&#39;A&#39;, &#39;C&#39;, &#39;E&#39;, &#39;G&#39;]
          
df.loc[df[&#39;Result&#39;].eq(&#39;Pass&#39;)
     &amp; df[&#39;Superseded_by&#39;].isin(passed),
     &#39;Result&#39;] = &#39;Superseded&#39;

Output:

   Course       Result Superseded_by
0       A   Superseded             E
1       B         Fail             E
2       C         Pass             F
3       D         Fail             F
4       E         Pass              
5       F         Fail              
6       G         Pass             H

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Pandas 如果多行之间满足多个条件，则更改值

问题

答案1

答案2

使用Go（Golang）编写Python扩展

PySide6 widgets not showing when using pyside6-uic

从元组中获取除已知元素以外的值如何操作？

使用pandas的.astype在后面跟着.replace时，强制数据类型不像预期那样工作。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论