英文:
Retrieve last row of data with conditions
问题
我有以下的大型数据集,记录了数学竞赛结果,按日期降序排列:例如,学生1在比赛1中获得第三名,而学生3在比赛2中获得第一名,依此类推。
Race_ID Date Student_ID Rank
21 1/1/2023 1 3
21 1/1/2023 2 2
21 1/1/2023 3 1
21 1/1/2023 4 4
25 11/9/2022 1 2
25 11/9/2022 2 3
25 11/9/2022 3 1
3 17/4/2022 5 4
3 17/4/2022 2 1
3 17/4/2022 3 2
3 17/4/2022 4 3
14 1/3/2022 1 1
14 1/3/2022 2 2
85 1/1/2021 1 2
85 1/1/2021 2 3
85 1/1/2021 3 1
我想创建一个名为```Last_win```的新列,它返回学生上次获胜的```Race_ID```(即排名第一的比赛)。所以结果应该如下:
Race_ID Date Student_ID Rank Last_win
21 1/1/2023 1 3 14
21 1/1/2023 2 2 3
21 1/1/2023 3 1 25
21 1/1/2023 4 4 NaN
25 11/9/2022 1 2 14
25 11/9/2022 2 3 3
25 11/9/2022 3 1 85
3 17/4/2022 5 4 NaN
3 17/4/2022 2 1 NaN
3 17/4/2022 3 2 85
3 17/4/2022 4 3 NaN
14 1/3/2022 1 1 NaN
14 1/3/2022 2 2 NaN
85 1/1/2021 1 2 NaN
85 1/1/2021 2 3 NaN
85 1/1/2021 3 1 NaN
非常感谢您的提前帮助。
英文:
I have the following large dataset recording the result of a math competition among students in descending order of date: So for example, student 1 comes third in Race 1 while student 3 won Race 2, etc.
Race_ID Date Student_ID Rank
21 1/1/2023 1 3
21 1/1/2023 2 2
21 1/1/2023 3 1
21 1/1/2023 4 4
25 11/9/2022 1 2
25 11/9/2022 2 3
25 11/9/2022 3 1
3 17/4/2022 5 4
3 17/4/2022 2 1
3 17/4/2022 3 2
3 17/4/2022 4 3
14 1/3/2022 1 1
14 1/3/2022 2 2
85 1/1/2021 1 2
85 1/1/2021 2 3
85 1/1/2021 3 1
And I want to create a new column called Last_win
which returns the Race_ID
of the last time that student won (i.e. rank number 1). So the outcome should look like
Race_ID Date Student_ID Rank Last_win
21 1/1/2023 1 3 14
21 1/1/2023 2 2 3
21 1/1/2023 3 1 25
21 1/1/2023 4 4 NaN
25 11/9/2022 1 2 14
25 11/9/2022 2 3 3
25 11/9/2022 3 1 85
3 17/4/2022 5 4 NaN
3 17/4/2022 2 1 NaN
3 17/4/2022 3 2 85
3 17/4/2022 4 3 NaN
14 1/3/2022 1 1 NaN
14 1/3/2022 2 2 NaN
85 1/1/2021 1 2 NaN
85 1/1/2021 2 3 NaN
85 1/1/2021 3 1 NaN
Thank you so much inadvacne.
答案1
得分: 2
保留Rank为1的行,隐藏其他行,然后按学生分组。最后,向后填充Race_ID,不要忘记移动结果:
# 如果需要,将Date列转换为DatetimeIndex
# df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df['Last_win'] = (df.where(df['Rank'] == 1)
.groupby(df['Student_ID'])['Race_ID']
.transform(lambda x: x.bfill().shift(-1)))
print(df)
# 输出
Race_ID Date Student_ID Rank Last_win
0 21 2023-01-01 1 3 14.0
1 21 2023-01-01 2 2 3.0
2 21 2023-01-01 3 1 25.0
3 21 2023-01-01 4 4 NaN
4 25 2022-09-11 1 2 14.0
5 25 2022-09-11 2 3 3.0
6 25 2022-09-11 3 1 85.0
7 3 2022-04-17 5 4 NaN
8 3 2022-04-17 2 1 NaN
9 3 2022-04-17 3 2 85.0
10 3 2022-04-17 4 3 NaN
11 14 2022-03-01 1 1 NaN
12 14 2022-03-01 2 2 NaN
13 85 2021-01-01 1 2 NaN
14 85 2021-01-01 2 3 NaN
15 85 2021-01-01 3 1 NaN
英文:
Keep rows where Rank is 1 and hide others then group by Student. Finally, fill backward Race_ID and don't forget to shift the result:
# Convert Date column to DatetimeIndex if needed
# df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df['Last_win'] = (df.where(df['Rank'] == 1)
.groupby(df['Student_ID'])['Race_ID']
.transform(lambda x: x.bfill().shift(-1)))
print(df)
# Output
Race_ID Date Student_ID Rank Last_win
0 21 2023-01-01 1 3 14.0
1 21 2023-01-01 2 2 3.0
2 21 2023-01-01 3 1 25.0
3 21 2023-01-01 4 4 NaN
4 25 2022-09-11 1 2 14.0
5 25 2022-09-11 2 3 3.0
6 25 2022-09-11 3 1 85.0
7 3 2022-04-17 5 4 NaN
8 3 2022-04-17 2 1 NaN
9 3 2022-04-17 3 2 85.0
10 3 2022-04-17 4 3 NaN
11 14 2022-03-01 1 1 NaN
12 14 2022-03-01 2 2 NaN
13 85 2021-01-01 1 2 NaN
14 85 2021-01-01 2 3 NaN
15 85 2021-01-01 3 1 NaN
答案2
得分: 1
使用 merge_asof
来合并先前日期,通过参数 allow_exact_matches=False
:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df = df.sort_values('Date')
df1 = (df.loc[df['Rank'].eq(1), ['Date','Student_ID','Race_ID']]
.rename(columns={'Race_ID':'Last_win'}))
df = (pd.merge_asof(df.reset_index(),
df1, on='Date', allow_exact_matches=False, by='Student_ID')
.sort_values('index', ignore_index=True)
.drop('index', axis=1))
print(df)
输出结果如下:
Race_ID Date Student_ID Rank Last_win
0 21 2023-01-01 1 3 14.0
1 21 2023-01-01 2 2 3.0
2 21 2023-01-01 3 1 25.0
3 21 2023-01-01 4 4 NaN
4 25 2022-09-11 1 2 14.0
5 25 2022-09-11 2 3 3.0
6 25 2022-09-11 3 1 85.0
7 3 2022-04-17 5 4 NaN
8 3 2022-04-17 2 1 NaN
9 3 2022-04-17 3 2 85.0
10 3 2022-04-17 4 3 NaN
11 14 2022-03-01 1 1 NaN
12 14 2022-03-01 2 2 NaN
13 85 2021-01-01 1 2 NaN
14 85 2021-01-01 2 3 NaN
15 85 2021-01-01 3 1 NaN
英文:
Use merge_asof
for merge previous datetimes by parameter allow_exact_matches=False
:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df = df.sort_values('Date')
df1 = (df.loc[df['Rank'].eq(1), ['Date','Student_ID','Race_ID']]
.rename(columns={'Race_ID':'Last_win'}))
df = (pd.merge_asof(df.reset_index(),
df1, on='Date', allow_exact_matches=False, by='Student_ID')
.sort_values('index', ignore_index=True)
.drop('index', axis=1))
print (df)
Race_ID Date Student_ID Rank Last_win
0 21 2023-01-01 1 3 14.0
1 21 2023-01-01 2 2 3.0
2 21 2023-01-01 3 1 25.0
3 21 2023-01-01 4 4 NaN
4 25 2022-09-11 1 2 14.0
5 25 2022-09-11 2 3 3.0
6 25 2022-09-11 3 1 85.0
7 3 2022-04-17 5 4 NaN
8 3 2022-04-17 2 1 NaN
9 3 2022-04-17 3 2 85.0
10 3 2022-04-17 4 3 NaN
11 14 2022-03-01 1 1 NaN
12 14 2022-03-01 2 2 NaN
13 85 2021-01-01 1 2 NaN
14 85 2021-01-01 2 3 NaN
15 85 2021-01-01 3 1 NaN
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论