检索符合条件的最后一行数据。

huangapple go评论107阅读模式
英文:

Retrieve last row of data with conditions

问题

  1. 我有以下的大型数据集,记录了数学竞赛结果,按日期降序排列:例如,学生1在比赛1中获得第三名,而学生3在比赛2中获得第一名,依此类推。
  2. Race_ID Date Student_ID Rank
  3. 21 1/1/2023 1 3
  4. 21 1/1/2023 2 2
  5. 21 1/1/2023 3 1
  6. 21 1/1/2023 4 4
  7. 25 11/9/2022 1 2
  8. 25 11/9/2022 2 3
  9. 25 11/9/2022 3 1
  10. 3 17/4/2022 5 4
  11. 3 17/4/2022 2 1
  12. 3 17/4/2022 3 2
  13. 3 17/4/2022 4 3
  14. 14 1/3/2022 1 1
  15. 14 1/3/2022 2 2
  16. 85 1/1/2021 1 2
  17. 85 1/1/2021 2 3
  18. 85 1/1/2021 3 1
  19. 我想创建一个名为```Last_win```的新列,它返回学生上次获胜的```Race_ID```(即排名第一的比赛)。所以结果应该如下:
  20. Race_ID Date Student_ID Rank Last_win
  21. 21 1/1/2023 1 3 14
  22. 21 1/1/2023 2 2 3
  23. 21 1/1/2023 3 1 25
  24. 21 1/1/2023 4 4 NaN
  25. 25 11/9/2022 1 2 14
  26. 25 11/9/2022 2 3 3
  27. 25 11/9/2022 3 1 85
  28. 3 17/4/2022 5 4 NaN
  29. 3 17/4/2022 2 1 NaN
  30. 3 17/4/2022 3 2 85
  31. 3 17/4/2022 4 3 NaN
  32. 14 1/3/2022 1 1 NaN
  33. 14 1/3/2022 2 2 NaN
  34. 85 1/1/2021 1 2 NaN
  35. 85 1/1/2021 2 3 NaN
  36. 85 1/1/2021 3 1 NaN
  37. 非常感谢您的提前帮助。
英文:

I have the following large dataset recording the result of a math competition among students in descending order of date: So for example, student 1 comes third in Race 1 while student 3 won Race 2, etc.

  1. Race_ID Date Student_ID Rank
  2. 21 1/1/2023 1 3
  3. 21 1/1/2023 2 2
  4. 21 1/1/2023 3 1
  5. 21 1/1/2023 4 4
  6. 25 11/9/2022 1 2
  7. 25 11/9/2022 2 3
  8. 25 11/9/2022 3 1
  9. 3 17/4/2022 5 4
  10. 3 17/4/2022 2 1
  11. 3 17/4/2022 3 2
  12. 3 17/4/2022 4 3
  13. 14 1/3/2022 1 1
  14. 14 1/3/2022 2 2
  15. 85 1/1/2021 1 2
  16. 85 1/1/2021 2 3
  17. 85 1/1/2021 3 1

And I want to create a new column called Last_win which returns the Race_ID of the last time that student won (i.e. rank number 1). So the outcome should look like

  1. Race_ID Date Student_ID Rank Last_win
  2. 21 1/1/2023 1 3 14
  3. 21 1/1/2023 2 2 3
  4. 21 1/1/2023 3 1 25
  5. 21 1/1/2023 4 4 NaN
  6. 25 11/9/2022 1 2 14
  7. 25 11/9/2022 2 3 3
  8. 25 11/9/2022 3 1 85
  9. 3 17/4/2022 5 4 NaN
  10. 3 17/4/2022 2 1 NaN
  11. 3 17/4/2022 3 2 85
  12. 3 17/4/2022 4 3 NaN
  13. 14 1/3/2022 1 1 NaN
  14. 14 1/3/2022 2 2 NaN
  15. 85 1/1/2021 1 2 NaN
  16. 85 1/1/2021 2 3 NaN
  17. 85 1/1/2021 3 1 NaN

Thank you so much inadvacne.

答案1

得分: 2

保留Rank为1的行,隐藏其他行,然后按学生分组。最后,向后填充Race_ID,不要忘记移动结果:

  1. # 如果需要,将Date列转换为DatetimeIndex
  2. # df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
  3. df['Last_win'] = (df.where(df['Rank'] == 1)
  4. .groupby(df['Student_ID'])['Race_ID']
  5. .transform(lambda x: x.bfill().shift(-1)))
  6. print(df)
  7. # 输出
  8. Race_ID Date Student_ID Rank Last_win
  9. 0 21 2023-01-01 1 3 14.0
  10. 1 21 2023-01-01 2 2 3.0
  11. 2 21 2023-01-01 3 1 25.0
  12. 3 21 2023-01-01 4 4 NaN
  13. 4 25 2022-09-11 1 2 14.0
  14. 5 25 2022-09-11 2 3 3.0
  15. 6 25 2022-09-11 3 1 85.0
  16. 7 3 2022-04-17 5 4 NaN
  17. 8 3 2022-04-17 2 1 NaN
  18. 9 3 2022-04-17 3 2 85.0
  19. 10 3 2022-04-17 4 3 NaN
  20. 11 14 2022-03-01 1 1 NaN
  21. 12 14 2022-03-01 2 2 NaN
  22. 13 85 2021-01-01 1 2 NaN
  23. 14 85 2021-01-01 2 3 NaN
  24. 15 85 2021-01-01 3 1 NaN
英文:

Keep rows where Rank is 1 and hide others then group by Student. Finally, fill backward Race_ID and don't forget to shift the result:

  1. # Convert Date column to DatetimeIndex if needed
  2. # df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
  3. df['Last_win'] = (df.where(df['Rank'] == 1)
  4. .groupby(df['Student_ID'])['Race_ID']
  5. .transform(lambda x: x.bfill().shift(-1)))
  6. print(df)
  7. # Output
  8. Race_ID Date Student_ID Rank Last_win
  9. 0 21 2023-01-01 1 3 14.0
  10. 1 21 2023-01-01 2 2 3.0
  11. 2 21 2023-01-01 3 1 25.0
  12. 3 21 2023-01-01 4 4 NaN
  13. 4 25 2022-09-11 1 2 14.0
  14. 5 25 2022-09-11 2 3 3.0
  15. 6 25 2022-09-11 3 1 85.0
  16. 7 3 2022-04-17 5 4 NaN
  17. 8 3 2022-04-17 2 1 NaN
  18. 9 3 2022-04-17 3 2 85.0
  19. 10 3 2022-04-17 4 3 NaN
  20. 11 14 2022-03-01 1 1 NaN
  21. 12 14 2022-03-01 2 2 NaN
  22. 13 85 2021-01-01 1 2 NaN
  23. 14 85 2021-01-01 2 3 NaN
  24. 15 85 2021-01-01 3 1 NaN

答案2

得分: 1

使用 merge_asof 来合并先前日期,通过参数 allow_exact_matches=False

  1. df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
  2. df = df.sort_values('Date')
  3. df1 = (df.loc[df['Rank'].eq(1), ['Date','Student_ID','Race_ID']]
  4. .rename(columns={'Race_ID':'Last_win'}))
  5. df = (pd.merge_asof(df.reset_index(),
  6. df1, on='Date', allow_exact_matches=False, by='Student_ID')
  7. .sort_values('index', ignore_index=True)
  8. .drop('index', axis=1))
  9. print(df)

输出结果如下:

  1. Race_ID Date Student_ID Rank Last_win
  2. 0 21 2023-01-01 1 3 14.0
  3. 1 21 2023-01-01 2 2 3.0
  4. 2 21 2023-01-01 3 1 25.0
  5. 3 21 2023-01-01 4 4 NaN
  6. 4 25 2022-09-11 1 2 14.0
  7. 5 25 2022-09-11 2 3 3.0
  8. 6 25 2022-09-11 3 1 85.0
  9. 7 3 2022-04-17 5 4 NaN
  10. 8 3 2022-04-17 2 1 NaN
  11. 9 3 2022-04-17 3 2 85.0
  12. 10 3 2022-04-17 4 3 NaN
  13. 11 14 2022-03-01 1 1 NaN
  14. 12 14 2022-03-01 2 2 NaN
  15. 13 85 2021-01-01 1 2 NaN
  16. 14 85 2021-01-01 2 3 NaN
  17. 15 85 2021-01-01 3 1 NaN
英文:

Use merge_asof for merge previous datetimes by parameter allow_exact_matches=False:

  1. df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
  2. df = df.sort_values('Date')
  3. df1 = (df.loc[df['Rank'].eq(1), ['Date','Student_ID','Race_ID']]
  4. .rename(columns={'Race_ID':'Last_win'}))
  5. df = (pd.merge_asof(df.reset_index(),
  6. df1, on='Date', allow_exact_matches=False, by='Student_ID')
  7. .sort_values('index', ignore_index=True)
  8. .drop('index', axis=1))
  9. print (df)
  10. Race_ID Date Student_ID Rank Last_win
  11. 0 21 2023-01-01 1 3 14.0
  12. 1 21 2023-01-01 2 2 3.0
  13. 2 21 2023-01-01 3 1 25.0
  14. 3 21 2023-01-01 4 4 NaN
  15. 4 25 2022-09-11 1 2 14.0
  16. 5 25 2022-09-11 2 3 3.0
  17. 6 25 2022-09-11 3 1 85.0
  18. 7 3 2022-04-17 5 4 NaN
  19. 8 3 2022-04-17 2 1 NaN
  20. 9 3 2022-04-17 3 2 85.0
  21. 10 3 2022-04-17 4 3 NaN
  22. 11 14 2022-03-01 1 1 NaN
  23. 12 14 2022-03-01 2 2 NaN
  24. 13 85 2021-01-01 1 2 NaN
  25. 14 85 2021-01-01 2 3 NaN
  26. 15 85 2021-01-01 3 1 NaN

huangapple
  • 本文由 发表于 2023年2月8日 18:02:31
  • 转载请务必保留本文链接:https://go.coder-hub.com/75384112.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定