计算在Pandas中特定列满足特定条件之前的天数

huangapple go评论84阅读模式
英文:

Count days till a specific column meets specific criteria in Pandas

问题

我的数据框看起来像这样

Serialnr Date Event occured
1 10.01.22 0
1 05.02.22 0
1 12.02.22 1
1 20.04.22 0
1 27.04.22 0
1 01.05.22 0
1 05.05.22 1
2 10.01.22 0
2 05.02.22 0
2 12.02.22 1
2 20.04.22 0
2 27.04.22 0
2 01.05.22 0
2 05.05.22 1

我需要计算每一行中事件发生前剩余的天数,对于特定的序列号。我想要的列是'Days_to_event',如下所示:

Serialnr Date Event occured Days_to_event
1 10.01.22 0 33
1 05.02.22 0 7
1 12.02.22 1 82
1 20.04.22 0 15
1 27.04.22 0 ...
1 01.05.22 0 ...
1 05.05.22 1 ...
2 10.01.22 0 ...
2 05.02.22 0 ...
2 12.02.22 1 ...
2 20.04.22 0 ...
2 27.04.22 0 ...
2 01.05.22 0 ...
2 05.05.22 1 ...
英文:

My Dataframe looks like this

Serialnr Date Event occured
1 10.01.22 0
1 05.02.22 0
1 12.02.22 1
1 20.04.22 0
1 27.04.22 0
1 01.05.22 0
1 05.05.22 1
2 10.01.22 0
2 05.02.22 0
2 12.02.22 1
2 20.04.22 0
2 27.04.22 0
2 01.05.22 0
2 05.05.22 1

And I have to count the days in each line the number of days left until the event occurs for the specific serialnumber.
I want the column 'Days_to_event', like this:

Serialnr Date Event occured Days_to_event
1 10.01.22 0 33
1 05.02.22 0 7
1 12.02.22 1 82
1 20.04.22 0 15
1 27.04.22 0 ...
1 01.05.22 0 ...
1 05.05.22 1 ...
2 10.01.22 0 ...
2 05.02.22 0 ...
2 12.02.22 1 ...
2 20.04.22 0 ...
2 27.04.22 0 ...
2 01.05.22 0 ...
2 05.05.22 1 ...

答案1

得分: 1

使用自定义 groupby.transformmask/bfill/shift

  1. df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
  2. m = df['Event occured'].eq(1)
  3. next_event = (df['Date'].where(m).groupby(df['Serialnr'])
  4. .transform(lambda g: g.bfill().shift(-1))
  5. )
  6. df['Days_to_event'] = df['Date'].rsub(next_event).dt.days

输出结果:

  1. Serialnr Date Event occured Days_to_event
  2. 0 1 2022-01-10 0 33.0
  3. 1 1 2022-02-05 0 7.0
  4. 2 1 2022-02-12 1 82.0
  5. 3 1 2022-04-20 0 15.0
  6. 4 1 2022-04-27 0 8.0
  7. 5 1 2022-05-01 0 4.0
  8. 6 1 2022-05-05 1 NaN
  9. 7 2 2022-01-10 0 33.0
  10. 8 2 2022-02-05 0 7.0
  11. 9 2 2022-02-12 1 82.0
  12. 10 2 2022-04-20 0 15.0
  13. 11 2 2022-04-27 0 8.0
  14. 12 2 2022-05-01 0 4.0
  15. 13 2 2022-05-05 1 NaN

处理最后一个事件:

如果您希望最后一个引用是最大日期而不是 NaN:

  1. df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
  2. m = df['Event occured'].eq(1)
  3. next_event = (df.groupby(df['Serialnr'])['Date']
  4. .transform(lambda g: g.where(m).bfill().shift(-1).fillna(g.max()))
  5. )
  6. df['Days_to_event'] = df['Date'].rsub(next_event).dt.days

输出结果:

  1. Serialnr Date Event occured Days_to_event
  2. 0 1 2022-01-10 0 33
  3. 1 1 2022-02-05 0 7
  4. 2 1 2022-02-12 1 82
  5. 3 1 2022-04-20 0 15
  6. 4 1 2022-04-27 0 8
  7. 5 1 2022-05-01 0 4
  8. 6 1 2022-05-05 1 2
  9. 7 1 2022-05-07 0 0
  10. 8 2 2022-01-10 0 33
  11. 9 2 2022-02-05 0 7
  12. 10 2 2022-02-12 1 82
  13. 11 2 2022-04-20 0 15
  14. 12 2 2022-04-27 0 8
  15. 13 2 2022-05-01 0 4
  16. 14 2 2022-05-05 1 0
英文:

Use a custom groupby.transform with mask/bfill/shift:

  1. df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
  2. m = df['Event occured'].eq(1)
  3. next_event = (df['Date'].where(m).groupby(df['Serialnr'])
  4. .transform(lambda g: g.bfill().shift(-1))
  5. )
  6. df['Days_to_event'] = df['Date'].rsub(next_event).dt.days

Output:

  1. Serialnr Date Event occured Days_to_event
  2. 0 1 2022-01-10 0 33.0
  3. 1 1 2022-02-05 0 7.0
  4. 2 1 2022-02-12 1 82.0
  5. 3 1 2022-04-20 0 15.0
  6. 4 1 2022-04-27 0 8.0
  7. 5 1 2022-05-01 0 4.0
  8. 6 1 2022-05-05 1 NaN
  9. 7 2 2022-01-10 0 33.0
  10. 8 2 2022-02-05 0 7.0
  11. 9 2 2022-02-12 1 82.0
  12. 10 2 2022-04-20 0 15.0
  13. 11 2 2022-04-27 0 8.0
  14. 12 2 2022-05-01 0 4.0
  15. 13 2 2022-05-05 1 NaN

handling the last event

If you want the last reference to be the max date instead of NaNs:

  1. df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
  2. m = df['Event occured'].eq(1)
  3. next_event = (df.groupby(df['Serialnr'])['Date']
  4. .transform(lambda g: g.where(m).bfill().shift(-1).fillna(g.max()))
  5. )
  6. df['Days_to_event'] = df['Date'].rsub(next_event).dt.days

Output:

  1. Serialnr Date Event occured Days_to_event
  2. 0 1 2022-01-10 0 33
  3. 1 1 2022-02-05 0 7
  4. 2 1 2022-02-12 1 82
  5. 3 1 2022-04-20 0 15
  6. 4 1 2022-04-27 0 8
  7. 5 1 2022-05-01 0 4
  8. 6 1 2022-05-05 1 2
  9. 7 1 2022-05-07 0 0
  10. 8 2 2022-01-10 0 33
  11. 9 2 2022-02-05 0 7
  12. 10 2 2022-02-12 1 82
  13. 11 2 2022-04-20 0 15
  14. 12 2 2022-04-27 0 8
  15. 13 2 2022-05-01 0 4
  16. 14 2 2022-05-05 1 0

huangapple
  • 本文由 发表于 2023年7月17日 17:43:05
  • 转载请务必保留本文链接:https://go.coder-hub.com/76703208.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定