计算在Pandas中特定列满足特定条件之前的天数

huangapple go评论58阅读模式
英文:

Count days till a specific column meets specific criteria in Pandas

问题

我的数据框看起来像这样

Serialnr Date Event occured
1 10.01.22 0
1 05.02.22 0
1 12.02.22 1
1 20.04.22 0
1 27.04.22 0
1 01.05.22 0
1 05.05.22 1
2 10.01.22 0
2 05.02.22 0
2 12.02.22 1
2 20.04.22 0
2 27.04.22 0
2 01.05.22 0
2 05.05.22 1

我需要计算每一行中事件发生前剩余的天数,对于特定的序列号。我想要的列是'Days_to_event',如下所示:

Serialnr Date Event occured Days_to_event
1 10.01.22 0 33
1 05.02.22 0 7
1 12.02.22 1 82
1 20.04.22 0 15
1 27.04.22 0 ...
1 01.05.22 0 ...
1 05.05.22 1 ...
2 10.01.22 0 ...
2 05.02.22 0 ...
2 12.02.22 1 ...
2 20.04.22 0 ...
2 27.04.22 0 ...
2 01.05.22 0 ...
2 05.05.22 1 ...
英文:

My Dataframe looks like this

Serialnr Date Event occured
1 10.01.22 0
1 05.02.22 0
1 12.02.22 1
1 20.04.22 0
1 27.04.22 0
1 01.05.22 0
1 05.05.22 1
2 10.01.22 0
2 05.02.22 0
2 12.02.22 1
2 20.04.22 0
2 27.04.22 0
2 01.05.22 0
2 05.05.22 1

And I have to count the days in each line the number of days left until the event occurs for the specific serialnumber.
I want the column 'Days_to_event', like this:

Serialnr Date Event occured Days_to_event
1 10.01.22 0 33
1 05.02.22 0 7
1 12.02.22 1 82
1 20.04.22 0 15
1 27.04.22 0 ...
1 01.05.22 0 ...
1 05.05.22 1 ...
2 10.01.22 0 ...
2 05.02.22 0 ...
2 12.02.22 1 ...
2 20.04.22 0 ...
2 27.04.22 0 ...
2 01.05.22 0 ...
2 05.05.22 1 ...

答案1

得分: 1

使用自定义 groupby.transformmask/bfill/shift

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

m = df['Event occured'].eq(1)
next_event = (df['Date'].where(m).groupby(df['Serialnr'])
              .transform(lambda g: g.bfill().shift(-1))
             )

df['Days_to_event'] = df['Date'].rsub(next_event).dt.days

输出结果:

    Serialnr       Date  Event occured  Days_to_event
0          1 2022-01-10              0           33.0
1          1 2022-02-05              0            7.0
2          1 2022-02-12              1           82.0
3          1 2022-04-20              0           15.0
4          1 2022-04-27              0            8.0
5          1 2022-05-01              0            4.0
6          1 2022-05-05              1            NaN
7          2 2022-01-10              0           33.0
8          2 2022-02-05              0            7.0
9          2 2022-02-12              1           82.0
10         2 2022-04-20              0           15.0
11         2 2022-04-27              0            8.0
12         2 2022-05-01              0            4.0
13         2 2022-05-05              1            NaN

处理最后一个事件:

如果您希望最后一个引用是最大日期而不是 NaN:

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

m = df['Event occured'].eq(1)
next_event = (df.groupby(df['Serialnr'])['Date']
              .transform(lambda g: g.where(m).bfill().shift(-1).fillna(g.max()))
             )

df['Days_to_event'] = df['Date'].rsub(next_event).dt.days

输出结果:

    Serialnr       Date  Event occured  Days_to_event
0          1 2022-01-10              0             33
1          1 2022-02-05              0              7
2          1 2022-02-12              1             82
3          1 2022-04-20              0             15
4          1 2022-04-27              0              8
5          1 2022-05-01              0              4
6          1 2022-05-05              1              2
7          1 2022-05-07              0              0
8          2 2022-01-10              0             33
9          2 2022-02-05              0              7
10         2 2022-02-12              1             82
11         2 2022-04-20              0             15
12         2 2022-04-27              0              8
13         2 2022-05-01              0              4
14         2 2022-05-05              1              0
英文:

Use a custom groupby.transform with mask/bfill/shift:

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

m = df['Event occured'].eq(1)
next_event = (df['Date'].where(m).groupby(df['Serialnr'])
              .transform(lambda g: g.bfill().shift(-1))
             )

df['Days_to_event'] = df['Date'].rsub(next_event).dt.days

Output:

    Serialnr       Date  Event occured  Days_to_event
0          1 2022-01-10              0           33.0
1          1 2022-02-05              0            7.0
2          1 2022-02-12              1           82.0
3          1 2022-04-20              0           15.0
4          1 2022-04-27              0            8.0
5          1 2022-05-01              0            4.0
6          1 2022-05-05              1            NaN
7          2 2022-01-10              0           33.0
8          2 2022-02-05              0            7.0
9          2 2022-02-12              1           82.0
10         2 2022-04-20              0           15.0
11         2 2022-04-27              0            8.0
12         2 2022-05-01              0            4.0
13         2 2022-05-05              1            NaN

handling the last event

If you want the last reference to be the max date instead of NaNs:

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

m = df['Event occured'].eq(1)
next_event = (df.groupby(df['Serialnr'])['Date']
              .transform(lambda g: g.where(m).bfill().shift(-1).fillna(g.max()))
             )

df['Days_to_event'] = df['Date'].rsub(next_event).dt.days

Output:

    Serialnr       Date  Event occured  Days_to_event
0          1 2022-01-10              0             33
1          1 2022-02-05              0              7
2          1 2022-02-12              1             82
3          1 2022-04-20              0             15
4          1 2022-04-27              0              8
5          1 2022-05-01              0              4
6          1 2022-05-05              1              2
7          1 2022-05-07              0              0
8          2 2022-01-10              0             33
9          2 2022-02-05              0              7
10         2 2022-02-12              1             82
11         2 2022-04-20              0             15
12         2 2022-04-27              0              8
13         2 2022-05-01              0              4
14         2 2022-05-05              1              0

huangapple
  • 本文由 发表于 2023年7月17日 17:43:05
  • 转载请务必保留本文链接:https://go.coder-hub.com/76703208.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定