英文:
Count days till a specific column meets specific criteria in Pandas
问题
我的数据框看起来像这样
Serialnr | Date | Event occured |
---|---|---|
1 | 10.01.22 | 0 |
1 | 05.02.22 | 0 |
1 | 12.02.22 | 1 |
1 | 20.04.22 | 0 |
1 | 27.04.22 | 0 |
1 | 01.05.22 | 0 |
1 | 05.05.22 | 1 |
2 | 10.01.22 | 0 |
2 | 05.02.22 | 0 |
2 | 12.02.22 | 1 |
2 | 20.04.22 | 0 |
2 | 27.04.22 | 0 |
2 | 01.05.22 | 0 |
2 | 05.05.22 | 1 |
我需要计算每一行中事件发生前剩余的天数,对于特定的序列号。我想要的列是'Days_to_event',如下所示:
Serialnr | Date | Event occured | Days_to_event |
---|---|---|---|
1 | 10.01.22 | 0 | 33 |
1 | 05.02.22 | 0 | 7 |
1 | 12.02.22 | 1 | 82 |
1 | 20.04.22 | 0 | 15 |
1 | 27.04.22 | 0 | ... |
1 | 01.05.22 | 0 | ... |
1 | 05.05.22 | 1 | ... |
2 | 10.01.22 | 0 | ... |
2 | 05.02.22 | 0 | ... |
2 | 12.02.22 | 1 | ... |
2 | 20.04.22 | 0 | ... |
2 | 27.04.22 | 0 | ... |
2 | 01.05.22 | 0 | ... |
2 | 05.05.22 | 1 | ... |
英文:
My Dataframe looks like this
Serialnr | Date | Event occured |
---|---|---|
1 | 10.01.22 | 0 |
1 | 05.02.22 | 0 |
1 | 12.02.22 | 1 |
1 | 20.04.22 | 0 |
1 | 27.04.22 | 0 |
1 | 01.05.22 | 0 |
1 | 05.05.22 | 1 |
2 | 10.01.22 | 0 |
2 | 05.02.22 | 0 |
2 | 12.02.22 | 1 |
2 | 20.04.22 | 0 |
2 | 27.04.22 | 0 |
2 | 01.05.22 | 0 |
2 | 05.05.22 | 1 |
And I have to count the days in each line the number of days left until the event occurs for the specific serialnumber.
I want the column 'Days_to_event', like this:
Serialnr | Date | Event occured | Days_to_event |
---|---|---|---|
1 | 10.01.22 | 0 | 33 |
1 | 05.02.22 | 0 | 7 |
1 | 12.02.22 | 1 | 82 |
1 | 20.04.22 | 0 | 15 |
1 | 27.04.22 | 0 | ... |
1 | 01.05.22 | 0 | ... |
1 | 05.05.22 | 1 | ... |
2 | 10.01.22 | 0 | ... |
2 | 05.02.22 | 0 | ... |
2 | 12.02.22 | 1 | ... |
2 | 20.04.22 | 0 | ... |
2 | 27.04.22 | 0 | ... |
2 | 01.05.22 | 0 | ... |
2 | 05.05.22 | 1 | ... |
答案1
得分: 1
使用自定义 groupby.transform
与 mask
/bfill
/shift
:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
m = df['Event occured'].eq(1)
next_event = (df['Date'].where(m).groupby(df['Serialnr'])
.transform(lambda g: g.bfill().shift(-1))
)
df['Days_to_event'] = df['Date'].rsub(next_event).dt.days
输出结果:
Serialnr Date Event occured Days_to_event
0 1 2022-01-10 0 33.0
1 1 2022-02-05 0 7.0
2 1 2022-02-12 1 82.0
3 1 2022-04-20 0 15.0
4 1 2022-04-27 0 8.0
5 1 2022-05-01 0 4.0
6 1 2022-05-05 1 NaN
7 2 2022-01-10 0 33.0
8 2 2022-02-05 0 7.0
9 2 2022-02-12 1 82.0
10 2 2022-04-20 0 15.0
11 2 2022-04-27 0 8.0
12 2 2022-05-01 0 4.0
13 2 2022-05-05 1 NaN
处理最后一个事件:
如果您希望最后一个引用是最大日期而不是 NaN:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
m = df['Event occured'].eq(1)
next_event = (df.groupby(df['Serialnr'])['Date']
.transform(lambda g: g.where(m).bfill().shift(-1).fillna(g.max()))
)
df['Days_to_event'] = df['Date'].rsub(next_event).dt.days
输出结果:
Serialnr Date Event occured Days_to_event
0 1 2022-01-10 0 33
1 1 2022-02-05 0 7
2 1 2022-02-12 1 82
3 1 2022-04-20 0 15
4 1 2022-04-27 0 8
5 1 2022-05-01 0 4
6 1 2022-05-05 1 2
7 1 2022-05-07 0 0
8 2 2022-01-10 0 33
9 2 2022-02-05 0 7
10 2 2022-02-12 1 82
11 2 2022-04-20 0 15
12 2 2022-04-27 0 8
13 2 2022-05-01 0 4
14 2 2022-05-05 1 0
英文:
Use a custom groupby.transform
with mask
/bfill
/shift
:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
m = df['Event occured'].eq(1)
next_event = (df['Date'].where(m).groupby(df['Serialnr'])
.transform(lambda g: g.bfill().shift(-1))
)
df['Days_to_event'] = df['Date'].rsub(next_event).dt.days
Output:
Serialnr Date Event occured Days_to_event
0 1 2022-01-10 0 33.0
1 1 2022-02-05 0 7.0
2 1 2022-02-12 1 82.0
3 1 2022-04-20 0 15.0
4 1 2022-04-27 0 8.0
5 1 2022-05-01 0 4.0
6 1 2022-05-05 1 NaN
7 2 2022-01-10 0 33.0
8 2 2022-02-05 0 7.0
9 2 2022-02-12 1 82.0
10 2 2022-04-20 0 15.0
11 2 2022-04-27 0 8.0
12 2 2022-05-01 0 4.0
13 2 2022-05-05 1 NaN
handling the last event
If you want the last reference to be the max date instead of NaNs:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
m = df['Event occured'].eq(1)
next_event = (df.groupby(df['Serialnr'])['Date']
.transform(lambda g: g.where(m).bfill().shift(-1).fillna(g.max()))
)
df['Days_to_event'] = df['Date'].rsub(next_event).dt.days
Output:
Serialnr Date Event occured Days_to_event
0 1 2022-01-10 0 33
1 1 2022-02-05 0 7
2 1 2022-02-12 1 82
3 1 2022-04-20 0 15
4 1 2022-04-27 0 8
5 1 2022-05-01 0 4
6 1 2022-05-05 1 2
7 1 2022-05-07 0 0
8 2 2022-01-10 0 33
9 2 2022-02-05 0 7
10 2 2022-02-12 1 82
11 2 2022-04-20 0 15
12 2 2022-04-27 0 8
13 2 2022-05-01 0 4
14 2 2022-05-05 1 0
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论