How to assign new column value based on max value in another column preceding date
Question
I would like to create a new column called CDAT in the following dataframe, with CDAT equal to the "DATE" of the last "BRED" EVENT from the same ID, LACT and FDAT combination that preceded the "PREG" EVENT.
Effectively, I need to group by ID, LACT and FDAT and then, whenever there is a "PREG" EVENT, fill the new CDAT column with the "DATE" of the most recent "BRED" EVENT that precedes the date of that "PREG" EVENT.
An example of the data is presented below
ID LACT FDAT EVENT DATE
0 46 1 2011-09-23 BRED 2012-03-02
1 46 1 2011-09-23 PREG 2012-04-03
2 46 1 2011-09-23 PREG 2012-05-22
3 46 1 2011-09-23 PREG 2012-10-09
4 46 2 2012-11-15 FRESH 2012-11-15
5 46 2 2012-11-15 LUT 2013-01-08
6 46 2 2012-11-15 OS 2013-01-15
7 46 2 2012-11-15 BRED 2013-01-01
8 46 2 2012-11-15 BRED 2013-01-24
9 46 2 2012-11-15 PREG 2013-02-26
10 46 2 2012-11-16 BRED 2013-03-10
The Output I would like to achieve is
ID LACT FDAT EVENT DATE CDAT
0 46 1 2011-09-23 BRED 2012-03-02
1 46 1 2011-09-23 PREG 2012-04-03 2012-03-02
2 46 1 2011-09-23 PREG 2012-05-22 2012-03-02
3 46 1 2011-09-23 PREG 2012-10-09 2012-03-02
4 46 2 2012-11-15 FRESH 2012-11-15
5 46 2 2012-11-15 LUT 2013-01-08
6 46 2 2012-11-15 OS 2013-01-15
7 46 2 2012-11-15 BRED 2013-01-01
8 46 2 2012-11-15 BRED 2013-01-24
9 46 2 2012-11-15 PREG 2013-02-26 2013-01-24
10 46 2 2012-11-16 BRED 2013-03-10
I cannot think of a way to incorporate the date and EVENT selection into a groupby statement that would achieve what I would like to do.
A list of the sample data is presented below
[[46,1,Timestamp('2011-09-23 00:00:00'),'BRED',Timestamp('2012-03-02 00:00:00')],
[46,1,Timestamp('2011-09-23 00:00:00'),'PREG',Timestamp('2012-04-03 00:00:00')],
[46,1,Timestamp('2011-09-23 00:00:00'),'PREG',Timestamp('2012-05-22 00:00:00')],
[46,1,Timestamp('2011-09-23 00:00:00'),'PREG',Timestamp('2012-10-09 00:00:00')],
[46,2,Timestamp('2012-11-15 00:00:00'),'FRESH',Timestamp('2012-11-15 00:00:00')],
[46,2,Timestamp('2012-11-15 00:00:00'),'LUT',Timestamp('2013-01-08 00:00:00')],
[46,2,Timestamp('2012-11-15 00:00:00'),'OS',Timestamp('2013-01-15 00:00:00')],
[46,2,Timestamp('2012-11-15 00:00:00'),'BRED',Timestamp('2013-01-01 00:00:00')],
[46,2,Timestamp('2012-11-15 00:00:00'),'BRED',Timestamp('2013-01-24 00:00:00')],
[46,2,Timestamp('2012-11-15 00:00:00'),'PREG',Timestamp('2013-02-26 00:00:00')],
[46,2,Timestamp('2012-11-16 00:00:00'),'BRED',Timestamp('2013-03-10 00:00:00')],
[46,2,Timestamp('2012-11-15 00:00:00'),'PREG',Timestamp('2013-04-16 00:00:00')],
[46,2,Timestamp('2001-11-15 00:00:00'),'PREG',Timestamp('2013-08-06 00:00:00')]]
Answer 1
Score: 1
This should work.
import pandas as pd
import numpy as np

df = pd.DataFrame([[46, 1, pd.Timestamp('2011-09-23'), 'BRED', pd.Timestamp('2012-03-02')],
                   [46, 1, pd.Timestamp('2011-09-23'), 'PREG', pd.Timestamp('2012-04-03')],
                   [46, 1, pd.Timestamp('2011-09-23'), 'PREG', pd.Timestamp('2012-05-22')],
                   [46, 1, pd.Timestamp('2011-09-23'), 'PREG', pd.Timestamp('2012-10-09')],
                   [46, 2, pd.Timestamp('2012-11-15'), 'FRESH', pd.Timestamp('2012-11-15')],
                   [46, 2, pd.Timestamp('2012-11-15'), 'LUT', pd.Timestamp('2013-01-08')],
                   [46, 2, pd.Timestamp('2012-11-15'), 'OS', pd.Timestamp('2013-01-15')],
                   [46, 2, pd.Timestamp('2012-11-15'), 'BRED', pd.Timestamp('2013-01-01')],
                   [46, 2, pd.Timestamp('2012-11-15'), 'BRED', pd.Timestamp('2013-01-24')],
                   [46, 2, pd.Timestamp('2012-11-15'), 'PREG', pd.Timestamp('2013-02-26')],
                   [46, 2, pd.Timestamp('2012-11-16'), 'BRED', pd.Timestamp('2013-03-10')]],
                  columns=['ID', 'LACT', 'FDAT', 'EVENT', 'DATE'])

# Sort so that rows are in chronological order within each group.
df = df.sort_values(['ID', 'LACT', 'FDAT', 'DATE'])

last_bred_dates = []
row_labels = []  # index label of each visited row, in visiting order
for name, group in df.groupby(['ID', 'LACT', 'FDAT']):
    # Reset per group so a BRED date never carries over into the next group.
    last_bred_date = np.nan
    for i, row in group.iterrows():
        row_labels.append(i)
        if row['EVENT'] == 'BRED':
            last_bred_date = row['DATE']
            last_bred_dates.append(np.nan)
        elif row['EVENT'] == 'PREG':
            last_bred_dates.append(last_bred_date)
        else:
            last_bred_dates.append(np.nan)

# Attach the values by row label. A bare pd.Series(last_bred_dates) gets a
# fresh 0..n-1 index, which only lines up with the sorted frame by coincidence.
df['CDAT'] = pd.Series(last_bred_dates, index=row_labels)
Output:
| ID | LACT | FDAT | EVENT | DATE | CDAT |
---|---|---|---|---|---|---|
0 | 46 | 1 | 2011-09-23 00:00:00 | BRED | 2012-03-02 00:00:00 | NaT |
1 | 46 | 1 | 2011-09-23 00:00:00 | PREG | 2012-04-03 00:00:00 | 2012-03-02 00:00:00 |
2 | 46 | 1 | 2011-09-23 00:00:00 | PREG | 2012-05-22 00:00:00 | 2012-03-02 00:00:00 |
3 | 46 | 1 | 2011-09-23 00:00:00 | PREG | 2012-10-09 00:00:00 | 2012-03-02 00:00:00 |
4 | 46 | 2 | 2012-11-15 00:00:00 | FRESH | 2012-11-15 00:00:00 | NaT |
7 | 46 | 2 | 2012-11-15 00:00:00 | BRED | 2013-01-01 00:00:00 | NaT |
5 | 46 | 2 | 2012-11-15 00:00:00 | LUT | 2013-01-08 00:00:00 | NaT |
6 | 46 | 2 | 2012-11-15 00:00:00 | OS | 2013-01-15 00:00:00 | NaT |
8 | 46 | 2 | 2012-11-15 00:00:00 | BRED | 2013-01-24 00:00:00 | NaT |
9 | 46 | 2 | 2012-11-15 00:00:00 | PREG | 2013-02-26 00:00:00 | 2013-01-24 00:00:00 |
10 | 46 | 2 | 2012-11-16 00:00:00 | BRED | 2013-03-10 00:00:00 | NaT |
Explanation:
Sort the rows by date, then group df on ['ID', 'LACT', 'FDAT'] to get the desired groups. Create an empty list and iterate over the rows of each group: if the EVENT of the row is BRED, save its DATE and append a NaN to the list; if the EVENT is PREG, append the saved value; for any other event, append a NaN. Finally, use that list to create the new CDAT column (the NaN entries show up as NaT because the column holds dates).
Note that last_bred_date is reset to NaN before iterating over each group, so that only BRED dates from the same group are ever appended to the list.
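For larger frames the same idea can probably be expressed without an explicit Python loop. The sketch below is not part of the original answer; it assumes the rows can be sorted by DATE within each ID/LACT/FDAT group, keeps DATE only on BRED rows, forward-fills it per group, and then keeps the result only on PREG rows. The helper names `is_bred` and `is_preg` are introduced here purely for illustration.

```python
import pandas as pd

# First 11 sample rows from the question.
rows = [
    [46, 1, "2011-09-23", "BRED", "2012-03-02"],
    [46, 1, "2011-09-23", "PREG", "2012-04-03"],
    [46, 1, "2011-09-23", "PREG", "2012-05-22"],
    [46, 1, "2011-09-23", "PREG", "2012-10-09"],
    [46, 2, "2012-11-15", "FRESH", "2012-11-15"],
    [46, 2, "2012-11-15", "LUT", "2013-01-08"],
    [46, 2, "2012-11-15", "OS", "2013-01-15"],
    [46, 2, "2012-11-15", "BRED", "2013-01-01"],
    [46, 2, "2012-11-15", "BRED", "2013-01-24"],
    [46, 2, "2012-11-15", "PREG", "2013-02-26"],
    [46, 2, "2012-11-16", "BRED", "2013-03-10"],
]
df = pd.DataFrame(rows, columns=["ID", "LACT", "FDAT", "EVENT", "DATE"])
df["FDAT"] = pd.to_datetime(df["FDAT"])
df["DATE"] = pd.to_datetime(df["DATE"])

# Chronological order inside each group, so "previous row" means "earlier date".
df = df.sort_values(["ID", "LACT", "FDAT", "DATE"])

is_bred = df["EVENT"].eq("BRED")  # illustrative helper masks
is_preg = df["EVENT"].eq("PREG")

df["CDAT"] = (
    df["DATE"].where(is_bred)                       # DATE on BRED rows, NaT elsewhere
    .groupby([df["ID"], df["LACT"], df["FDAT"]])    # never fill across groups
    .ffill()                                        # carry the last BRED date forward
    .where(is_preg)                                 # keep it only on PREG rows
)

# Restore the original row order.
df = df.sort_index()
print(df)
```

A PREG row with no earlier BRED in its group is left as NaT. If a BRED and a PREG can share the same DATE, the sort would need an explicit tie-break to decide which one counts as preceding the other.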