How to assign new column value based on max value in another column preceding date

Question

I would like to create a new column called CDAT in the following dataframe. For each "PREG" EVENT, CDAT should equal the "DATE" of the last "BRED" EVENT from the same ID, LACT and FDAT combination that preceded the "PREG" EVENT.

Effectively, I need to group by ID, LACT and FDAT and then, wherever there is a "PREG" EVENT, fill the new CDAT column with the "DATE" of the most recent "BRED" EVENT that precedes the date of the "PREG" EVENT.

An example of the data is presented below

	ID	LACT	FDAT	EVENT	DATE
0	46	1	2011-09-23	BRED	2012-03-02
1	46	1	2011-09-23	PREG	2012-04-03
2	46	1	2011-09-23	PREG	2012-05-22
3	46	1	2011-09-23	PREG	2012-10-09
4	46	2	2012-11-15	FRESH	2012-11-15
5	46	2	2012-11-15	LUT	    2013-01-08
6	46	2	2012-11-15	OS	    2013-01-15
7	46	2	2012-11-15	BRED	2013-01-01
8	46	2	2012-11-15	BRED	2013-01-24
9	46	2	2012-11-15	PREG	2013-02-26
10	46	2	2012-11-16	BRED	2013-03-10

The Output I would like to achieve is

	ID	LACT	FDAT	EVENT	DATE         CDAT
0	46	1	2011-09-23	BRED	2012-03-02
1	46	1	2011-09-23	PREG	2012-04-03   2012-03-02
2	46	1	2011-09-23	PREG	2012-05-22   2012-03-02
3	46	1	2011-09-23	PREG	2012-10-09   2012-03-02
4	46	2	2012-11-15	FRESH	2012-11-15
5	46	2	2012-11-15	LUT	    2013-01-08
6	46	2	2012-11-15	OS	    2013-01-15
7	46	2	2012-11-15	BRED	2013-01-01
8	46	2	2012-11-15	BRED	2013-01-24
9	46	2	2012-11-15	PREG	2013-02-26   2013-01-24
10	46	2	2012-11-16	BRED	2013-03-10

I cannot think of a way to incorporate the date and EVENT selection into a groupby statement that would achieve what I would like to do.

A list of the sample data is presented below

[[46,1,Timestamp('2011-09-23 00:00:00'),'BRED',Timestamp('2012-03-02 00:00:00')],
 [46,1,Timestamp('2011-09-23 00:00:00'),'PREG',Timestamp('2012-04-03 00:00:00')],
 [46,1,Timestamp('2011-09-23 00:00:00'),'PREG',Timestamp('2012-05-22 00:00:00')],
 [46,1,Timestamp('2011-09-23 00:00:00'),'PREG',Timestamp('2012-10-09 00:00:00')],
 [46,2,Timestamp('2012-11-15 00:00:00'),'FRESH',Timestamp('2012-11-15 00:00:00')],
 [46,2,Timestamp('2012-11-15 00:00:00'),'LUT',Timestamp('2013-01-08 00:00:00')],
 [46,2,Timestamp('2012-11-15 00:00:00'),'OS',Timestamp('2013-01-15 00:00:00')],
 [46,2,Timestamp('2012-11-15 00:00:00'),'BRED',Timestamp('2013-01-01 00:00:00')],
 [46,2,Timestamp('2012-11-15 00:00:00'),'BRED',Timestamp('2013-01-24 00:00:00')],
 [46,2,Timestamp('2012-11-15 00:00:00'),'PREG',Timestamp('2013-02-26 00:00:00')],
 [46,2,Timestamp('2012-11-16 00:00:00'),'BRED',Timestamp('2013-03-10 00:00:00')],
 [46,2,Timestamp('2012-11-15 00:00:00'),'PREG',Timestamp('2013-04-16 00:00:00')],
 [46,2,Timestamp('2001-11-15 00:00:00'),'PREG',Timestamp('2013-08-06 00:00:00')]]

Answer 1

Score: 1

This should work.

import pandas as pd

df = pd.DataFrame([[46, 1, pd.Timestamp('2011-09-23'), 'BRED', pd.Timestamp('2012-03-02')],
                   [46, 1, pd.Timestamp('2011-09-23'), 'PREG', pd.Timestamp('2012-04-03')],
                   [46, 1, pd.Timestamp('2011-09-23'), 'PREG', pd.Timestamp('2012-05-22')],
                   [46, 1, pd.Timestamp('2011-09-23'), 'PREG', pd.Timestamp('2012-10-09')],
                   [46, 2, pd.Timestamp('2012-11-15'), 'FRESH', pd.Timestamp('2012-11-15')],
                   [46, 2, pd.Timestamp('2012-11-15'), 'LUT', pd.Timestamp('2013-01-08')],
                   [46, 2, pd.Timestamp('2012-11-15'), 'OS', pd.Timestamp('2013-01-15')],
                   [46, 2, pd.Timestamp('2012-11-15'), 'BRED', pd.Timestamp('2013-01-01')],
                   [46, 2, pd.Timestamp('2012-11-15'), 'BRED', pd.Timestamp('2013-01-24')],
                   [46, 2, pd.Timestamp('2012-11-15'), 'PREG', pd.Timestamp('2013-02-26')],
                   [46, 2, pd.Timestamp('2012-11-16'), 'BRED', pd.Timestamp('2013-03-10')]],
                  columns=['ID', 'LACT', 'FDAT', 'EVENT', 'DATE'])

# Sort so that, within each (ID, LACT, FDAT) group, events are in date order.
df = df.sort_values(['ID', 'LACT', 'FDAT', 'DATE'])

last_bred_dates = []
# groupby yields the groups in sorted key order, matching the row order of the sorted df.
for name, group in df.groupby(['ID', 'LACT', 'FDAT']):
    last_bred_date = pd.NaT  # reset for each group
    for i, row in group.iterrows():
        if row['EVENT'] == 'BRED':
            last_bred_date = row['DATE']  # remember the latest BRED date
            last_bred_dates.append(pd.NaT)
        elif row['EVENT'] == 'PREG':
            last_bred_dates.append(last_bred_date)
        else:
            last_bred_dates.append(pd.NaT)

# Pass the sorted index explicitly: a bare pd.Series(list) gets a 0..n-1 index
# and would be aligned to df by label rather than by position.
df['CDAT'] = pd.Series(last_bred_dates, index=df.index)

Output:

    ID  LACT                 FDAT  EVENT                 DATE                 CDAT
0   46     1  2011-09-23 00:00:00   BRED  2012-03-02 00:00:00                  NaT
1   46     1  2011-09-23 00:00:00   PREG  2012-04-03 00:00:00  2012-03-02 00:00:00
2   46     1  2011-09-23 00:00:00   PREG  2012-05-22 00:00:00  2012-03-02 00:00:00
3   46     1  2011-09-23 00:00:00   PREG  2012-10-09 00:00:00  2012-03-02 00:00:00
4   46     2  2012-11-15 00:00:00  FRESH  2012-11-15 00:00:00                  NaT
7   46     2  2012-11-15 00:00:00   BRED  2013-01-01 00:00:00                  NaT
5   46     2  2012-11-15 00:00:00    LUT  2013-01-08 00:00:00                  NaT
6   46     2  2012-11-15 00:00:00     OS  2013-01-15 00:00:00                  NaT
8   46     2  2012-11-15 00:00:00   BRED  2013-01-24 00:00:00                  NaT
9   46     2  2012-11-15 00:00:00   PREG  2013-02-26 00:00:00  2013-01-24 00:00:00
10  46     2  2012-11-16 00:00:00   BRED  2013-03-10 00:00:00                  NaT

Explanation:

Sort the rows by date, then group df by ['ID', 'LACT', 'FDAT'] to get the desired groups. Create an empty list and iterate over the rows of each group: if the EVENT of a row is BRED, save its DATE and append a missing value to the list; if the EVENT is PREG, append the saved date; for any other event, append a missing value. Finally, use the list to create the new CDAT column.
Note that last_bred_date is reset to a missing value before each group is iterated, so a PREG row only ever receives a BRED date from its own group.
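
The same logic can probably be written without the explicit row loop. The following is only a sketch of one possible vectorized variant, not part of the original answer: it keeps DATE on BRED rows only, forward-fills that value within each ['ID', 'LACT', 'FDAT'] group, and then keeps the result on PREG rows only. The helper name add_cdat and the shortened sample (a subset of the rows from the question) are assumptions made for illustration.

```python
import pandas as pd


def add_cdat(df: pd.DataFrame) -> pd.DataFrame:
    """Return a sorted copy of df with a CDAT column (hypothetical helper)."""
    keys = ['ID', 'LACT', 'FDAT']
    # Sort so that events inside each group are in date order.
    out = df.sort_values(keys + ['DATE']).copy()
    # Keep DATE on BRED rows only; every other row becomes NaT.
    bred_date = out['DATE'].where(out['EVENT'].eq('BRED'))
    # Carry the latest BRED date forward, but never across group boundaries.
    last_bred = bred_date.groupby([out[k] for k in keys]).ffill()
    # Only PREG rows receive a value; all other rows stay NaT.
    out['CDAT'] = last_bred.where(out['EVENT'].eq('PREG'))
    return out


# A shortened sample taken from the rows in the question.
rows = [
    [46, 1, '2011-09-23', 'BRED', '2012-03-02'],
    [46, 1, '2011-09-23', 'PREG', '2012-04-03'],
    [46, 2, '2012-11-15', 'LUT', '2013-01-08'],
    [46, 2, '2012-11-15', 'BRED', '2013-01-01'],
    [46, 2, '2012-11-15', 'BRED', '2013-01-24'],
    [46, 2, '2012-11-15', 'PREG', '2013-02-26'],
    [46, 2, '2012-11-16', 'BRED', '2013-03-10'],
]
sample = pd.DataFrame(rows, columns=['ID', 'LACT', 'FDAT', 'EVENT', 'DATE'])
sample[['FDAT', 'DATE']] = sample[['FDAT', 'DATE']].apply(pd.to_datetime)

result = add_cdat(sample)
print(result)
```

Because every intermediate Series keeps the index of the sorted frame, the new column is aligned by label and no list-to-Series conversion is needed. On large frames this should also avoid the cost of iterrows, though that has not been measured here.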
