February 27, 2023, 08:07:36

How to assign new column value based on max value in another column preceding date

Question

I would like to create a new column called CDAT in the following dataframe. For each "PREG" event, CDAT should equal the "DATE" of the last "BRED" event with the same ID, LACT and FDAT combination that preceded the "PREG" event.

Effectively, I need to group by ID, LACT and FDAT and, wherever there is a "PREG" event, fill the new CDAT column with the "DATE" of the most recent "BRED" event that precedes the date of that "PREG" event.

An example of the data is presented below:

	ID	LACT	FDAT	EVENT	DATE
0	46	1	2011-09-23	BRED	2012-03-02
1	46	1	2011-09-23	PREG	2012-04-03
2	46	1	2011-09-23	PREG	2012-05-22
3	46	1	2011-09-23	PREG	2012-10-09
4	46	2	2012-11-15	FRESH	2012-11-15
5	46	2	2012-11-15	LUT	2013-01-08
6	46	2	2012-11-15	OS	2013-01-15
7	46	2	2012-11-15	BRED	2013-01-01
8	46	2	2012-11-15	BRED	2013-01-24
9	46	2	2012-11-15	PREG	2013-02-26
10	46	2	2012-11-16	BRED	2013-03-10

The output I would like to achieve is:

	ID	LACT	FDAT	EVENT	DATE	CDAT
0	46	1	2011-09-23	BRED	2012-03-02
1	46	1	2011-09-23	PREG	2012-04-03	2012-03-02
2	46	1	2011-09-23	PREG	2012-05-22	2012-03-02
3	46	1	2011-09-23	PREG	2012-10-09	2012-03-02
4	46	2	2012-11-15	FRESH	2012-11-15
5	46	2	2012-11-15	LUT	2013-01-08
6	46	2	2012-11-15	OS	2013-01-15
7	46	2	2012-11-15	BRED	2013-01-01
8	46	2	2012-11-15	BRED	2013-01-24
9	46	2	2012-11-15	PREG	2013-02-26	2013-01-24
10	46	2	2012-11-16	BRED	2013-03-10

I cannot think of a way to incorporate the date and EVENT selection into a groupby statement that would achieve what I would like to do.

The sample data is presented below as a list:

[[46,1,Timestamp('2011-09-23 00:00:00'),'BRED',Timestamp('2012-03-02 00:00:00')],
 [46,1,Timestamp('2011-09-23 00:00:00'),'PREG',Timestamp('2012-04-03 00:00:00')],
 [46,1,Timestamp('2011-09-23 00:00:00'),'PREG',Timestamp('2012-05-22 00:00:00')],
 [46,1,Timestamp('2011-09-23 00:00:00'),'PREG',Timestamp('2012-10-09 00:00:00')],
 [46,2,Timestamp('2012-11-15 00:00:00'),'FRESH',Timestamp('2012-11-15 00:00:00')],
 [46,2,Timestamp('2012-11-15 00:00:00'),'LUT',Timestamp('2013-01-08 00:00:00')],
 [46,2,Timestamp('2012-11-15 00:00:00'),'OS',Timestamp('2013-01-15 00:00:00')],
 [46,2,Timestamp('2012-11-15 00:00:00'),'BRED',Timestamp('2013-01-01 00:00:00')],
 [46,2,Timestamp('2012-11-15 00:00:00'),'BRED',Timestamp('2013-01-24 00:00:00')],
 [46,2,Timestamp('2012-11-15 00:00:00'),'PREG',Timestamp('2013-02-26 00:00:00')],
 [46,2,Timestamp('2012-11-16 00:00:00'),'BRED',Timestamp('2013-03-10 00:00:00')],
 [46,2,Timestamp('2012-11-15 00:00:00'),'PREG',Timestamp('2013-04-16 00:00:00')],
 [46,2,Timestamp('2001-11-15 00:00:00'),'PREG',Timestamp('2013-08-06 00:00:00')]]

Answer 1

Score: 1

This should work.

import pandas as pd

df = pd.DataFrame([[46, 1, pd.Timestamp('2011-09-23'), 'BRED', pd.Timestamp('2012-03-02')],
                   [46, 1, pd.Timestamp('2011-09-23'), 'PREG', pd.Timestamp('2012-04-03')],
                   [46, 1, pd.Timestamp('2011-09-23'), 'PREG', pd.Timestamp('2012-05-22')],
                   [46, 1, pd.Timestamp('2011-09-23'), 'PREG', pd.Timestamp('2012-10-09')],
                   [46, 2, pd.Timestamp('2012-11-15'), 'FRESH', pd.Timestamp('2012-11-15')],
                   [46, 2, pd.Timestamp('2012-11-15'), 'LUT', pd.Timestamp('2013-01-08')],
                   [46, 2, pd.Timestamp('2012-11-15'), 'OS', pd.Timestamp('2013-01-15')],
                   [46, 2, pd.Timestamp('2012-11-15'), 'BRED', pd.Timestamp('2013-01-01')],
                   [46, 2, pd.Timestamp('2012-11-15'), 'BRED', pd.Timestamp('2013-01-24')],
                   [46, 2, pd.Timestamp('2012-11-15'), 'PREG', pd.Timestamp('2013-02-26')],
                   [46, 2, pd.Timestamp('2012-11-16'), 'BRED', pd.Timestamp('2013-03-10')]],
                  columns=['ID', 'LACT', 'FDAT', 'EVENT', 'DATE'])

# Sort so that the rows of each group are in chronological order
df = df.sort_values(['ID', 'LACT', 'FDAT', 'DATE'])

last_bred_dates = []
for name, group in df.groupby(['ID', 'LACT', 'FDAT']):
    # Reset per group so that a BRED date never leaks into another group
    last_bred_date = pd.NaT
    for i, row in group.iterrows():
        if row['EVENT'] == 'BRED':
            # Remember this BRED date; BRED rows themselves get no CDAT
            last_bred_date = row['DATE']
            last_bred_dates.append(pd.NaT)
        elif row['EVENT'] == 'PREG':
            last_bred_dates.append(last_bred_date)
        else:
            last_bred_dates.append(pd.NaT)

# The list is in the sorted row order, so reuse the sorted index; a bare
# pd.Series(last_bred_dates) would be aligned on labels 0..n-1 instead
df['CDAT'] = pd.Series(last_bred_dates, index=df.index)

Output:

	ID	LACT	FDAT	EVENT	DATE	CDAT
0	46	1	2011-09-23 00:00:00	BRED	2012-03-02 00:00:00	NaT
1	46	1	2011-09-23 00:00:00	PREG	2012-04-03 00:00:00	2012-03-02 00:00:00
2	46	1	2011-09-23 00:00:00	PREG	2012-05-22 00:00:00	2012-03-02 00:00:00
3	46	1	2011-09-23 00:00:00	PREG	2012-10-09 00:00:00	2012-03-02 00:00:00
4	46	2	2012-11-15 00:00:00	FRESH	2012-11-15 00:00:00	NaT
7	46	2	2012-11-15 00:00:00	BRED	2013-01-01 00:00:00	NaT
5	46	2	2012-11-15 00:00:00	LUT	2013-01-08 00:00:00	NaT
6	46	2	2012-11-15 00:00:00	OS	2013-01-15 00:00:00	NaT
8	46	2	2012-11-15 00:00:00	BRED	2013-01-24 00:00:00	NaT
9	46	2	2012-11-15 00:00:00	PREG	2013-02-26 00:00:00	2013-01-24 00:00:00
10	46	2	2012-11-16 00:00:00	BRED	2013-03-10 00:00:00	NaT

Explanation:

Sort the rows by date, then group df by ['ID', 'LACT', 'FDAT'] to get the desired groups. Create an empty list and iterate over the groups. If a row's EVENT is BRED, save its DATE and append a missing value to the list; if it is PREG, append the saved date; for any other event, append a missing value. Finally, use that list to create the new CDAT column.
Note that last_bred_date is reset to a missing value before iterating over each group, so that only dates from the same group are appended to the list.
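
The same steps can also be sketched without an explicit Python loop, using `Series.where` and a grouped forward fill. This is a rough sketch rather than a tested drop-in replacement: the names `bred_date` and `last_bred` are illustrative, the dates are parsed from strings only for brevity, and a BRED event that falls on the same date as a PREG event is not treated specially here.

```python
import pandas as pd

rows = [
    [46, 1, "2011-09-23", "BRED", "2012-03-02"],
    [46, 1, "2011-09-23", "PREG", "2012-04-03"],
    [46, 1, "2011-09-23", "PREG", "2012-05-22"],
    [46, 1, "2011-09-23", "PREG", "2012-10-09"],
    [46, 2, "2012-11-15", "FRESH", "2012-11-15"],
    [46, 2, "2012-11-15", "LUT", "2013-01-08"],
    [46, 2, "2012-11-15", "OS", "2013-01-15"],
    [46, 2, "2012-11-15", "BRED", "2013-01-01"],
    [46, 2, "2012-11-15", "BRED", "2013-01-24"],
    [46, 2, "2012-11-15", "PREG", "2013-02-26"],
    [46, 2, "2012-11-16", "BRED", "2013-03-10"],
]
df = pd.DataFrame(rows, columns=["ID", "LACT", "FDAT", "EVENT", "DATE"])
df["FDAT"] = pd.to_datetime(df["FDAT"])
df["DATE"] = pd.to_datetime(df["DATE"])

# Chronological order within each (ID, LACT, FDAT) group
df = df.sort_values(["ID", "LACT", "FDAT", "DATE"])

# Keep DATE on BRED rows only; every other row becomes NaT
bred_date = df["DATE"].where(df["EVENT"] == "BRED")

# Carry the most recent BRED date forward, never across group boundaries
last_bred = bred_date.groupby([df["ID"], df["LACT"], df["FDAT"]]).ffill()

# Only PREG rows should receive a CDAT value
df["CDAT"] = last_bred.where(df["EVENT"] == "PREG")

print(df)
```

Because every intermediate Series keeps the index of the sorted frame, the final assignment is aligned on row labels and does not depend on the row order.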
