Merging two dataframes based on last date
Question
I have two dataframes that I would like to merge based on their IDs and dates.
The first dataframe looks like this:
ID     Date          EoM_Val
---------------------------------------
AAA    2021-06-30    1417744
BBB    2021-06-30    3946750
AAA    2021-07-31    2792182
BBB    2021-07-31    81073822
While the second dataframe looks similar to this:
ID     Date          Day_Val
---------------------------------------
AAA    2021-06-05    14
AAA    2021-06-12    11
AAA    2021-06-21    15
BBB    2021-06-06    33
BBB    2021-06-18    35
BBB    2021-06-27    55
AAA    2021-07-08    6
AAA    2021-07-12    8
BBB    2021-07-15    9
BBB    2021-07-31    10
(Note also that the Date columns are of String type.)
What I would like to do is merge the two dataframes so that, for each ID and each month, the row with the last Date carries the EoM_Val, so that the final merge looks like this:
ID     Date          Day_Val    EoM_Val
----------------------------------------------
AAA    2021-06-05    14
AAA    2021-06-12    11
AAA    2021-06-21    15         1417744
BBB    2021-06-06    33
BBB    2021-06-18    35
BBB    2021-06-27    55         3946750
AAA    2021-07-08    6
AAA    2021-07-12    8          2792182
BBB    2021-07-15    9
BBB    2021-07-31    10         81073822
Unfortunately, I'm having quite a bit of difficulty with it, so if anyone could help I would greatly appreciate it. Thanks!
Answer 1
Score: 1
Let us assume that your first dataframe is named df1 and your second one df2.
Create a month column for each dataframe:
import pandas as pd
df1['Month'] = pd.to_datetime(df1['Date']).dt.month
df2['Month'] = pd.to_datetime(df2['Date']).dt.month
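One caveat with dt.month is that it drops the year, so June 2021 and June 2022 would end up with the same key. A minimal sketch, assuming the data may span several years, uses a monthly period instead. The column name YearMonth and the sample rows are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical sample spanning two years (not from the question's data)
df = pd.DataFrame({
    "ID": ["AAA", "AAA", "AAA"],
    "Date": ["2021-06-05", "2021-06-21", "2022-06-10"],
})

dates = pd.to_datetime(df["Date"])
df["Month"] = dates.dt.month               # month number only, the year is lost
df["YearMonth"] = dates.dt.to_period("M")  # monthly period, keeps the year
print(df)
```

For data confined to a single year, as in the question, the plain month number is enough.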
Then group both dataframes by ID and Month, taking the last occurrence in each group (this relies on the rows already being in date order within each group), e.g. for df2:
df2_grouped = df2.groupby(['ID', 'Month']).last()
This yields (groupby sorts the result by the group keys):
            Date          Day_Val
ID   Month
AAA  6      2021-06-21    15
AAA  7      2021-07-12    8
BBB  6      2021-06-27    55
BBB  7      2021-07-31    10
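Because last() simply takes the final row of each group in its current order, the result depends on how df2 is sorted. As a hedged alternative that does not depend on row order, the latest date per group can be located with idxmax. The helper column Date_dt and the names last_idx and last_rows are assumptions introduced for this sketch.

```python
import pandas as pd

df2 = pd.DataFrame({
    "ID": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "AAA", "AAA", "BBB", "BBB"],
    "Date": ["2021-06-05", "2021-06-12", "2021-06-21", "2021-06-06", "2021-06-18",
             "2021-06-27", "2021-07-08", "2021-07-12", "2021-07-15", "2021-07-31"],
    "Day_Val": [14, 11, 15, 33, 35, 55, 6, 8, 9, 10],
})

# Parse the string dates once; Date_dt is a helper column for this sketch
df2["Date_dt"] = pd.to_datetime(df2["Date"])
df2["Month"] = df2["Date_dt"].dt.month

# Index label of the row holding the latest date in each (ID, Month) group
last_idx = df2.groupby(["ID", "Month"])["Date_dt"].idxmax()

# Look those rows up in the original frame
last_rows = df2.loc[last_idx, ["ID", "Date", "Day_Val"]]
print(last_rows)
```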
This allows you to identify the rows for which you want the EoM_Val to be displayed.
You can then merge df1_grouped (built the same way from df1) and df2_grouped into df_merged, which will contain ID, Day_Val and EoM_Val.
Finally, create an EoM_Val column in df2 and populate it with NaN values. All that remains is to merge this updated df2 with df_merged, using an outer merge to retain all rows in the final dataframe.