Merging two dataframes based on last date
Question
I have two dataframes that I would like to merge based on their IDs and dates.
The first dataframe looks like this:
   ID      Date       EoM_Val
---------------------------------------
  AAA   2021-06-30    1417744
  BBB   2021-06-30    3946750
  AAA   2021-07-31    2792182
  BBB   2021-07-31    81073822
While the second dataframe looks similar to this:
   ID      Date       Day_Val
---------------------------------------
  AAA   2021-06-05    14
  AAA   2021-06-12    11
  AAA   2021-06-21    15
  BBB   2021-06-06    33
  BBB   2021-06-18    35
  BBB   2021-06-27    55
  AAA   2021-07-08    6
  AAA   2021-07-12    8
  BBB   2021-07-15    9
  BBB   2021-07-31    10
(Note that the Date columns in both dataframes are of string type.)
What I would like to do is merge the two dataframes so that, for each ID and each month, the row with the latest Date in the second dataframe receives the EoM_Val. The final result should look like this:
   ID      Date       Day_Val    EoM_Val
----------------------------------------------
  AAA   2021-06-05    14
  AAA   2021-06-12    11
  AAA   2021-06-21    15         1417744
  BBB   2021-06-06    33
  BBB   2021-06-18    35
  BBB   2021-06-27    55         3946750
  AAA   2021-07-08    6
  AAA   2021-07-12    8          2792182
  BBB   2021-07-15    9
  BBB   2021-07-31    10         81073822
Unfortunately, I'm having quite a bit of difficulty with it, so if anyone could help I would greatly appreciate it. Thanks!
Answer 1
Score: 1
Let us assume that your first dataframe is named df1 and your second one df2.
Create a month column for each dataframe:
df1['Month'] = pd.to_datetime(df1['Date']).dt.month
df2['Month'] = pd.to_datetime(df2['Date']).dt.month
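One caveat: dt.month keeps only the month number, so data spanning more than one year would put, say, June 2021 and June 2022 into the same group. A minimal sketch of a year-aware key, assuming pandas is available; the 2022 row and the YearMonth column name are hypothetical and not part of the question:

```python
import pandas as pd

# Small illustrative frame; the third date is a made-up value from a later year.
df = pd.DataFrame({"Date": ["2021-06-30", "2021-07-31", "2022-06-30"]})

dates = pd.to_datetime(df["Date"])

# Month number only: June 2021 and June 2022 both become 6.
df["Month"] = dates.dt.month

# Year-month period: the two Junes stay distinct.
df["YearMonth"] = dates.dt.to_period("M")

print(df)
```

For the sample data in the question, which covers a single year, the plain month number is enough.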
Then group both dataframes by ID and Month, taking the last occurrence in each group (this assumes the rows are in date order within each group), e.g. for df2:
df2_grouped = df2.groupby(['ID', 'Month']).last()
This yields:
ID    Month   Date          Day_Val
AAA   6       2021-06-21    15
AAA   7       2021-07-12    8
BBB   6       2021-06-27    55
BBB   7       2021-07-31    10
This allows you to identify the rows for which you want the EoM_Val to be displayed.
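The grouping step can be sketched as follows, using the second dataframe from the question. This is an illustration rather than the only way to do it: the explicit sort_values call and the last_rows name are additions, included so that "last" reliably means "latest date" even if the rows arrive unsorted.

```python
import pandas as pd

df2 = pd.DataFrame({
    "ID": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "AAA", "AAA", "BBB", "BBB"],
    "Date": ["2021-06-05", "2021-06-12", "2021-06-21", "2021-06-06", "2021-06-18",
             "2021-06-27", "2021-07-08", "2021-07-12", "2021-07-15", "2021-07-31"],
    "Day_Val": [14, 11, 15, 33, 35, 55, 6, 8, 9, 10],
})
df2["Month"] = pd.to_datetime(df2["Date"]).dt.month

# ISO-formatted date strings sort chronologically, so sorting the string
# column is enough to put the latest date last within each group.
last_rows = (
    df2.sort_values("Date")
       .groupby(["ID", "Month"], as_index=False)
       .last()
)

print(last_rows)
```

With as_index=False, ID and Month stay as ordinary columns, which makes the later merge easier to write.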
You can then merge df1_grouped and df2_grouped on ID and Month into df_merged, which will contain ID, Day_Val and EoM_Val.
Finally, merge df2 with df_merged on ID and the df2 date, using a left or outer merge to retain all rows of df2 in the final dataframe. Rows without a match get NaN in EoM_Val automatically, so the column does not need to be created in df2 beforehand.
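Putting the steps together, here is one possible end-to-end sketch on the data from the question. It is a variation on the approach described above, not a literal transcription: it picks the latest date per group with max() on the ISO date strings instead of last(), so it does not depend on row order, and the names last_dates and result are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": ["AAA", "BBB", "AAA", "BBB"],
    "Date": ["2021-06-30", "2021-06-30", "2021-07-31", "2021-07-31"],
    "EoM_Val": [1417744, 3946750, 2792182, 81073822],
})
df2 = pd.DataFrame({
    "ID": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "AAA", "AAA", "BBB", "BBB"],
    "Date": ["2021-06-05", "2021-06-12", "2021-06-21", "2021-06-06", "2021-06-18",
             "2021-06-27", "2021-07-08", "2021-07-12", "2021-07-15", "2021-07-31"],
    "Day_Val": [14, 11, 15, 33, 35, 55, 6, 8, 9, 10],
})

# Month key on both frames (single-year data assumed, as in the question).
for df in (df1, df2):
    df["Month"] = pd.to_datetime(df["Date"]).dt.month

# Latest df2 date per (ID, Month); ISO date strings compare chronologically.
last_dates = df2.groupby(["ID", "Month"], as_index=False)["Date"].max()

# Attach the end-of-month value to those latest dates.
df_merged = last_dates.merge(df1[["ID", "Month", "EoM_Val"]], on=["ID", "Month"])

# Left merge keeps every df2 row; unmatched rows get NaN in EoM_Val.
result = (
    df2.merge(df_merged[["ID", "Date", "EoM_Val"]], on=["ID", "Date"], how="left")
       .drop(columns="Month")
)

print(result)
```

Because of the NaN entries, EoM_Val ends up as a float column; converting it with astype("Int64") is one option if integer display with missing values is preferred.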