英文:
Can I joint 2 dataframes in R by an ID column and a nearest date value
问题
我尝试将两个数据框中的共同ID值与它们最接近的日期相匹配在R中。我有一个包含人口数据的列表,另一个包含已经采集的血液样本。不幸的是,两次访问的日期并不完全匹配。因此,我希望通过日期将每个ID的最接近的条目配对。
以下代码代表了我想要做的事情。我希望按照共同的ID值和每个条目的最接近匹配日期合并df1和df2的所有列。在这种情况下,例如ID编号1将有2个条目。
library(lubridate)
df1 <- data.frame(ID = c(1, 1, 3, 3, 5),
var2 = c("Dog", "Dog", "Cat", "Cat", "Hamster"),
Date = c("01-01-2022", "02-01-2022", "07-12-2022", "03-02-2023", "05-12-2022"))
df1$Date <- dmy(df1$Date)
df2 <- data.frame(ID = c(1, 1, 3, 3, 5),
Date = c("07-01-2022", "04-12-2022", "10-12-2022", "04-02-2023", "15-12-2022"))
df2$Date <- dmy(df2$Date)
请注意,我已将日期从字符格式转换为日期格式以便进行日期匹配。
英文:
I am trying to join common ID values in 2 dataframes to their most closely matched date within R. I have a list with demographic data and another with blood samples that have been collected. Unfortunately the dates of both visits do not match exactly. Therefore i would like to pair up the most closely occuring entries for each ID by date.
The following code represents what I would like to do. I want to merge all columns from df1 and df2 by common ID values and the most closely matching date per entry. ID no 1 in this case would have 2 entries for example.
library(lubridate)
df1 <- data.frame(ID = c(1, 1, 3, 3, 5),
var2 = c("Dog", "Dog", "Cat", "Cat", "Hamster"),
Date = c("01-01-2022","02-01-2022" , "07-12-2022", "03-02-2023", "05-12-2022"))
df1$Date <- dmy(df1$Date)
df2 <- data.frame(ID = c(1, 1, 3, 3, 5),
Date = c("07-01-2022","04-12-2022" , "10-12-2022", "04-02-2023", "15-12-2022"))
df2$Date <- dmy(df2$Date)
答案1
得分: 2
你可以合并,然后计算差异并保留最接近的部分:
library(dplyr)
df1 %>%
inner_join(df2, by = "ID", multiple = "all") %>%
slice_min(abs(Date.x - Date.y), n = 1, by = c(ID, Date.x))
ID var2 Date.x Date.y
1 1 Dog 2022-01-01 2022-01-07
2 1 Dog 2022-01-02 2022-01-07
3 3 Cat 2022-12-07 2022-12-10
4 3 Cat 2023-02-03 2023-02-04
5 5 Hamster 2022-12-05 2022-12-15
请注意,join_by
在这里并不能真正帮助,因为closest
不接受 ==
。
英文:
You can merge, then compute the diff
erence and keep the closest:
library(dplyr)
df1 %>%
inner_join(df2, by = "ID", multiple = "all") %>%
slice_min(abs(Date.x - Date.y), n = 1, by = c(ID, Date.x))
ID var2 Date.x Date.y
1 1 Dog 2022-01-01 2022-01-07
2 1 Dog 2022-01-02 2022-01-07
3 3 Cat 2022-12-07 2022-12-10
4 3 Cat 2023-02-03 2023-02-04
5 5 Hamster 2022-12-05 2022-12-15
Note that join_by
cannot really help here since closest
does not accept ==
.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论