可以通过一个ID列和最近的日期值来合并两个数据框在R中。

huangapple go评论65阅读模式
英文:

Can I joint 2 dataframes in R by an ID column and a nearest date value

问题

我尝试将两个数据框中的共同ID值与它们最接近的日期相匹配在R中。我有一个包含人口数据的列表,另一个包含已经采集的血液样本。不幸的是,两次访问的日期并不完全匹配。因此,我希望通过日期将每个ID的最接近的条目配对。

以下代码代表了我想要做的事情。我希望按照共同的ID值和每个条目的最接近匹配日期合并df1和df2的所有列。在这种情况下,例如ID编号1将有2个条目。

library(lubridate)

df1 <- data.frame(ID = c(1, 1, 3, 3, 5),
                  var2 = c("Dog", "Dog", "Cat", "Cat", "Hamster"),
                  Date = c("01-01-2022", "02-01-2022", "07-12-2022", "03-02-2023", "05-12-2022"))
df1$Date <- dmy(df1$Date)

df2 <- data.frame(ID = c(1, 1, 3, 3, 5),
                  Date = c("07-01-2022", "04-12-2022", "10-12-2022", "04-02-2023", "15-12-2022"))
df2$Date <- dmy(df2$Date)

请注意,我已将日期从字符格式转换为日期格式以便进行日期匹配。

英文:

I am trying to join common ID values in 2 dataframes to their most closely matched date within R. I have a list with demographic data and another with blood samples that have been collected. Unfortunately the dates of both visits do not match exactly. Therefore i would like to pair up the most closely occuring entries for each ID by date.

The following code represents what I would like to do. I want to merge all columns from df1 and df2 by common ID values and the most closely matching date per entry. ID no 1 in this case would have 2 entries for example.

library(lubridate)


df1 &lt;- data.frame(ID = c(1, 1, 3, 3, 5),
              var2 = c(&quot;Dog&quot;, &quot;Dog&quot;, &quot;Cat&quot;, &quot;Cat&quot;, &quot;Hamster&quot;),
                  Date = c(&quot;01-01-2022&quot;,&quot;02-01-2022&quot; , &quot;07-12-2022&quot;, &quot;03-02-2023&quot;, &quot;05-12-2022&quot;))
df1$Date &lt;- dmy(df1$Date)


df2 &lt;- data.frame(ID = c(1, 1, 3, 3, 5),
              Date = c(&quot;07-01-2022&quot;,&quot;04-12-2022&quot; , &quot;10-12-2022&quot;, &quot;04-02-2023&quot;, &quot;15-12-2022&quot;))

df2$Date &lt;- dmy(df2$Date)

答案1

得分: 2

你可以合并,然后计算差异并保留最接近的部分:

library(dplyr)
df1 %>% 
  inner_join(df2, by = "ID", multiple = "all") %>% 
  slice_min(abs(Date.x - Date.y), n = 1, by = c(ID, Date.x))

  ID    var2     Date.x     Date.y
1  1     Dog 2022-01-01 2022-01-07
2  1     Dog 2022-01-02 2022-01-07
3  3     Cat 2022-12-07 2022-12-10
4  3     Cat 2023-02-03 2023-02-04
5  5 Hamster 2022-12-05 2022-12-15

请注意,join_by 在这里并不能真正帮助,因为closest 不接受 ==

英文:

You can merge, then compute the difference and keep the closest:

library(dplyr)
df1 %&gt;% 
  inner_join(df2, by = &quot;ID&quot;, multiple = &quot;all&quot;) %&gt;% 
  slice_min(abs(Date.x - Date.y), n = 1, by = c(ID, Date.x))

  ID    var2     Date.x     Date.y
1  1     Dog 2022-01-01 2022-01-07
2  1     Dog 2022-01-02 2022-01-07
3  3     Cat 2022-12-07 2022-12-10
4  3     Cat 2023-02-03 2023-02-04
5  5 Hamster 2022-12-05 2022-12-15

Note that join_by cannot really help here since closest does not accept ==.

huangapple
  • 本文由 发表于 2023年2月6日 17:41:14
  • 转载请务必保留本文链接:https://go.coder-hub.com/75359588.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定