可以通过一个ID列和最近的日期值来合并两个数据框在R中。

huangapple go评论87阅读模式
英文:

Can I joint 2 dataframes in R by an ID column and a nearest date value

问题

我尝试将两个数据框中的共同ID值与它们最接近的日期相匹配在R中。我有一个包含人口数据的列表,另一个包含已经采集的血液样本。不幸的是,两次访问的日期并不完全匹配。因此,我希望通过日期将每个ID的最接近的条目配对。

以下代码代表了我想要做的事情。我希望按照共同的ID值和每个条目的最接近匹配日期合并df1和df2的所有列。在这种情况下,例如ID编号1将有2个条目。

  1. library(lubridate)
  2. df1 <- data.frame(ID = c(1, 1, 3, 3, 5),
  3. var2 = c("Dog", "Dog", "Cat", "Cat", "Hamster"),
  4. Date = c("01-01-2022", "02-01-2022", "07-12-2022", "03-02-2023", "05-12-2022"))
  5. df1$Date <- dmy(df1$Date)
  6. df2 <- data.frame(ID = c(1, 1, 3, 3, 5),
  7. Date = c("07-01-2022", "04-12-2022", "10-12-2022", "04-02-2023", "15-12-2022"))
  8. df2$Date <- dmy(df2$Date)

请注意,我已将日期从字符格式转换为日期格式以便进行日期匹配。

英文:

I am trying to join common ID values in 2 dataframes to their most closely matched date within R. I have a list with demographic data and another with blood samples that have been collected. Unfortunately the dates of both visits do not match exactly. Therefore i would like to pair up the most closely occuring entries for each ID by date.

The following code represents what I would like to do. I want to merge all columns from df1 and df2 by common ID values and the most closely matching date per entry. ID no 1 in this case would have 2 entries for example.

  1. library(lubridate)
  2. df1 &lt;- data.frame(ID = c(1, 1, 3, 3, 5),
  3. var2 = c(&quot;Dog&quot;, &quot;Dog&quot;, &quot;Cat&quot;, &quot;Cat&quot;, &quot;Hamster&quot;),
  4. Date = c(&quot;01-01-2022&quot;,&quot;02-01-2022&quot; , &quot;07-12-2022&quot;, &quot;03-02-2023&quot;, &quot;05-12-2022&quot;))
  5. df1$Date &lt;- dmy(df1$Date)
  6. df2 &lt;- data.frame(ID = c(1, 1, 3, 3, 5),
  7. Date = c(&quot;07-01-2022&quot;,&quot;04-12-2022&quot; , &quot;10-12-2022&quot;, &quot;04-02-2023&quot;, &quot;15-12-2022&quot;))
  8. df2$Date &lt;- dmy(df2$Date)

答案1

得分: 2

你可以合并,然后计算差异并保留最接近的部分:

  1. library(dplyr)
  2. df1 %>%
  3. inner_join(df2, by = "ID", multiple = "all") %>%
  4. slice_min(abs(Date.x - Date.y), n = 1, by = c(ID, Date.x))
  5. ID var2 Date.x Date.y
  6. 1 1 Dog 2022-01-01 2022-01-07
  7. 2 1 Dog 2022-01-02 2022-01-07
  8. 3 3 Cat 2022-12-07 2022-12-10
  9. 4 3 Cat 2023-02-03 2023-02-04
  10. 5 5 Hamster 2022-12-05 2022-12-15

请注意,join_by 在这里并不能真正帮助,因为closest 不接受 ==

英文:

You can merge, then compute the difference and keep the closest:

  1. library(dplyr)
  2. df1 %&gt;%
  3. inner_join(df2, by = &quot;ID&quot;, multiple = &quot;all&quot;) %&gt;%
  4. slice_min(abs(Date.x - Date.y), n = 1, by = c(ID, Date.x))
  5. ID var2 Date.x Date.y
  6. 1 1 Dog 2022-01-01 2022-01-07
  7. 2 1 Dog 2022-01-02 2022-01-07
  8. 3 3 Cat 2022-12-07 2022-12-10
  9. 4 3 Cat 2023-02-03 2023-02-04
  10. 5 5 Hamster 2022-12-05 2022-12-15

Note that join_by cannot really help here since closest does not accept ==.

huangapple
  • 本文由 发表于 2023年2月6日 17:41:14
  • 转载请务必保留本文链接:https://go.coder-hub.com/75359588.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定