2023年2月6日 17:41:14go评论87阅读模式

英文:

Can I joint 2 dataframes in R by an ID column and a nearest date value

问题

我尝试将两个数据框中的共同ID值与它们最接近的日期相匹配在R中。我有一个包含人口数据的列表，另一个包含已经采集的血液样本。不幸的是，两次访问的日期并不完全匹配。因此，我希望通过日期将每个ID的最接近的条目配对。

以下代码代表了我想要做的事情。我希望按照共同的ID值和每个条目的最接近匹配日期合并df1和df2的所有列。在这种情况下，例如ID编号1将有2个条目。

library(lubridate)
df1 <- data.frame(ID = c(1, 1, 3, 3, 5),
                  var2 = c("Dog", "Dog", "Cat", "Cat", "Hamster"),
                  Date = c("01-01-2022", "02-01-2022", "07-12-2022", "03-02-2023", "05-12-2022"))
df1$Date <- dmy(df1$Date)
df2 <- data.frame(ID = c(1, 1, 3, 3, 5),
                  Date = c("07-01-2022", "04-12-2022", "10-12-2022", "04-02-2023", "15-12-2022"))
df2$Date <- dmy(df2$Date)

请注意，我已将日期从字符格式转换为日期格式以便进行日期匹配。

英文:

I am trying to join common ID values in 2 dataframes to their most closely matched date within R. I have a list with demographic data and another with blood samples that have been collected. Unfortunately the dates of both visits do not match exactly. Therefore i would like to pair up the most closely occuring entries for each ID by date.

The following code represents what I would like to do. I want to merge all columns from df1 and df2 by common ID values and the most closely matching date per entry. ID no 1 in this case would have 2 entries for example.

library(lubridate)
df1 &lt;- data.frame(ID = c(1, 1, 3, 3, 5),
              var2 = c(&quot;Dog&quot;, &quot;Dog&quot;, &quot;Cat&quot;, &quot;Cat&quot;, &quot;Hamster&quot;),
                  Date = c(&quot;01-01-2022&quot;,&quot;02-01-2022&quot; , &quot;07-12-2022&quot;, &quot;03-02-2023&quot;, &quot;05-12-2022&quot;))
df1$Date &lt;- dmy(df1$Date)
df2 &lt;- data.frame(ID = c(1, 1, 3, 3, 5),
              Date = c(&quot;07-01-2022&quot;,&quot;04-12-2022&quot; , &quot;10-12-2022&quot;, &quot;04-02-2023&quot;, &quot;15-12-2022&quot;))
df2$Date &lt;- dmy(df2$Date)

答案1

得分: 2

你可以合并，然后计算差异并保留最接近的部分：

library(dplyr)
df1 %>% 
  inner_join(df2, by = "ID", multiple = "all") %>% 
  slice_min(abs(Date.x - Date.y), n = 1, by = c(ID, Date.x))
  ID    var2     Date.x     Date.y
1  1     Dog 2022-01-01 2022-01-07
2  1     Dog 2022-01-02 2022-01-07
3  3     Cat 2022-12-07 2022-12-10
4  3     Cat 2023-02-03 2023-02-04
5  5 Hamster 2022-12-05 2022-12-15

请注意，join_by 在这里并不能真正帮助，因为closest 不接受 ==。

英文:

You can merge, then compute the difference and keep the closest:

library(dplyr)
df1 %&gt;% 
  inner_join(df2, by = &quot;ID&quot;, multiple = &quot;all&quot;) %&gt;% 
  slice_min(abs(Date.x - Date.y), n = 1, by = c(ID, Date.x))
  ID    var2     Date.x     Date.y
1  1     Dog 2022-01-01 2022-01-07
2  1     Dog 2022-01-02 2022-01-07
3  3     Cat 2022-12-07 2022-12-10
4  3     Cat 2023-02-03 2023-02-04
5  5 Hamster 2022-12-05 2022-12-15

Note that join_by cannot really help here since closest does not accept ==.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

可以通过一个ID列和最近的日期值来合并两个数据框在R中。

问题

答案1

在 for 循环中重命名多个列表项。

将一个列中的枢轴ID从一个变为多个，并将它们与另一个列中的字符配对

数据框中的新列不保留 POSIXct 类。

根据其他行的值更改一行中后续条目的不同数量。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。