2023年5月17日 21:08:32go评论82阅读模式

英文:

How to perform a filtering join with fixed date range in R?

问题

我尝试使用dplyr执行一个带有固定日期范围（例如2天）的过滤连接。请参见下面的示例：

library(tibble)
# 创建 tibble1
patient_id <- c("patient1", "patient2")
laterality <- c("L", "R")
date <- as.Date(c("2020-10-24", "2010-09-24"))
tibble1 <- tibble(patient_id, laterality, date)
# 创建 tibble2
patient_id <- c("patient1", "patient2", "patient1", "patient1")
laterality <- c("L", "R", "R", "L")
date <- as.Date(c("2020-10-24", "2010-09-24", "2010-09-18", "2020-10-25"))
type <- c("dark", "light", "dark", "light")
tibble2 <- tibble(patient_id, laterality, date, type)
# 创建输出
patient_id <- c("patient1", "patient2", "patient1")
laterality <- c("L", "R", "L")
date <- as.Date(c("2020-10-24", "2010-09-24", "2020-10-25"))
type <- c("dark", "light", "light")
output <- tibble(patient_id, laterality, date, type)

我想要使用tibble1过滤tibble2，并使用固定的日期范围（+/- 2天），应该得到output。我尝试使用semi_join，但不确定如何结合日期范围。

semi_join(tibble2, tibble1)

我已经看到其他解决方案，它们具有单独的from和to列来定义范围，但希望对所有行使用单一的固定范围。

英文:

I am trying to use dplyr to perform a filtering join with a fixed date range e.g. 2 days. Please see example below:

library(tibble)
#Create tibble1
patient_id &lt;- c(&quot;patient1&quot;, &quot;patient2&quot;)
laterality &lt;- c(&quot;L&quot;, &quot;R&quot;)
date &lt;- as.Date(c(&quot;2020-10-24&quot;, &quot;2010-09-24&quot;))
tibble1 &lt;- tibble(patient_id, laterality, date)
#Create tibble2
patient_id &lt;- c(&quot;patient1&quot;, &quot;patient2&quot;, &quot;patient1&quot;, &quot;patient1&quot;)
laterality &lt;- c(&quot;L&quot;, &quot;R&quot;, &quot;R&quot;, &quot;L&quot;)
date &lt;- as.Date(c(&quot;2020-10-24&quot;, &quot;2010-09-24&quot;, &quot;2010-09-18&quot;, &quot;2020-10-25&quot;))
type &lt;- c(&quot;dark&quot;, &quot;light&quot;, &quot;dark&quot;, &quot;light&quot;)
tibble2 &lt;- tibble(patient_id, laterality, date, type)
#Create output
patient_id &lt;- c(&quot;patient1&quot;, &quot;patient2&quot;, &quot;patient1&quot;)
laterality &lt;- c(&quot;L&quot;, &quot;R&quot;, &quot;L&quot;)
date &lt;- as.Date(c(&quot;2020-10-24&quot;, &quot;2010-09-24&quot;, &quot;2020-10-25&quot;))
type &lt;- c(&quot;dark&quot;, &quot;light&quot;, &quot;light&quot;)
output &lt;- tibble(patient_id, laterality, date, type)

I would like to filter tibble2 using tibble1, with a fixed date range of +/- 2 days, which should give output. I have tried using semi_join but not sure how to incorporate the date range.

semi_join(tibble2,tibble1)

I have seen other solution which have separate from and to columns to define the range, but was hoping to do it with a single fixed range for all rows.

Thank you!

答案1

得分: 2

使用 mutate 来为每一行创建起始/结束值，然后连接数据。

library(dplyr)
tibble2 %>%
  mutate(start = date - 2, end = date + 2, date = NULL) %>%
  right_join(tibble1, by = c("patient_id", "laterality")) %>%
  filter(date >= start & date <= end) %>%
  select(-start, -end)
#   patient_id laterality type  date      
#   <chr>      <chr>      <chr> <date>    
# 1 patient1   L          dark  2020-10-24
# 2 patient2   R          light 2010-09-24
# 3 patient1   L          light 2020-10-24

这是给定代码的翻译。

英文:

Use mutate to create the start/end values for each row then join the data

library(dplyr)
tibble2 %&gt;% 
  mutate(start=date-2, end=date+2, date=NULL) %&gt;% 
  right_join(tibble1, join_by(patient_id, laterality, between(y$date, x$start, x$end))) %&gt;% 
  select(-start, -end)
#   patient_id laterality type  date      
#   &lt;chr&gt;      &lt;chr&gt;      &lt;chr&gt; &lt;date&gt;    
# 1 patient1   L          dark  2020-10-24
# 2 patient2   R          light 2010-09-24
# 3 patient1   L          light 2020-10-24

答案2

得分: 0

你可以使用dplyr::filter()和dplyr::between()来实现这样的结果：

filter(tibble2, between(date,
                        min(tibble1$date) - 2, 
                        max(tibble1$date) + 2))

英文:

I presume you want the range to be specified by the max and min dates in tibble1?

You can achieve such result using dplyr::filter() and dplyr::between():

filter(tibble2, between(date,
                        min(tibble1$date) - 2, 
                        max(tibble1$date) + 2))

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

如何在R中执行具有固定日期范围的筛选连接？

问题

答案1

答案2

避免在左连接中出现重复列，使用sqldf。

r RVEST爬取URL相关数据不再工作

在R中删除字符串中的特定数据

使用ggplot2的组合图例时，其中一个组不显示。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。