如何在R中执行具有固定日期范围的筛选连接?

huangapple go评论56阅读模式
英文:

How to perform a filtering join with fixed date range in R?

问题

我尝试使用dplyr执行一个带有固定日期范围(例如2天)的过滤连接。请参见下面的示例:

library(tibble)

# 创建 tibble1
patient_id <- c("patient1", "patient2")
laterality <- c("L", "R")
date <- as.Date(c("2020-10-24", "2010-09-24"))
tibble1 <- tibble(patient_id, laterality, date)

# 创建 tibble2
patient_id <- c("patient1", "patient2", "patient1", "patient1")
laterality <- c("L", "R", "R", "L")
date <- as.Date(c("2020-10-24", "2010-09-24", "2010-09-18", "2020-10-25"))
type <- c("dark", "light", "dark", "light")
tibble2 <- tibble(patient_id, laterality, date, type)

# 创建输出
patient_id <- c("patient1", "patient2", "patient1")
laterality <- c("L", "R", "L")
date <- as.Date(c("2020-10-24", "2010-09-24", "2020-10-25"))
type <- c("dark", "light", "light")
output <- tibble(patient_id, laterality, date, type)

我想要使用tibble1过滤tibble2,并使用固定的日期范围(+/- 2天),应该得到output。我尝试使用semi_join,但不确定如何结合日期范围。

semi_join(tibble2, tibble1)

我已经看到其他解决方案,它们具有单独的fromto列来定义范围,但希望对所有行使用单一的固定范围。

英文:

I am trying to use dplyr to perform a filtering join with a fixed date range e.g. 2 days. Please see example below:

library(tibble)

#Create tibble1
patient_id &lt;- c(&quot;patient1&quot;, &quot;patient2&quot;)
laterality &lt;- c(&quot;L&quot;, &quot;R&quot;)
date &lt;- as.Date(c(&quot;2020-10-24&quot;, &quot;2010-09-24&quot;))
tibble1 &lt;- tibble(patient_id, laterality, date)

#Create tibble2
patient_id &lt;- c(&quot;patient1&quot;, &quot;patient2&quot;, &quot;patient1&quot;, &quot;patient1&quot;)
laterality &lt;- c(&quot;L&quot;, &quot;R&quot;, &quot;R&quot;, &quot;L&quot;)
date &lt;- as.Date(c(&quot;2020-10-24&quot;, &quot;2010-09-24&quot;, &quot;2010-09-18&quot;, &quot;2020-10-25&quot;))
type &lt;- c(&quot;dark&quot;, &quot;light&quot;, &quot;dark&quot;, &quot;light&quot;)
tibble2 &lt;- tibble(patient_id, laterality, date, type)

#Create output
patient_id &lt;- c(&quot;patient1&quot;, &quot;patient2&quot;, &quot;patient1&quot;)
laterality &lt;- c(&quot;L&quot;, &quot;R&quot;, &quot;L&quot;)
date &lt;- as.Date(c(&quot;2020-10-24&quot;, &quot;2010-09-24&quot;, &quot;2020-10-25&quot;))
type &lt;- c(&quot;dark&quot;, &quot;light&quot;, &quot;light&quot;)
output &lt;- tibble(patient_id, laterality, date, type)

I would like to filter tibble2 using tibble1, with a fixed date range of +/- 2 days, which should give output. I have tried using semi_join but not sure how to incorporate the date range.

semi_join(tibble2,tibble1)

I have seen other solution which have separate from and to columns to define the range, but was hoping to do it with a single fixed range for all rows.

Thank you!

答案1

得分: 2

使用 mutate 来为每一行创建起始/结束值,然后连接数据。

library(dplyr)
tibble2 %>%
  mutate(start = date - 2, end = date + 2, date = NULL) %>%
  right_join(tibble1, by = c("patient_id", "laterality")) %>%
  filter(date >= start & date <= end) %>%
  select(-start, -end)
#   patient_id laterality type  date      
#   <chr>      <chr>      <chr> <date>    
# 1 patient1   L          dark  2020-10-24
# 2 patient2   R          light 2010-09-24
# 3 patient1   L          light 2020-10-24

这是给定代码的翻译。

英文:

Use mutate to create the start/end values for each row then join the data

library(dplyr)
tibble2 %&gt;% 
  mutate(start=date-2, end=date+2, date=NULL) %&gt;% 
  right_join(tibble1, join_by(patient_id, laterality, between(y$date, x$start, x$end))) %&gt;% 
  select(-start, -end)
#   patient_id laterality type  date      
#   &lt;chr&gt;      &lt;chr&gt;      &lt;chr&gt; &lt;date&gt;    
# 1 patient1   L          dark  2020-10-24
# 2 patient2   R          light 2010-09-24
# 3 patient1   L          light 2020-10-24

答案2

得分: 0

你可以使用dplyr::filter()和dplyr::between()来实现这样的结果:

filter(tibble2, between(date,
                        min(tibble1$date) - 2, 
                        max(tibble1$date) + 2))
英文:

I presume you want the range to be specified by the max and min dates in tibble1?

You can achieve such result using dplyr::filter() and dplyr::between():

filter(tibble2, between(date,
                        min(tibble1$date) - 2, 
                        max(tibble1$date) + 2))

huangapple
  • 本文由 发表于 2023年5月17日 21:08:32
  • 转载请务必保留本文链接:https://go.coder-hub.com/76272470.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定