英文:
How to perform a filtering join with fixed date range in R?
问题
我尝试使用dplyr执行一个带有固定日期范围(例如2天)的过滤连接。请参见下面的示例:
library(tibble)
# 创建 tibble1
patient_id <- c("patient1", "patient2")
laterality <- c("L", "R")
date <- as.Date(c("2020-10-24", "2010-09-24"))
tibble1 <- tibble(patient_id, laterality, date)
# 创建 tibble2
patient_id <- c("patient1", "patient2", "patient1", "patient1")
laterality <- c("L", "R", "R", "L")
date <- as.Date(c("2020-10-24", "2010-09-24", "2010-09-18", "2020-10-25"))
type <- c("dark", "light", "dark", "light")
tibble2 <- tibble(patient_id, laterality, date, type)
# 创建输出
patient_id <- c("patient1", "patient2", "patient1")
laterality <- c("L", "R", "L")
date <- as.Date(c("2020-10-24", "2010-09-24", "2020-10-25"))
type <- c("dark", "light", "light")
output <- tibble(patient_id, laterality, date, type)
我想要使用tibble1
过滤tibble2
,并使用固定的日期范围(+/- 2天),应该得到output
。我尝试使用semi_join
,但不确定如何结合日期范围。
semi_join(tibble2, tibble1)
我已经看到其他解决方案,它们具有单独的from
和to
列来定义范围,但希望对所有行使用单一的固定范围。
英文:
I am trying to use dplyr to perform a filtering join with a fixed date range e.g. 2 days. Please see example below:
library(tibble)
#Create tibble1
patient_id <- c("patient1", "patient2")
laterality <- c("L", "R")
date <- as.Date(c("2020-10-24", "2010-09-24"))
tibble1 <- tibble(patient_id, laterality, date)
#Create tibble2
patient_id <- c("patient1", "patient2", "patient1", "patient1")
laterality <- c("L", "R", "R", "L")
date <- as.Date(c("2020-10-24", "2010-09-24", "2010-09-18", "2020-10-25"))
type <- c("dark", "light", "dark", "light")
tibble2 <- tibble(patient_id, laterality, date, type)
#Create output
patient_id <- c("patient1", "patient2", "patient1")
laterality <- c("L", "R", "L")
date <- as.Date(c("2020-10-24", "2010-09-24", "2020-10-25"))
type <- c("dark", "light", "light")
output <- tibble(patient_id, laterality, date, type)
I would like to filter tibble2
using tibble1
, with a fixed date range of +/- 2 days, which should give output
. I have tried using semi_join but not sure how to incorporate the date range.
semi_join(tibble2,tibble1)
I have seen other solution which have separate from
and to
columns to define the range, but was hoping to do it with a single fixed range for all rows.
Thank you!
答案1
得分: 2
使用 mutate 来为每一行创建起始/结束值,然后连接数据。
library(dplyr)
tibble2 %>%
mutate(start = date - 2, end = date + 2, date = NULL) %>%
right_join(tibble1, by = c("patient_id", "laterality")) %>%
filter(date >= start & date <= end) %>%
select(-start, -end)
# patient_id laterality type date
# <chr> <chr> <chr> <date>
# 1 patient1 L dark 2020-10-24
# 2 patient2 R light 2010-09-24
# 3 patient1 L light 2020-10-24
这是给定代码的翻译。
英文:
Use mutate to create the start/end values for each row then join the data
library(dplyr)
tibble2 %>%
mutate(start=date-2, end=date+2, date=NULL) %>%
right_join(tibble1, join_by(patient_id, laterality, between(y$date, x$start, x$end))) %>%
select(-start, -end)
# patient_id laterality type date
# <chr> <chr> <chr> <date>
# 1 patient1 L dark 2020-10-24
# 2 patient2 R light 2010-09-24
# 3 patient1 L light 2020-10-24
答案2
得分: 0
你可以使用dplyr::filter()和dplyr::between()来实现这样的结果:
filter(tibble2, between(date,
min(tibble1$date) - 2,
max(tibble1$date) + 2))
英文:
I presume you want the range to be specified by the max and min dates in tibble1?
You can achieve such result using dplyr::filter() and dplyr::between():
filter(tibble2, between(date,
min(tibble1$date) - 2,
max(tibble1$date) + 2))
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论