How to mutate a complex variable involving dates?
Question
I have a tibble in which each row represents an image of an eye and contains the following relevant variables: patientId, laterality (left or right), date, imageId.
I would like to manipulate this to create another tibble showing the number of followUpYears for each eye (patientId, laterality). followUpYears is defined in a somewhat unusual way:
1. In order to meet the requirements for follow-up in a particular year, there must be two different imaging dates during that year, i.e. between days 0-365 for year 1, days 366-730 for year 2, etc. The first image date is always the baseline and followUpYears is always an integer.
2. Only one image per date is considered.
3. Follow-up ceases as soon as the requirement for 2 imaging dates in a year is not met, i.e. if there is only 1 imaging date in the first year, followUpYears is 0 regardless of how many images are taken subsequently.
4. There is no requirement for there to be at least n years between the first and last image date for an eye to have n followUpYears.
The following dummy data demonstrates these points:
data <- tibble(patientId = c('A','A','A','A','A','A','B','B','B','B','B','B','B'),
               laterality = c('L','L','L','L','L','L','R','R','R','R','L','L','L'),
               date = as.Date(c('2000-05-05','2000-05-05','2001-05-06','2001-05-07','2002-05-06','2002-05-07','2000-09-08','2001-09-07','2001-09-09','2001-09-10','2000-09-08','2001-09-07','2001-09-10')),
               imageId = 1:13)
expected_output <- tibble(patientId = c('A','B','B'),
                          laterality = c('L','R','L'),
                          followUpYears = c(0, 2, 1))
Patient A's left eye has 0 followUpYears because of points 2 and 3. Patient B's right eye has 2 followUpYears because of point 4 (despite the fact that there is only slightly more than 1 year between the first and last image date). Patient B's left eye only has 1 year of follow-up since it doesn't meet the requirement for 2 image dates in year 2.
I am familiar with the basic dplyr verbs but I can't think of how to frame this type of variable. Note that patients might have one or both eyes included and some might have 10+ years of follow up. Finally, a solution that considers 1 year to be 365 days regardless of leap years is fine.
Thank you!
Answer 1
Score: 2
Here's one way using ifelse. diff_year is a helper function that computes the difference between two dates in years, rounded up to the next integer.
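library(dplyr)
# Year difference between two dates, rounded up (1 year = 365 days)
diff_year <- function(date1, date2) ceiling(as.numeric(difftime(date1, date2, units = "days")) / 365)
# First attempt: groups by patient only (laterality is handled in the update below)
data %>%
  group_by(patientId) %>%
  summarise(followUpYears = ifelse(diff_year(date[date != first(date)][1], first(date)) <= 1,
                                   diff_year(max(date), min(date)), 0))
# A tibble: 2 × 2
#  patientId followUpYears
#  <chr>             <dbl>
#1 A                     0
#2 B                     2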
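To make the rounding concrete, here is a small sketch that applies the same helper to a few dates from patient B in the dummy data. The variable name baseline is only for illustration, and units = "days" is spelled out so the result does not depend on difftime's automatic unit choice.

```r
# Helper as described above: day difference divided by 365, rounded up.
diff_year <- function(date1, date2) {
  ceiling(as.numeric(difftime(date1, date2, units = "days")) / 365)
}

baseline <- as.Date("2000-09-08")            # patient B's first image date

diff_year(as.Date("2001-09-07"), baseline)   # 364 days later -> 1
diff_year(as.Date("2001-09-10"), baseline)   # 367 days later -> 2
diff_year(baseline, baseline)                # same day       -> 0
```

Any date up to 365 days after baseline is rounded up to 1, which is why the check against <= 1 in the snippet above tests whether a second imaging date falls within the first year.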
Update following the OP's comment. This should handle all the conditions:
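# Year difference between two dates as a fraction (1 year = 365 days)
diff_year <- function(date1, date2) as.numeric((date1 - date2) / 365)
data %>%
  distinct(patientId, laterality, date, .keep_all = TRUE) %>%  # one image per date
  group_by(patientId, laterality) %>%
  mutate(diffYear = floor(diff_year(date, min(date)))) %>%     # 0-based year index
  add_count(count = diffYear) %>%                              # n = number of dates in that year
  # keep rows up to and including the first year that has a single date
  filter(!cumany(lag(n == 1, default = FALSE)) | row_number() == 1) %>%
  summarise(followUpYears = ifelse(any(n > 1), ceiling(diff_year(max(date[n != 1]), min(date))), 0))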
# patientId laterality followUpYears
#1 A L 0
#2 B L 1
#3 B R 2
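For comparison, here is a minimal sketch that follows the question's day bands literally (days 0-365 are year 1, days 366-730 are year 2, and so on), which differs from floor(days / 365) only at the band edges. The helper name count_followup_years and the columns day, year and nDates are made up for this illustration; it is one possible reading of the rules, not the method used in the answer above.

```r
library(dplyr)

data <- tibble(
  patientId  = c('A','A','A','A','A','A','B','B','B','B','B','B','B'),
  laterality = c('L','L','L','L','L','L','R','R','R','R','L','L','L'),
  date = as.Date(c('2000-05-05','2000-05-05','2001-05-06','2001-05-07',
                   '2002-05-06','2002-05-07','2000-09-08','2001-09-07',
                   '2001-09-09','2001-09-10','2000-09-08','2001-09-07',
                   '2001-09-10')),
  imageId = 1:13
)

# Hypothetical helper: length of the unbroken run of qualifying years 1, 2, 3, ...
count_followup_years <- function(year, n_dates) {
  ok <- sort(year[n_dates >= 2])       # years with at least two distinct dates
  sum(cumprod(ok == seq_along(ok)))    # stops counting at the first missing year
}

follow_up <- data %>%
  distinct(patientId, laterality, date) %>%          # only one image per date
  group_by(patientId, laterality) %>%
  mutate(day  = as.numeric(date - min(date)),        # days since baseline
         year = pmax(1, ceiling(day / 365))) %>%     # day 0 belongs to year 1
  count(year, name = "nDates") %>%                   # distinct dates per eye and year
  summarise(followUpYears = count_followup_years(year, nDates),
            .groups = "drop")

follow_up
```

On the dummy data this gives 0 for A/L, 1 for B/L and 2 for B/R, matching expected_output (row order aside). Counting dates per band first and then measuring the initial run of qualifying years keeps the "follow-up ceases" rule separate from the binning, which may be easier to adapt if the band definition changes.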
Answer 2
Score: 0
Below is my approach, which should cover all four conditions. However, I'm not sure how you get:
#> # A tibble: 1 x 3
#>   patientId laterality followUpYears
#>   <chr>     <chr>              <dbl>
#> 1 B         L                      1
since, according to your logic, it should fall into the two-year band: 2000-09-08 to 2001-09-10 is 367 days, which equals two years.
The idea is that we first calculate a followup_flag, which checks whether each date is within 365 days of the previous date, and then take the cummin() so that the series breaks as soon as there is no direct follow-up year.
Then we can keep only the rows where followup_flag == 1.
For this data set we then check how many years lie between the first and the last date, and since we want to count 367 days as 2 years, we have to take the ceiling().
library(dplyr)
library(lubridate)
data %>%
  group_by(patientId, laterality) %>%
  # 1 while every gap to the previous date is at most 365 days, 0 from the first larger gap on
  mutate(followup_flag = cummin(date - dplyr::lag(date, default = first(date)) <= 365)) %>%
  filter(followup_flag == 1) %>%
  # years between first and last remaining date, rounded up
  summarise(followUpYears = as.numeric(
    difftime(last(date), first(date), units = "days") / 365) %>%
      ceiling()
  )
#> `summarise()` has grouped output by 'patientId'. You can override using the
#> `.groups` argument.
#> # A tibble: 3 x 3
#> # Groups: patientId [2]
#> patientId laterality followUpYears
#> <chr> <chr> <dbl>
#> 1 A L 0
#> 2 B L 2
#> 3 B R 2
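To see where this result and the question's expected output for patient B's left eye part ways, the sketch below looks at that eye's three dates in isolation. The names gap_days, flag, day, band and dates_per_band are illustrative only, and the band computation is my reading of the question's "days 0-365 for year 1, days 366-730 for year 2" rule rather than part of the pipeline above.

```r
dates <- as.Date(c("2000-09-08", "2001-09-07", "2001-09-10"))  # patient B, left eye

# Flag as in the pipeline above: stays 1 while each gap to the previous date is <= 365 days
gap_days <- as.numeric(diff(dates))
flag <- cummin(c(TRUE, gap_days <= 365))
flag                       # 1 1 1: no gap exceeds 365 days, so all rows are kept

# Day offsets from baseline and the 365-day band each date falls into
day  <- as.numeric(dates - min(dates))
band <- pmax(1, ceiling(day / 365))
day                        # 0 364 367
band                       # 1 1 2

# Number of distinct dates per band
dates_per_band <- table(band)
dates_per_band             # band 1 has two dates, band 2 has one
```

The gap-based flag never breaks for this eye, so the span of 367 days is rounded up to 2. Under the question's rule, however, day 367 is the only date in the second band, which would be why the asker expects 1 rather than 2.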
Data used:
data <- tibble(patientId = c('A','A','A','A','A','A','B','B','B','B','B','B','B'),
               laterality = c('L','L','L','L','L','L','R','R','R','R','L','L','L'),
               date = as.Date(c('2000-05-05','2000-05-05','2001-05-06','2001-05-07','2002-05-06','2002-05-07','2000-09-08','2001-09-07','2001-09-09','2001-09-10','2000-09-08','2001-09-07','2001-09-10')),
               imageId = 1:13)
Created on 2023-02-08 by the reprex package (v2.0.1)