Count open cases with and without time cut-off
Question
I have this dataset with variables Person, RelevantCase, StartDate, and EndDate:
df <- data.frame(
  Person = c('111','334','334','334','334','334','888','888','888','888','888','888','888','888'),
  RelevantCase = c(0,1,1,0,1,0,1,0,1,0,0,1,0,1),
  StartDate = c('2017-03-04','2015-11-14','2018-04-26','2020-01-24','2020-01-25','2020-02-29','2015-08-09',
                '2015-08-09','2018-04-10','2019-09-20','2020-06-30','2020-11-01','2021-08-13','2022-11-11'),
  EndDate = c('2017-12-12','2022-01-25','2020-03-01','2021-02-24','2020-01-30','2022-02-02','2019-10-20',
              '2019-10-30','2018-10-10','2021-10-10','2020-07-20','2022-11-20','2021-11-12','2023-01-01')
)
I want to create two new variables:
- A count of the number of relevant open cases per Person. That is, for each row I want to count how many relevant cases of the same Person have
  1.1. StartDates before the current case's StartDate and
  1.2. EndDates on or after the current StartDate.
  By "relevant case" I mean that I want to count only observations with RelevantCase == 1.
- A count of the number of relevant open cases per Person that started within the two years before the current StartDate. This is the same as the first new variable, except that it does not count relevant open cases whose StartDates are more than two years prior to the current StartDate.
The resulting dataset should look like this:
df2 <- data.frame(
  Person = c('111','334','334','334','334','334','888','888','888','888','888','888','888','888'),
  RelevantCase = c(0,1,1,0,1,0,1,0,1,0,0,1,0,1),
  StartDate = c('2017-03-04','2015-11-14','2018-04-26','2020-01-24','2020-01-25','2020-02-29','2015-08-09',
                '2015-08-09','2018-04-10','2019-09-20','2020-06-30','2020-11-01','2021-08-13','2022-11-11'),
  EndDate = c('2017-12-12','2022-01-25','2020-03-01','2021-02-24','2020-01-30','2022-02-02','2019-10-20',
              '2019-10-30','2018-10-10','2021-10-10','2020-07-20','2022-11-20','2021-11-12','2023-01-01'),
  NumberOpenCases = c(0,0,1,2,2,2,0,0,1,1,0,0,1,1),
  NumberOpenCases_2y = c(0,0,0,1,1,1,0,0,0,0,0,0,1,0)
)
Answer 1
Score: 1
This computes the number of relevant open cases by looping over the StartDate
column within each Person group and checking the desired conditions. The two-year window is approximated as 730 days.
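library(dplyr)
library(purrr)

df %>%
  # Convert the character columns to Date so they can be compared
  mutate(StartDate = as.Date(StartDate),
         EndDate = as.Date(EndDate)) %>%
  arrange(Person, StartDate, EndDate) %>%
  group_by(Person) %>%
  # For each row's StartDate (.x), count the relevant cases of the same Person
  # that started earlier and were still open on that date
  mutate(NumberOpenCases = map_int(StartDate, ~sum(StartDate < .x &
                                                   EndDate >= .x &
                                                   RelevantCase == 1)),
         # Same count, restricted to cases that started less than 730 days earlier
         NumberOpenCases_2y = map_int(StartDate, ~sum(StartDate < .x &
                                                      EndDate >= .x &
                                                      RelevantCase == 1 &
                                                      .x - StartDate < 730)))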
#> # A tibble: 14 x 6
#> # Groups:   Person [3]
#>    Person RelevantCase StartDate  EndDate    NumberOpenCases NumberOpenCases_2y
#>    <chr>         <dbl> <date>     <date>               <int>              <int>
#>  1 111               0 2017-03-04 2017-12-12               0                  0
#>  2 334               1 2015-11-14 2022-01-25               0                  0
#>  3 334               1 2018-04-26 2020-03-01               1                  0
#>  4 334               0 2020-01-24 2021-02-24               2                  1
#>  5 334               1 2020-01-25 2020-01-30               2                  1
#>  6 334               0 2020-02-29 2022-02-02               2                  1
#>  7 888               1 2015-08-09 2019-10-20               0                  0
#>  8 888               0 2015-08-09 2019-10-30               0                  0
#>  9 888               1 2018-04-10 2018-10-10               1                  0
#> 10 888               0 2019-09-20 2021-10-10               1                  0
#> 11 888               0 2020-06-30 2020-07-20               0                  0
#> 12 888               1 2020-11-01 2022-11-20               0                  0
#> 13 888               0 2021-08-13 2021-11-12               1                  1
#> 14 888               1 2022-11-11 2023-01-01               1                  0
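The 730-day test treats two years as a fixed number of days, which can be off by a day when a leap year falls inside the window. If a calendar-based cut-off is preferred, a minimal sketch could look like the following. It assumes the lubridate package is available (it is not part of the answer above) and that "within the last two years" excludes a case starting exactly two calendar years earlier, mirroring the strict `< 730` comparison. The name `res` is only illustrative.

```r
library(dplyr)
library(purrr)
library(lubridate)  # assumption: used here only for calendar arithmetic

df <- data.frame(
  Person = c('111','334','334','334','334','334','888','888','888','888','888','888','888','888'),
  RelevantCase = c(0,1,1,0,1,0,1,0,1,0,0,1,0,1),
  StartDate = c('2017-03-04','2015-11-14','2018-04-26','2020-01-24','2020-01-25','2020-02-29','2015-08-09',
                '2015-08-09','2018-04-10','2019-09-20','2020-06-30','2020-11-01','2021-08-13','2022-11-11'),
  EndDate = c('2017-12-12','2022-01-25','2020-03-01','2021-02-24','2020-01-30','2022-02-02','2019-10-20',
              '2019-10-30','2018-10-10','2021-10-10','2020-07-20','2022-11-20','2021-11-12','2023-01-01')
)

res <- df %>%
  mutate(StartDate = as.Date(StartDate),
         EndDate = as.Date(EndDate)) %>%
  arrange(Person, StartDate, EndDate) %>%
  group_by(Person) %>%
  mutate(NumberOpenCases_2y = map_int(seq_along(StartDate), function(i) {
    d <- StartDate[i]
    # Two calendar years before the current StartDate; %m-% rolls back to the
    # last valid day of the month (e.g. 2020-02-29 becomes 2018-02-28)
    cutoff <- d %m-% years(2)
    sum(RelevantCase == 1 &
          StartDate < d &      # started before the current case
          EndDate >= d &       # still open on the current StartDate
          StartDate > cutoff)  # started less than two calendar years earlier
  })) %>%
  ungroup()
```

On the sample data this yields the same NumberOpenCases_2y column as the 730-day version; the two approaches would only diverge for cases that start right at the two-year boundary.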