英文:
How to extract the overlapping hospitalization periods in each hospital?
问题
df中有五个变量:studyid、hospitalname、Date1、Date2和group。对于df中hospitalname中的每个名称,我想提取所有日期从Date1到Date2与group为0和1的组之间重叠的组合。
library(zoo)
# 创建示例数据
df <- data.frame(
studyid = 1:5,
Date1 = as.yearmon(c("2020-01", "2020-03", "2020-10", "2020-07", "2020-06")),
Date2 = as.yearmon(c("2020-02", "2020-10", "2021-02", "2020-08", "2020-10")),
hospitalname = c("Hospital A", "Hospital A", "Hospital A", "Hospital B", "Hospital B"),
group = c(0, 1, 0, 1, 0)
)
分析后,我想要的结果如下:
result <- data.frame(
studyid.0 = c("3","5"),
Date1_0 = as.yearmon(c("2020-10", "2020-06")),
Date2_0 = as.yearmon(c("2021-02", "2020-10")),
studyid.1 = c("2","4"),
Date1_1 = as.yearmon(c("2020-03", "2020-07")),
Date2_1 = as.yearmon(c("2020-10", "2020-08")),
hospitalname = c("Hospital A", "Hospital B")
)
非常感谢您的支持。
英文:
There are five variables in df: studyid, hospitalname, Date1, Date2, and group. For each name in hospitalname in df, I would like to extract all combinations where the period from Date1 to Date2 overlaps between groups with group of 0 and 1.
library(zoo)
# create example data
df <- data.frame(
studyid = 1:5,
Date1 = as.yearmon(c("2020-01", "2020-03", "2020-10", "2020-07", "2020-06")),
Date2 = as.yearmon(c("2020-02", "2020-10", "2021-02", "2020-08", "2020-10")),
hospitalname = c("Hospital A", "Hospital A", "Hospital A", "Hospital B", "Hospital B"),
group = c(0, 1, 0, 1, 0)
)
After the analysis, I would like to have the result like this
result <- data.frame(
studyid.0 = c("3","5"),
Date1_0 = as.yearmon(c("2020-10", "2020-06")),
Date2_0 = as.yearmon(c("2021-02", "2020-10")),
studyid.1 = c("2","4"),
Date1_1 = as.yearmon(c("2020-03", "2020-07")),
Date2_1 = as.yearmon(c("2020-10", "2020-08")),
hospitalname = c("Hospital A", "Hospital B")
)
I truly appreciate your support.
答案1
得分: 1
library(dplyr)
library(zoo)
result <- df %>%
inner_join(df, by = "hospitalname") %>%
filter(group.x == 0, group.y == 1,
Date1.x <= Date2.y, Date1.y <= Date2.x) %>%
select(studyid.0 = studyid.x, Date1_0 = Date1.x, Date2_0 = Date2.x,
studyid.1 = studyid.y, Date1_1 = Date1.y, Date2_1 = Date2.y,
hospitalname) %>%
distinct()
result
studyid.0 Date1_0 Date2_0 studyid.1 Date1_1 Date2_1 hospitalname
1 3 Oct 2020 Feb 2021 2 Mar 2020 Oct 2020 Hospital A
2 5 Jun 2020 Oct 2020 4 Jul 2020 Aug 2020 Hospital B
或者,如@onyambu所提到的,您可以在inner_join()函数中添加 'suffix = c('_0', '_1')',而不是使用select():
result <- df %>%
inner_join(df, by = "hospitalname", suffix = c("_0", "_1")) %>%
filter(group_0 == 0, group_1 == 1,
Date1_0 <= Date2_1, Date1_1 <= Date2_0) %>%
distinct()
这个 'suffix' 意味着所有具有相同名称的列将在第一个数据框中添加 '_0' 后缀,而在第二个数据框中添加 '_1' 后缀。
英文:
library(dplyr)
library(zoo)
result <- df %>%
inner_join(df, by = "hospitalname") %>%
filter(group.x == 0, group.y == 1,
Date1.x <= Date2.y, Date1.y <= Date2.x) %>%
select(studyid.0 = studyid.x, Date1_0 = Date1.x, Date2_0 = Date2.x,
studyid.1 = studyid.y, Date1_1 = Date1.y, Date2_1 = Date2.y,
hospitalname) %>%
distinct()
result
studyid.0 Date1_0 Date2_0 studyid.1 Date1_1 Date2_1 hospitalname
1 3 Oct 2020 Feb 2021 2 Mar 2020 Oct 2020 Hospital A
2 5 Jun 2020 Oct 2020 4 Jul 2020 Aug 2020 Hospital B
The inner_join() function creates combinations of rows with the same hospital name. Next, using filter() function you can choose rows where the date periods overlap between groups with group of 0 and 1. Then, using select() function renames the columns to match your desired output. Finally, the distinct() function removes duplicated rows created by the join function.
Alternatively, as was mentioned by @onyambu, you can add ' suffix = c('_0', '_1') ' within the inner_join() function instead of using select():
result <- df %>%
inner_join(df, by = "hospitalname", suffix = c("_0", "_1")) %>%
filter(group_0 == 0, group_1 == 1,
Date1_0 <= Date2_1, Date1_1 <= Date2_0) %>%
distinct()
This 'suffix' means that all columns with the same name will be suffixed with '_0' for the first data frame and '_1' for the second data frame.
答案2
得分: 0
df %>%
mutate(id = cumsum(group)) %>%
pivot_wider(id_cols = c(id, hospitalname), names_from = group,
values_from = studyid:Date2, names_vary = 'slowest') %>%
drop_na()
一个数据框:2 × 8
id hospitalname studyid_0 Date1_0 Date2_0 studyid_1 Date1_1 Date2_1
1 1 Hospital A 3 Oct 2020 Feb 2021 2 Mar 2020 Oct 2020
2 2 Hospital B 5 Jun 2020 Oct 2020 4 Jul 2020 Aug 2020
<details>
<summary>英文:</summary>
df %>%
mutate(id=cumsum(group)) %>%
pivot_wider(id_cols = c(id, hospitalname), names_from = group,
values_from = studyid:Date2, names_vary = 'slowest') %>%
drop_na()
# A tibble: 2 × 8
id hospitalname studyid_0 Date1_0 Date2_0 studyid_1 Date1_1 Date2_1
<dbl> <chr> <int> <yearmon> <yearmon> <int> <yearmon> <yearmon>
1 1 Hospital A 3 Oct 2020 Feb 2021 2 Mar 2020 Oct 2020
2 2 Hospital B 5 Jun 2020 Oct 2020 4 Jul 2020 Aug 2020
</details>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论