如何提取每家医院中重叠的住院期间?

huangapple go评论55阅读模式
英文:

How to extract the overlapping hospitalization periods in each hospital?

问题

df中有五个变量:studyid、hospitalname、Date1、Date2和group。对于df中hospitalname中的每个名称,我想提取所有日期从Date1到Date2与group为0和1的组之间重叠的组合。

library(zoo)

# 创建示例数据
df <- data.frame(
  studyid = 1:5,
  Date1 = as.yearmon(c("2020-01", "2020-03", "2020-10", "2020-07", "2020-06")),
  Date2 = as.yearmon(c("2020-02", "2020-10", "2021-02", "2020-08", "2020-10")),
  hospitalname = c("Hospital A", "Hospital A", "Hospital A", "Hospital B", "Hospital B"),
  group = c(0, 1, 0, 1, 0)
)

分析后,我想要的结果如下:

result <- data.frame(
  studyid.0 = c("3","5"),
  Date1_0 = as.yearmon(c("2020-10", "2020-06")),
  Date2_0 = as.yearmon(c("2021-02", "2020-10")),
  studyid.1 = c("2","4"),
  Date1_1 = as.yearmon(c("2020-03", "2020-07")),
  Date2_1 = as.yearmon(c("2020-10", "2020-08")),
  hospitalname = c("Hospital A", "Hospital B")
)

非常感谢您的支持。

英文:

There are five variables in df: studyid, hospitalname, Date1, Date2, and group. For each name in hospitalname in df, I would like to extract all combinations where the period from Date1 to Date2 overlaps between groups with group of 0 and 1.

library(zoo)

# create example data
df &lt;- data.frame(
  studyid = 1:5,
  Date1 = as.yearmon(c(&quot;2020-01&quot;, &quot;2020-03&quot;, &quot;2020-10&quot;, &quot;2020-07&quot;, &quot;2020-06&quot;)),
  Date2 = as.yearmon(c(&quot;2020-02&quot;, &quot;2020-10&quot;, &quot;2021-02&quot;, &quot;2020-08&quot;, &quot;2020-10&quot;)),
  hospitalname = c(&quot;Hospital A&quot;, &quot;Hospital A&quot;, &quot;Hospital A&quot;, &quot;Hospital B&quot;, &quot;Hospital B&quot;),
  group = c(0, 1, 0, 1, 0)
)

After the analysis, I would like to have the result like this

result &lt;- data.frame(
  studyid.0 = c(&quot;3&quot;,&quot;5&quot;),
  Date1_0 = as.yearmon(c(&quot;2020-10&quot;, &quot;2020-06&quot;)),
  Date2_0 = as.yearmon(c(&quot;2021-02&quot;, &quot;2020-10&quot;)),
  studyid.1 = c(&quot;2&quot;,&quot;4&quot;),
  Date1_1 = as.yearmon(c(&quot;2020-03&quot;, &quot;2020-07&quot;)),
  Date2_1 = as.yearmon(c(&quot;2020-10&quot;, &quot;2020-08&quot;)),
  hospitalname = c(&quot;Hospital A&quot;, &quot;Hospital B&quot;)
)

I truly appreciate your support.

答案1

得分: 1

library(dplyr)
library(zoo)

result <- df %>%
      inner_join(df, by = "hospitalname") %>%
      filter(group.x == 0, group.y == 1,
             Date1.x <= Date2.y, Date1.y <= Date2.x) %>%
      select(studyid.0 = studyid.x, Date1_0 = Date1.x, Date2_0 = Date2.x,
             studyid.1 = studyid.y, Date1_1 = Date1.y, Date2_1 = Date2.y,
             hospitalname) %>%
      distinct()

result
  studyid.0  Date1_0  Date2_0 studyid.1  Date1_1  Date2_1 hospitalname
1         3 Oct 2020 Feb 2021         2 Mar 2020 Oct 2020   Hospital A
2         5 Jun 2020 Oct 2020         4 Jul 2020 Aug 2020   Hospital B

或者,如@onyambu所提到的,您可以在inner_join()函数中添加 'suffix = c('_0', '_1')',而不是使用select():

result <- df %>%
  inner_join(df, by = "hospitalname", suffix = c("_0", "_1")) %>%
  filter(group_0 == 0, group_1 == 1,
         Date1_0 <= Date2_1, Date1_1 <= Date2_0) %>%
  distinct()

这个 'suffix' 意味着所有具有相同名称的列将在第一个数据框中添加 '_0' 后缀,而在第二个数据框中添加 '_1' 后缀。

英文:
library(dplyr)
library(zoo)

result &lt;- df %&gt;%
      inner_join(df, by = &quot;hospitalname&quot;) %&gt;%
      filter(group.x == 0, group.y == 1,
             Date1.x &lt;= Date2.y, Date1.y &lt;= Date2.x) %&gt;%
      select(studyid.0 = studyid.x, Date1_0 = Date1.x, Date2_0 = Date2.x,
             studyid.1 = studyid.y, Date1_1 = Date1.y, Date2_1 = Date2.y,
             hospitalname) %&gt;%
      distinct()


result
  studyid.0  Date1_0  Date2_0 studyid.1  Date1_1  Date2_1 hospitalname
1         3 Oct 2020 Feb 2021         2 Mar 2020 Oct 2020   Hospital A
2         5 Jun 2020 Oct 2020         4 Jul 2020 Aug 2020   Hospital B

The inner_join() function creates combinations of rows with the same hospital name. Next, using filter() function you can choose rows where the date periods overlap between groups with group of 0 and 1. Then, using select() function renames the columns to match your desired output. Finally, the distinct() function removes duplicated rows created by the join function.

Alternatively, as was mentioned by @onyambu, you can add ' suffix = c('_0', '_1') ' within the inner_join() function instead of using select():

result &lt;- df %&gt;%
  inner_join(df, by = &quot;hospitalname&quot;, suffix = c(&quot;_0&quot;, &quot;_1&quot;)) %&gt;%
  filter(group_0 == 0, group_1 == 1,
         Date1_0 &lt;= Date2_1, Date1_1 &lt;= Date2_0) %&gt;%
  distinct()

This 'suffix' means that all columns with the same name will be suffixed with '_0' for the first data frame and '_1' for the second data frame.

答案2

得分: 0

df %>%
  mutate(id = cumsum(group)) %>%
  pivot_wider(id_cols = c(id, hospitalname), names_from = group,
               values_from = studyid:Date2, names_vary = 'slowest') %>%
  drop_na()

一个数据框:2 × 8

 id       hospitalname  studyid_0  Date1_0   Date2_0   studyid_1  Date1_1   Date2_1  


1 1 Hospital A 3 Oct 2020 Feb 2021 2 Mar 2020 Oct 2020
2 2 Hospital B 5 Jun 2020 Oct 2020 4 Jul 2020 Aug 2020


<details>
<summary>英文:</summary>

       df %&gt;%
         mutate(id=cumsum(group)) %&gt;%
         pivot_wider(id_cols = c(id, hospitalname), names_from = group, 
                      values_from = studyid:Date2, names_vary = &#39;slowest&#39;) %&gt;%
         drop_na()

    # A tibble: 2 &#215; 8
         id hospitalname studyid_0 Date1_0   Date2_0   studyid_1 Date1_1   Date2_1  
      &lt;dbl&gt; &lt;chr&gt;            &lt;int&gt; &lt;yearmon&gt; &lt;yearmon&gt;     &lt;int&gt; &lt;yearmon&gt; &lt;yearmon&gt;
    1     1 Hospital A           3 Oct 2020  Feb 2021          2 Mar 2020  Oct 2020 
    2     2 Hospital B           5 Jun 2020  Oct 2020          4 Jul 2020  Aug 2020 

</details>



huangapple
  • 本文由 发表于 2023年2月16日 06:09:10
  • 转载请务必保留本文链接:https://go.coder-hub.com/75465889.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定