2023-02-19 04:45:18

Is there a way to collapse rows based on multiple column conditions, grouped by ID in R?

Question

I have a df of episodic antibiotic data (long form). For each patient, I’m looking to combine specific antibiotics into “combo” therapies based on 1) a specific list of antibiotics that would be of interest, and 2) ONLY if those specific antibiotics had the same start dates. The specific therapies of interest are:

Ampicillin and ceftriaxone
Ampicillin and gentamicin
Ampicillin and streptomycin

I don’t need any other combinations (e.g. if ampicillin and daptomycin start on the same day, I don’t want that combined).

I have a patient ID by which the antibiotic episodes can be grouped. My start_date variable has already been coerced into date class (YYYY-MM-DD). All other variables are character.

Here’s a sample set of data. In my actual dataframe, I have over 1,000 rows.

```r
ID <- c("C1", "C1", "C1", "C2", "C3", "C3", "C3", "C4", "C4", "C5", "C5", "C5", "C6", "C6")
abx <- c("ampicillin", "ceftriaxone", "vancomycin", "linezolid", "ampicillin", "gentamicin", "vancomycin", "piptazo", "daptomycin", "ampicillin", "streptomycin", "vancomycin", "ampicillin", "daptomycin")
date <- c(as.Date("2020-02-01"), as.Date("2020-02-01"), as.Date("2020-01-28"), as.Date("2020-03-01"), as.Date("2020-02-17"), as.Date("2020-02-17"), as.Date("2020-03-01"), as.Date("2020-01-11"), as.Date("2020-01-05"), as.Date("2020-01-05"), as.Date("2020-01-05"), as.Date("2020-01-01"), as.Date("2020-01-14"), as.Date("2020-01-14"))
df_abx <- data.frame(ID, abx, date)
```

This is the solution I’m hoping for:

ID	abx	start_date
C1	ampicillin, ceftriaxone	2020-02-01
C1	vancomycin	2020-01-28
C2	linezolid	2020-03-01
C3	ampicillin, gentamicin	2020-02-17
C3	vancomycin	2020-03-01
C4	piptazo	2020-01-11
C4	daptomycin	2020-01-05
C5	ampicillin, streptomycin	2020-01-05
C5	vancomycin	2020-01-01
C6	ampicillin	2020-01-14
C6	daptomycin	2020-01-14

I suspect the solution includes "group_by" for the patient ID, and I've been toying around with summarize and case_when but I haven't gotten to where I want to be.

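Answer 1

Score: 2

**Update: the correct version.** We can do it this way:

```r
library(dplyr)
library(tidyr)
library(stringr)  # str_detect()

# Assumption: `pattern` and `df1` were not defined in the post;
# the two definitions below are plausible reconstructions.
pattern <- "ampicillin|ceftriaxone|gentamicin|streptomycin"

# Same-day groups with at least two drugs of interest, one row per group
df1 <- df_abx %>%
  group_by(ID, date) %>%
  filter(sum(str_detect(abx, pattern)) >= 2) %>%
  summarise(abx = toString(abx), .groups = "drop")

df_abx %>%
  group_by(ID, date) %>%
  mutate(abx = toString(abx)) %>%
  anti_join(df1, by = "abx") %>%  # drop the groups that became combos
  separate_rows(abx) %>%          # split the rest back into single drugs
  distinct() %>%
  bind_rows(df1) %>%              # add the combo rows back
  arrange(ID, abx)
```

```
   ID    abx                      date
   <chr> <chr>                    <date>
 1 C1    ampicillin, ceftriaxone  2020-02-01
 2 C1    vancomycin               2020-01-28
 3 C2    linezolid                2020-03-01
 4 C3    ampicillin, gentamicin   2020-02-17
 5 C3    vancomycin               2020-03-01
 6 C4    daptomycin               2020-01-05
 7 C4    piptazo                  2020-01-11
 8 C5    ampicillin, streptomycin 2020-01-05
 9 C5    vancomycin               2020-01-01
10 C6    ampicillin               2020-01-14
11 C6    daptomycin               2020-01-14
```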

Answer 2

Score: 1

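We can group by `ID` and `date`, use `case_match()` to build a helper grouping column that puts the selected drugs into one shared group, add that column to the grouping, and then collapse `abx` within each group by pasting the values together with `str_c()`.

```r
library(dplyr)   # case_match() needs dplyr >= 1.1.0
library(stringr)

df_abx %>%
  group_by(ID, date) %>%
  # Drugs of interest share the label 'g1'; every other drug keeps its own name
  mutate(abx_grp = coalesce(case_match(abx,
    c('ampicillin', 'ceftriaxone', 'gentamicin', 'streptomycin') ~ 'g1'), abx)) %>%
  group_by(abx_grp, .add = TRUE) %>%
  # One row per ID/date/abx_grp, with the drug names pasted together
  summarise(abx = str_c(abx, collapse = ", "), .groups = 'drop') %>%
  select(-abx_grp)
```
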
Output:

```
# A tibble: 11 × 3
   ID    date       abx
   <chr> <date>     <chr>
 1 C1    2020-01-28 vancomycin
 2 C1    2020-02-01 ampicillin, ceftriaxone
 3 C2    2020-03-01 linezolid
 4 C3    2020-02-17 ampicillin, gentamicin
 5 C3    2020-03-01 vancomycin
 6 C4    2020-01-05 daptomycin
 7 C4    2020-01-11 piptazo
 8 C5    2020-01-01 vancomycin
 9 C5    2020-01-05 ampicillin, streptomycin
10 C6    2020-01-14 daptomycin
11 C6    2020-01-14 ampicillin
```
