2023-08-09 17:51:23

How to merge several rows with event data from multiple columns based on a unique character column?

Question

Let's say I have d:

> d
  pnr age.hl age.hl.time kon.hl kon.hl.time sen.hl sen.hl.time mix.hl mix.hl.time
1   A      0  2018-12-31      1  2011-02-01      0  2018-12-31      0  2018-12-31
2   A      1  2013-10-31      0  2018-12-31      0  2018-12-31      1  2013-10-31
3   A      0  2018-12-31      0  2018-12-31      1  2016-09-30      0  2018-12-31
4   B      0  2018-12-31      0  2018-12-31      1  2004-04-30      0  2018-12-31
5   C      0  2018-12-31      0  2018-12-31      0  2018-12-31      1  2006-01-17
6   C      1  2008-07-31      0  2018-12-31      1  2009-01-31      0  2018-12-31

d$pnr is a unique patient identifier. age.hl, kon.hl, sen.hl and mix.hl denote different diseases, each with a corresponding time variable. All patients are censored at 2018-12-31 if no event occurred.

If a 1 occurs in a column for a patient, it cannot occur again in that column for the same patient.

I need to merge the rows so that each d$pnr occurs only once, i.e., the event data from each column should be merged into a single row per patient.

I am looking for a solution in dplyr.

Expected output:

  pnr age.hl age.hl.time kon.hl kon.hl.time sen.hl sen.hl.time mix.hl mix.hl.time
1   A      1  2013-10-31      1  2011-02-01      1  2016-09-30      1  2013-10-31
2   B      0  2018-12-31      0  2018-12-31      1  2004-04-30      0  2018-12-31
3   C      1  2008-07-31      0  2018-12-31      1  2009-01-31      1  2006-01-17

Data

d <- data.frame(
  pnr = c("A", "A", "A", "B", "C", "C"),
  age.hl = c(0, 1, 0, 0, 0, 1),
  age.hl.time = c(as.Date("2018-12-31"), as.Date("2013-10-31"), as.Date("2018-12-31"),
                  as.Date("2018-12-31"), as.Date("2018-12-31"), as.Date("2008-07-31")),
  kon.hl = c(1, 0, 0, 0, 0, 0),
  kon.hl.time = c(as.Date("2011-02-01"), as.Date("2018-12-31"), as.Date("2018-12-31"),
                  as.Date("2018-12-31"), as.Date("2018-12-31"), as.Date("2018-12-31")),
  sen.hl = c(0, 0, 1, 1, 0, 1),
  sen.hl.time = c(as.Date("2018-12-31"), as.Date("2018-12-31"), as.Date("2016-09-30"),
                  as.Date("2004-04-30"), as.Date("2018-12-31"), as.Date("2009-01-31")),
  mix.hl = c(0, 1, 0, 0, 1, 0),
  mix.hl.time = c(as.Date("2018-12-31"), as.Date("2013-10-31"), as.Date("2018-12-31"),
                  as.Date("2018-12-31"), as.Date("2006-01-17"), as.Date("2018-12-31"))
)
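
The stated constraint (a 1 occurs at most once per patient and column) can be checked before merging. The following is only a minimal sketch, assuming dplyr 1.1.0 or later for the .by argument. The names censor and events_per_patient are illustrative, and d is rebuilt in compact form only so that the snippet runs on its own:

```r
library(dplyr)

# Same data as above, rebuilt compactly so the snippet is self-contained.
censor <- "2018-12-31"
d <- data.frame(
  pnr = c("A", "A", "A", "B", "C", "C"),
  age.hl = c(0, 1, 0, 0, 0, 1),
  age.hl.time = as.Date(c(censor, "2013-10-31", censor, censor, censor, "2008-07-31")),
  kon.hl = c(1, 0, 0, 0, 0, 0),
  kon.hl.time = as.Date(c("2011-02-01", censor, censor, censor, censor, censor)),
  sen.hl = c(0, 0, 1, 1, 0, 1),
  sen.hl.time = as.Date(c(censor, censor, "2016-09-30", "2004-04-30", censor, "2009-01-31")),
  mix.hl = c(0, 1, 0, 0, 1, 0),
  mix.hl.time = as.Date(c(censor, "2013-10-31", censor, censor, "2006-01-17", censor))
)

# Count the events per patient in every indicator column (names ending in ".hl").
events_per_patient <- d %>%
  summarise(across(ends_with(".hl"), sum), .by = pnr)

# TRUE if no patient has more than one event in any column.
all(select(events_per_patient, -pnr) <= 1)
```

If the check returns FALSE, a merge that keeps a single date per column would silently drop events, so it may be worth running first.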

Answer 1

Score: 2

You can use which.max():

library(dplyr)  # the .by argument requires dplyr >= 1.1.0
d %>%
  summarise(
    # For each *.time column, take the date from the row where the matching event column is largest
    across(ends_with("time"), ~ .x[which.max(get(sub("\\.time$", "", cur_column())))]),
    # Collapse each event indicator to 1 if the event occurred in any row
    across(ends_with("hl"), max),
    .by = pnr) %>%
  select(names(d))  # restore the original column order
#   pnr age.hl age.hl.time kon.hl kon.hl.time sen.hl sen.hl.time mix.hl mix.hl.time
# 1   A      1  2013-10-31      1  2011-02-01      1  2016-09-30      1  2013-10-31
# 2   B      0  2018-12-31      0  2018-12-31      1  2004-04-30      0  2018-12-31
# 3   C      1  2008-07-31      0  2018-12-31      1  2009-01-31      1  2006-01-17

Note: The first across() reads the original event columns, which the second across() then overwrites with their per-group maxima, so the two across() calls must not be swapped.
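
An order-independent variant is possible under one additional assumption: an event date is never later than the censoring date 2018-12-31. This holds for the sample data but is not guaranteed in general. Under that assumption, the earliest date in each time column is the event date if an event occurred and the censoring date otherwise. A minimal sketch (censor and merged are illustrative names, and d is rebuilt compactly so the snippet runs on its own):

```r
library(dplyr)

# Same data as in the question, rebuilt compactly so the snippet is self-contained.
censor <- "2018-12-31"
d <- data.frame(
  pnr = c("A", "A", "A", "B", "C", "C"),
  age.hl = c(0, 1, 0, 0, 0, 1),
  age.hl.time = as.Date(c(censor, "2013-10-31", censor, censor, censor, "2008-07-31")),
  kon.hl = c(1, 0, 0, 0, 0, 0),
  kon.hl.time = as.Date(c("2011-02-01", censor, censor, censor, censor, censor)),
  sen.hl = c(0, 0, 1, 1, 0, 1),
  sen.hl.time = as.Date(c(censor, censor, "2016-09-30", "2004-04-30", censor, "2009-01-31")),
  mix.hl = c(0, 1, 0, 0, 1, 0),
  mix.hl.time = as.Date(c(censor, "2013-10-31", censor, censor, "2006-01-17", censor))
)

merged <- d %>%
  summarise(
    across(ends_with(".hl"), max),    # 1 if the event occurred in any row
    across(ends_with(".time"), min),  # earliest date: event date, else censoring date
    .by = pnr
  ) %>%
  select(all_of(names(d)))            # restore the original column order

merged
```

Because each across() here reads only its own columns, the two calls can appear in either order. The trade-off is the reliance on the date assumption, which the which.max() approach does not need.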
