如何在基于唯一字符列时,将多个列的事件数据合并为几行?

huangapple go评论90阅读模式
英文:

How to merge several rows with event data from multiple columns when based on unique character column?

问题

假设我有一个名为d的数据框,其中包含以下内容:

> d
  pnr age.hl age.hl.time kon.hl kon.hl.time sen.hl sen.hl.time mix.hl mix.hl.time
1   A      0  2018-12-31      1  2011-02-01      0  2018-12-31      0  2018-12-31
2   A      1  2013-10-31      0  2018-12-31      0  2018-12-31      1  2013-10-31
3   A      0  2018-12-31      0  2018-12-31      1  2016-09-30      0  2018-12-31
4   B      0  2018-12-31      0  2018-12-31      1  2004-04-30      0  2018-12-31
5   C      0  2018-12-31      0  2018-12-31      0  2018-12-31      1  2006-01-17
6   C      1  2008-07-31      0  2018-12-31      1  2009-01-31      0  2018-12-31

其中,d$pnr是唯一的患者标识符,age.hlkon.hlsen.hlmix.hl表示不同的疾病及其对应的时间变量。如果在任何一列中出现1,则不能再在同一列中再次出现1。

我需要合并行,使得d$pnr只出现一次。也就是说,每列的事件数据/信息应合并到同一行中。

我在dplyr中寻找解决方案。

期望的输出

  pnr age.hl age.hl.time kon.hl kon.hl.time sen.hl sen.hl.time mix.hl mix.hl.time
1   A      1  2013-10-31      1  2011-02-01      1  2016-09-30      1  2013-10-31
2   B      0  2018-12-31      0  2018-12-31      1  2004-04-30      0  2018-12-31
3   C      1  2008-07-31      0  2018-12-31      1  2009-01-31      1  2006-01-17

数据

d <- data.frame(
  pnr = c("A", "A", "A", "B", "C", "C"),
  age.hl = c(0, 1, 0, 0, 0, 1),
  age.hl.time = c(as.Date("2018-12-31"), as.Date("2013-10-31"), as.Date("2018-12-31"),
                  as.Date("2018-12-31"), as.Date("2018-12-31"), as.Date("2008-07-31")),
  kon.hl = c(1, 0, 0, 0, 0, 0),
  kon.hl.time = c(as.Date("2011-02-01"), as.Date("2018-12-31"), as.Date("2018-12-31"),
                  as.Date("2018-12-31"), as.Date("2018-12-31"), as.Date("2018-12-31")),
  sen.hl = c(0, 0, 1, 1, 0, 1),
  sen.hl.time = c(as.Date("2018-12-31"), as.Date("2018-12-31"), as.Date("2016-09-30"),
                  as.Date("2004-04-30"), as.Date("2018-12-31"), as.Date("2009-01-31")),
  mix.hl = c(0, 1, 0, 0, 1, 0),
  mix.hl.time = c(as.Date("2018-12-31"), as.Date("2013-10-31"), as.Date("2018-12-31"),
                  as.Date("2018-12-31"), as.Date("2006-01-17"), as.Date("2018-12-31"))
)
英文:

Let's say I have d:

&gt; d
  pnr age.hl age.hl.time kon.hl kon.hl.time sen.hl sen.hl.time mix.hl mix.hl.time
1   A      0  2018-12-31      1  2011-02-01      0  2018-12-31      0  2018-12-31
2   A      1  2013-10-31      0  2018-12-31      0  2018-12-31      1  2013-10-31
3   A      0  2018-12-31      0  2018-12-31      1  2016-09-30      0  2018-12-31
4   B      0  2018-12-31      0  2018-12-31      1  2004-04-30      0  2018-12-31
5   C      0  2018-12-31      0  2018-12-31      0  2018-12-31      1  2006-01-17
6   C      1  2008-07-31      0  2018-12-31      1  2009-01-31      0  2018-12-31

d$pnr is a unique patient identifier. age.hl, kon.hl, sen.hl and mix.hl denote different diseases and their corresponding time variable. All patients are censored at 2018-12-31 if 0 event occured.

If 1 occur in any column, it cannot occur again in the same column.

I need to merge rows, so d$pnr only occur one time. I.e, event data/information from each column shall be merged to the same row.

I look for a solution in dplyr.

Expected output:

  pnr age.hl age.hl.time kon.hl kon.hl.time sen.hl sen.hl.time mix.hl mix.hl.time
1   A      1  2013-10-31      1  2011-02-01      1  2016-09-30      1  2013-10-31
2   B      0  2018-12-31      0  2018-12-31      1  2004-04-30      0  2018-12-31
3   C      1  2008-07-31      0  2018-12-31      1  2009-01-31      1  2006-01-17

Data

d &lt;- data.frame(
  pnr = c(&quot;A&quot;, &quot;A&quot;, &quot;A&quot;, &quot;B&quot;, &quot;C&quot;, &quot;C&quot;),
  age.hl = c(0, 1, 0, 0, 0, 1),
  age.hl.time = c(as.Date(&quot;2018-12-31&quot;), as.Date(&quot;2013-10-31&quot;), as.Date(&quot;2018-12-31&quot;),
                  as.Date(&quot;2018-12-31&quot;), as.Date(&quot;2018-12-31&quot;), as.Date(&quot;2008-07-31&quot;)),
  kon.hl = c(1, 0, 0, 0, 0, 0),
  kon.hl.time = c(as.Date(&quot;2011-02-01&quot;), as.Date(&quot;2018-12-31&quot;), as.Date(&quot;2018-12-31&quot;),
                  as.Date(&quot;2018-12-31&quot;), as.Date(&quot;2018-12-31&quot;), as.Date(&quot;2018-12-31&quot;)),
  sen.hl = c(0, 0, 1, 1, 0, 1),
  sen.hl.time = c(as.Date(&quot;2018-12-31&quot;), as.Date(&quot;2018-12-31&quot;), as.Date(&quot;2016-09-30&quot;),
                  as.Date(&quot;2004-04-30&quot;), as.Date(&quot;2018-12-31&quot;), as.Date(&quot;2009-01-31&quot;)),
  mix.hl = c(0, 1, 0, 0, 1, 0),
  mix.hl.time = c(as.Date(&quot;2018-12-31&quot;), as.Date(&quot;2013-10-31&quot;), as.Date(&quot;2018-12-31&quot;),
                  as.Date(&quot;2018-12-31&quot;), as.Date(&quot;2006-01-17&quot;), as.Date(&quot;2018-12-31&quot;))
)

答案1

得分: 2

你可以使用which.max()函数:

library(dplyr)

d %>%
  summarise(across(ends_with("time"), ~ .x[which.max(get(sub(".time", "", cur_column())))]),
            across(ends_with("hl"), max),
            .by = pnr) %>%
  select(names(d))

#   pnr age.hl age.hl.time kon.hl kon.hl.time sen.hl sen.hl.time mix.hl mix.hl.time
# 1   A      1  2013-10-31      1  2011-02-01      1  2016-09-30      1  2013-10-31
# 2   B      0  2018-12-31      0  2018-12-31      1  2004-04-30      0  2018-12-31
# 3   C      1  2008-07-31      0  2018-12-31      1  2009-01-31      1  2006-01-17

注意:第一个across()函数中的值依赖于第二个across()函数中的值,所以这两个across()函数不能互换位置。

英文:

You can use which.max():

library(dplyr)

d %&gt;%
  summarise(across(ends_with(&quot;time&quot;), ~ .x[which.max(get(sub(&quot;.time&quot;, &quot;&quot;, cur_column())))]),
            across(ends_with(&quot;hl&quot;), max),
            .by = pnr) %&gt;%
  select(names(d))

#   pnr age.hl age.hl.time kon.hl kon.hl.time sen.hl sen.hl.time mix.hl mix.hl.time
# 1   A      1  2013-10-31      1  2011-02-01      1  2016-09-30      1  2013-10-31
# 2   B      0  2018-12-31      0  2018-12-31      1  2004-04-30      0  2018-12-31
# 3   C      1  2008-07-31      0  2018-12-31      1  2009-01-31      1  2006-01-17

Note: The function in the first across() involves the values in the second across(), so the two across() are not exchangeable.

huangapple
  • 本文由 发表于 2023年8月9日 17:51:23
  • 转载请务必保留本文链接:https://go.coder-hub.com/76866546.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定