累积总和基于R中的Subject ID

huangapple go评论64阅读模式
英文:

Cumulative sum based on Subject ID in R

问题

假设我们有一个名为 df 的数据框,它看起来像这样:

subjectid event football_year_baseline football_total
1 baseline 3 6
1 followup NA ??
2 baseline 0 0
2 followup NA ??
3 baseline 2 4

我试图填写 football_total 列,为了这个示例,让我们假设在基线行中的公式是 football_year_baseline * 2。

对于随访行,结果需要基于累积,意味着公式是基线中的 football_total + 2。应该使用 subjectidevent 来确定要将2添加到哪个基线值。

请注意:并非所有主题都有随访行。

因此,在第2行中的 football_total 将是 8 -> 6 + 2。

英文:

Assume we have dataframe df, it looks like this:

subjectid event football_year_baseline football_total
1 baseline 3 6
1 followup NA ??
2 baseline 0 0
2 followup NA ??
3 baseline 2 4

I'm trying to fill out the football_total column, for purposes of this example, let's assume that the formula is football_year_baseline * 2 in the baseline rows.

For the follow-up rows, the result needs to be cumulative based, meaning that the formula is football_total from baseline + 2. subjectid and event should be used to determine which baseline value to add 2 to.

Please note: not all subjects have a follow-up row.

So, football_row in row 2 would be 8 -> 6 + 2.

答案1

得分: 1

这应该适用于1行或多行的后续数据。它假定 "baseline" 已经是每个受试者中的第一行 - 如果不是,请先使用 arrange() 排列数据。

library(dplyr)
df |>
  mutate(
    fball_total = case_when(event == "baseline" ~ football_year_baseline * 2, TRUE ~ NA_integer_),
    fball_total = coalesce(fball_total, fball_total[1] + 2 * (row_number() - 1)),
    .by = subjectid
  )
#   subjectid    event football_year_baseline football_total fball_total
# 1         1 baseline                      3              6           6
# 2         1 followup                     NA             NA           8
# 3         2 baseline                      0              0           0
# 4         2 followup                     NA             NA           2
# 5         3 baseline                      2              4           4
英文:

This should work for 1 or more rows of follow-up. It assumes the "baseline" is already the first row in each subject - if not, arrange() the data first.

library(dplyr)
df |>
  mutate(
    fball_total = case_when(event == "baseline" ~ football_year_baseline * 2, TRUE ~ NA_integer_),
    fball_total = coalesce(fball_total, fball_total[1] + 2 * (row_number() - 1)),
    .by = subjectid
  )
#   subjectid    event football_year_baseline football_total fball_total
# 1         1 baseline                      3              6           6
# 2         1 followup                     NA             NA           8
# 3         2 baseline                      0              0           0
# 4         2 followup                     NA             NA           2
# 5         3 baseline                      2              4           4

答案2

得分: 1

鉴于你提到每个研究id最多只有一个后续操作,可以使用lagdplyr解决方案:

library(dplyr)

x %>% 
  mutate(football_total = case_when(
    event == "baseline" ~ football_total,
    event == "followup" ~ (lag(football_total) + 2)
  ), .by = subjectid)

输出:

  subjectid    event football_year_baseline football_total
1         1 baseline                      3              6
2         1 followup                     NA              8
3         2 baseline                      0              0
4         2 followup                     NA              2
5         3 baseline                      2              4

数据:

x <- read.table(text = "subjectid event football_year_baseline football_total
1 baseline 3 6
1 followup NA NA
2 baseline 0 0
2 followup NA NA
3 baseline 2 4", h = TRUE)

扩展示例

要在多列上执行此操作,假设采用相同的命名约定(即“xxx_total”),可以使用dplyr的across()contains()。下面我添加了两列,vball_baselinevball_total

x %>% 
  mutate(across(contains("total"), ~ case_when(
    event == "baseline" ~ .x,
    event == "followup" ~ (lag(.x) + 2)
  )), .by = subjectid)

扩展输出:

  subjectid    event football_year_baseline football_total vball_baseline vball_total
1         1 baseline                      3              6              1           2
2         1 followup                     NA              8             NA           4
3         2 baseline                      0              0              3           4
4         2 followup                     NA              2             NA           6
5         3 baseline                      2              4              5           6

扩展数据:

x <- read.table(text = "subjectid event football_year_baseline football_total vball_baseline vball_total
1 baseline 3 6 1 2
1 followup NA NA NA NA
2 baseline 0 0 3 4
2 followup NA NA NA NA
3 baseline 2 4 5 6", h = TRUE)
英文:

Since you mention there is at most one single follow up for each study id, an alternative dplyr solution using lag:

library(dplyr)

x %&gt;% 
  mutate(football_total = case_when(
    event == &quot;baseline&quot; ~ football_total,
    event == &quot;followup&quot; ~ (lag(football_total) + 2)
  ), .by = subjectid)

Output:

  subjectid    event football_year_baseline football_total
1         1 baseline                      3              6
2         1 followup                     NA              8
3         2 baseline                      0              0
4         2 followup                     NA              2
5         3 baseline                      2              4

Data

x &lt;- read.table(text = &quot;subjectid	event	football_year_baseline	football_total
1	baseline	3	6
1	followup	NA	NA
2	baseline	0	0
2	followup	NA	NA
3	baseline	2	4&quot;, h = TRUE)

Extended example

To do this across multiple columns, assuming the same nomenclature convention (i.e., "xxx_total"), you can use dplyr's across() and contains(). Below I added two columns, vball_baseline and vball_total:

x %&gt;% 
  mutate(across(contains(&quot;total&quot;), ~ case_when(
    event == &quot;baseline&quot; ~ .x,
    event == &quot;followup&quot; ~ (lag(.x) + 2)
  )), .by = subjectid)

Extended Output

  subjectid    event football_year_baseline football_total vball_baseline vball_total
1         1 baseline                      3              6              1           2
2         1 followup                     NA              8             NA           4
3         2 baseline                      0              0              3           4
4         2 followup                     NA              2             NA           6
5         3 baseline                      2              4              5           6

Extended Data:

x &lt;- read.table(text = &quot;subjectid   event   football_year_baseline  football_total vball_baseline vball_total
1   baseline    3   6 1 2
1   followup    NA  NA NA NA
2   baseline    0   0 3 4
2   followup    NA  NA NA NA
3   baseline    2   4 5 6&quot;, h = TRUE)

huangapple
  • 本文由 发表于 2023年5月31日 23:55:52
  • 转载请务必保留本文链接:https://go.coder-hub.com/76375343.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定