累积总和基于R中的Subject ID

huangapple go评论88阅读模式
英文:

Cumulative sum based on Subject ID in R

问题

假设我们有一个名为 df 的数据框,它看起来像这样:

subjectid event football_year_baseline football_total
1 baseline 3 6
1 followup NA ??
2 baseline 0 0
2 followup NA ??
3 baseline 2 4

我试图填写 football_total 列,为了这个示例,让我们假设在基线行中的公式是 football_year_baseline * 2。

对于随访行,结果需要基于累积,意味着公式是基线中的 football_total + 2。应该使用 subjectidevent 来确定要将2添加到哪个基线值。

请注意:并非所有主题都有随访行。

因此,在第2行中的 football_total 将是 8 -> 6 + 2。

英文:

Assume we have dataframe df, it looks like this:

subjectid event football_year_baseline football_total
1 baseline 3 6
1 followup NA ??
2 baseline 0 0
2 followup NA ??
3 baseline 2 4

I'm trying to fill out the football_total column, for purposes of this example, let's assume that the formula is football_year_baseline * 2 in the baseline rows.

For the follow-up rows, the result needs to be cumulative based, meaning that the formula is football_total from baseline + 2. subjectid and event should be used to determine which baseline value to add 2 to.

Please note: not all subjects have a follow-up row.

So, football_row in row 2 would be 8 -> 6 + 2.

答案1

得分: 1

这应该适用于1行或多行的后续数据。它假定 "baseline" 已经是每个受试者中的第一行 - 如果不是,请先使用 arrange() 排列数据。

  1. library(dplyr)
  2. df |>
  3. mutate(
  4. fball_total = case_when(event == "baseline" ~ football_year_baseline * 2, TRUE ~ NA_integer_),
  5. fball_total = coalesce(fball_total, fball_total[1] + 2 * (row_number() - 1)),
  6. .by = subjectid
  7. )
  8. # subjectid event football_year_baseline football_total fball_total
  9. # 1 1 baseline 3 6 6
  10. # 2 1 followup NA NA 8
  11. # 3 2 baseline 0 0 0
  12. # 4 2 followup NA NA 2
  13. # 5 3 baseline 2 4 4
英文:

This should work for 1 or more rows of follow-up. It assumes the "baseline" is already the first row in each subject - if not, arrange() the data first.

  1. library(dplyr)
  2. df |>
  3. mutate(
  4. fball_total = case_when(event == "baseline" ~ football_year_baseline * 2, TRUE ~ NA_integer_),
  5. fball_total = coalesce(fball_total, fball_total[1] + 2 * (row_number() - 1)),
  6. .by = subjectid
  7. )
  8. # subjectid event football_year_baseline football_total fball_total
  9. # 1 1 baseline 3 6 6
  10. # 2 1 followup NA NA 8
  11. # 3 2 baseline 0 0 0
  12. # 4 2 followup NA NA 2
  13. # 5 3 baseline 2 4 4

答案2

得分: 1

鉴于你提到每个研究id最多只有一个后续操作,可以使用lagdplyr解决方案:

  1. library(dplyr)
  2. x %>%
  3. mutate(football_total = case_when(
  4. event == "baseline" ~ football_total,
  5. event == "followup" ~ (lag(football_total) + 2)
  6. ), .by = subjectid)

输出:

  1. subjectid event football_year_baseline football_total
  2. 1 1 baseline 3 6
  3. 2 1 followup NA 8
  4. 3 2 baseline 0 0
  5. 4 2 followup NA 2
  6. 5 3 baseline 2 4

数据:

  1. x <- read.table(text = "subjectid event football_year_baseline football_total
  2. 1 baseline 3 6
  3. 1 followup NA NA
  4. 2 baseline 0 0
  5. 2 followup NA NA
  6. 3 baseline 2 4", h = TRUE)

扩展示例

要在多列上执行此操作,假设采用相同的命名约定(即“xxx_total”),可以使用dplyr的across()contains()。下面我添加了两列,vball_baselinevball_total

  1. x %>%
  2. mutate(across(contains("total"), ~ case_when(
  3. event == "baseline" ~ .x,
  4. event == "followup" ~ (lag(.x) + 2)
  5. )), .by = subjectid)

扩展输出:

  1. subjectid event football_year_baseline football_total vball_baseline vball_total
  2. 1 1 baseline 3 6 1 2
  3. 2 1 followup NA 8 NA 4
  4. 3 2 baseline 0 0 3 4
  5. 4 2 followup NA 2 NA 6
  6. 5 3 baseline 2 4 5 6

扩展数据:

  1. x <- read.table(text = "subjectid event football_year_baseline football_total vball_baseline vball_total
  2. 1 baseline 3 6 1 2
  3. 1 followup NA NA NA NA
  4. 2 baseline 0 0 3 4
  5. 2 followup NA NA NA NA
  6. 3 baseline 2 4 5 6", h = TRUE)
英文:

Since you mention there is at most one single follow up for each study id, an alternative dplyr solution using lag:

  1. library(dplyr)
  2. x %&gt;%
  3. mutate(football_total = case_when(
  4. event == &quot;baseline&quot; ~ football_total,
  5. event == &quot;followup&quot; ~ (lag(football_total) + 2)
  6. ), .by = subjectid)

Output:

  1. subjectid event football_year_baseline football_total
  2. 1 1 baseline 3 6
  3. 2 1 followup NA 8
  4. 3 2 baseline 0 0
  5. 4 2 followup NA 2
  6. 5 3 baseline 2 4

Data

  1. x &lt;- read.table(text = &quot;subjectid event football_year_baseline football_total
  2. 1 baseline 3 6
  3. 1 followup NA NA
  4. 2 baseline 0 0
  5. 2 followup NA NA
  6. 3 baseline 2 4&quot;, h = TRUE)

Extended example

To do this across multiple columns, assuming the same nomenclature convention (i.e., "xxx_total"), you can use dplyr's across() and contains(). Below I added two columns, vball_baseline and vball_total:

  1. x %&gt;%
  2. mutate(across(contains(&quot;total&quot;), ~ case_when(
  3. event == &quot;baseline&quot; ~ .x,
  4. event == &quot;followup&quot; ~ (lag(.x) + 2)
  5. )), .by = subjectid)

Extended Output

  1. subjectid event football_year_baseline football_total vball_baseline vball_total
  2. 1 1 baseline 3 6 1 2
  3. 2 1 followup NA 8 NA 4
  4. 3 2 baseline 0 0 3 4
  5. 4 2 followup NA 2 NA 6
  6. 5 3 baseline 2 4 5 6

Extended Data:

  1. x &lt;- read.table(text = &quot;subjectid event football_year_baseline football_total vball_baseline vball_total
  2. 1 baseline 3 6 1 2
  3. 1 followup NA NA NA NA
  4. 2 baseline 0 0 3 4
  5. 2 followup NA NA NA NA
  6. 3 baseline 2 4 5 6&quot;, h = TRUE)

huangapple
  • 本文由 发表于 2023年5月31日 23:55:52
  • 转载请务必保留本文链接:https://go.coder-hub.com/76375343.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定