2023-05-31 23:55:52

Cumulative sum based on Subject ID in R

Question

Assume we have a data frame df that looks like this:

subjectid  event     football_year_baseline  football_total
1          baseline  3                       6
1          followup  NA                      ??
2          baseline  0                       0
2          followup  NA                      ??
3          baseline  2                       4

I'm trying to fill in the football_total column. For the purposes of this example, assume the formula in the baseline rows is football_year_baseline * 2.

For the follow-up rows, the result needs to be cumulative: the formula is football_total from the same subject's baseline row + 2. subjectid and event should be used to determine which baseline value to add 2 to.

Please note: not all subjects have a follow-up row.

So football_total in row 2 would be 8 (6 + 2).
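
For anyone who wants to run the answers below, here is a minimal sketch of how the example table could be built in R. It assumes the ?? cells are stored as NA until they are filled in, and that football_total already holds the baseline values shown in the table.

```r
# Rebuild the example table from the question; the "??" cells are
# assumed to be NA until they are filled in.
df <- data.frame(
  subjectid = c(1, 1, 2, 2, 3),
  event = c("baseline", "followup", "baseline", "followup", "baseline"),
  football_year_baseline = c(3, NA, 0, NA, 2),
  football_total = c(6, NA, 0, NA, 4)
)

# Subjects without any follow-up row (only subject 3 in this table)
no_followup <- setdiff(df$subjectid, df$subjectid[df$event == "followup"])
print(no_followup)
```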

Answer 1

Score: 1

This should work for one or more follow-up rows. It assumes the baseline row is already the first row for each subject; if not, arrange() the data first.

library(dplyr)  # .by requires dplyr >= 1.1.0
df |>
  mutate(
    # Baseline rows: apply the example formula; every other row starts as NA
    fball_total = case_when(event == "baseline" ~ football_year_baseline * 2, TRUE ~ NA_integer_),
    # Follow-up rows: the group's first (baseline) value plus 2 per row after it
    fball_total = coalesce(fball_total, fball_total[1] + 2 * (row_number() - 1)),
    .by = subjectid
  )
#   subjectid    event football_year_baseline football_total fball_total
# 1         1 baseline                      3              6           6
# 2         1 followup                     NA             NA           8
# 3         2 baseline                      0              0           0
# 4         2 followup                     NA             NA           2
# 5         3 baseline                      2              4           4
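
If the baseline row is not guaranteed to come first, one possible way to do the arrange() step is sketched below. The data frame df_unsorted is hypothetical and only illustrates the ordering; sorting on event != "baseline" relies on FALSE sorting before TRUE, so baseline rows move to the top of each subject.

```r
library(dplyr)

# Hypothetical data: subject 1 has its follow-up listed before its baseline
df_unsorted <- data.frame(
  subjectid = c(1, 1, 2),
  event = c("followup", "baseline", "baseline"),
  football_year_baseline = c(NA, 3, 0)
)

result <- df_unsorted |>
  # FALSE sorts before TRUE, so baseline rows come first within each subject
  arrange(subjectid, event != "baseline") |>
  mutate(
    fball_total = case_when(event == "baseline" ~ football_year_baseline * 2, TRUE ~ NA_real_),
    fball_total = coalesce(fball_total, fball_total[1] + 2 * (row_number() - 1)),
    .by = subjectid
  )

print(result)
```

With more than two event types, converting event to a factor with explicit levels may be a clearer way to control the order.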

Answer 2

Score: 1

Since you mention there is at most one follow-up for each subject ID, here is an alternative dplyr solution using lag():

library(dplyr)
x %>% 
  mutate(football_total = case_when(
    event == "baseline" ~ football_total,
    event == "followup" ~ (lag(football_total) + 2)
  ), .by = subjectid)

Output:

  subjectid    event football_year_baseline football_total
1         1 baseline                      3              6
2         1 followup                     NA              8
3         2 baseline                      0              0
4         2 followup                     NA              2
5         3 baseline                      2              4

Data:

x <- read.table(text = "subjectid event football_year_baseline football_total
1 baseline 3 6
1 followup NA NA
2 baseline 0 0
2 followup NA NA
3 baseline 2 4", header = TRUE)
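
Because lag() only looks one row back, the approach above depends on the one-follow-up assumption. The sketch below uses a hypothetical subject with two follow-up rows to show what happens otherwise, along with one possible alternative that anchors on the group's first value. It assumes the baseline is the first row of each subject and that each follow-up adds another 2.

```r
library(dplyr)

# Hypothetical subject with two follow-up rows
y <- data.frame(
  subjectid = c(1, 1, 1),
  event = c("baseline", "followup", "followup"),
  football_total = c(6, NA, NA)
)

# lag() sees the NA in the first follow-up row, so the second follow-up stays NA
with_lag <- y %>%
  mutate(football_total = case_when(
    event == "baseline" ~ football_total,
    event == "followup" ~ lag(football_total) + 2
  ), .by = subjectid)

# Anchoring on the first (baseline) value keeps the running total going
with_first <- y %>%
  mutate(
    football_total = first(football_total) + 2 * (row_number() - 1),
    .by = subjectid
  )

print(with_lag$football_total)
print(with_first$football_total)
```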

Extended example

To do this across multiple columns, assuming they share the same naming convention (i.e., "xxx_total"), you can use dplyr's across() and contains(). Below I added two columns, vball_baseline and vball_total:

x %>% 
  mutate(across(contains("total"), ~ case_when(
    event == "baseline" ~ .x,
    event == "followup" ~ (lag(.x) + 2)
  )), .by = subjectid)

Extended output:

  subjectid    event football_year_baseline football_total vball_baseline vball_total
1         1 baseline                      3              6              1           2
2         1 followup                     NA              8             NA           4
3         2 baseline                      0              0              3           4
4         2 followup                     NA              2             NA           6
5         3 baseline                      2              4              5           6

Extended data:

x <- read.table(text = "subjectid event football_year_baseline football_total vball_baseline vball_total
1 baseline 3 6 1 2
1 followup NA NA NA NA
2 baseline 0 0 3 4
2 followup NA NA NA NA
3 baseline 2 4 5 6", header = TRUE)
