在tidyverse中R中每个组的累积总和

huangapple go评论55阅读模式
英文:

Cumulative sum per group in tidyverse R

问题

我有面板数据,即每个家庭的重复观察。一个单位(家庭)会随着时间变化并展示一个特征(例如 variable)。我可以使用 group_by(id, year) 计算每年的组总和。如何得到类似 goal 列中的累积总和?在这个示例中,我需要结果保留10行,即不将数据合并到年份。如何选择每个单位每年只计算一个组总和来相加?

英文:

I have paneldata, i.e. repeated observations per household. A unit (household) is measured through time and exhibits a characteric (e.g. variable). I can calculate a group sum per year with group_by(id, year). How can I have a cummulative sum over time as in the goal column? I need the result to preserve 10 rows in this example, i.e. not collapse data to the years. How can I pick just one group-sum per year per unit to sum up?

set.seed(1234)
data <- data.frame(id = rep(100, 10),
                   year = c(rep(2022, 5), rep(2023, 5)),
                   variable = rbinom(10, 1, 0.5))

library(tidyverse)

data <- data %>% 
  group_by(id, year) %>% 
  mutate(group_sum_per_year = sum(variable)) 

data$goal <- c(4,4,4,4,4,7,7,7,7,7)

data

# A tibble: 10 × 5
# Groups:   id, year [2]
      id  year variable group_sum_per_year  goal
   <dbl> <dbl>    <int>              <int> <dbl>
 1   100  2022        0                  4     4
 2   100  2022        1                  4     4
 3   100  2022        1                  4     4
 4   100  2022        1                  4     4
 5   100  2022        1                  4     4
 6   100  2023        1                  3     7
 7   100  2023        0                  3     7
 8   100  2023        0                  3     7
 9   100  2023        1                  3     7
10   100  2023        1                  3     7

答案1

得分: 1

你可以首先创建一个临时列 hlp,该列仅对每个组的第一个条目等于 group_sum_per_year

然后,你可以按 id 分组并对 hlp 使用累积和:

data %>%
  group_by(id, year) %>%
  mutate(group_sum_per_year = sum(variable)) %>%
  mutate(hlp = if_else(1:n() == 1, group_sum_per_year, 0)) %>%
  group_by(id) %>%
  mutate(goal = cumsum(hlp))

一个 tibble: 10 × 6

组别: id [1]

  id  year variable group_sum_per_year   hlp  goal


1 100 2022 0 4 4 4
2 100 2022 1 4 0 4
3 100 2022 1 4 0 4
4 100 2022 1 4 0 4
5 100 2022 1 4 0 4
6 100 2023 1 3 3 7
7 100 2023 0 3 0 7
8 100 2023 0 3 0 7
9 100 2023 1 3 0 7
10 100 2023 1 3 0 7


<details>
<summary>英文:</summary>

You could first create a tempory column `hlp` which is equal to `group_sum_per_year` for only the first entry per group.

Then you could group by `id` and use cumsum on `hlp`:

    data %&gt;% 
      group_by(id, year) %&gt;% 
      mutate(group_sum_per_year = sum(variable)) %&gt;% 
      mutate(hlp = if_else(1:n() == 1, group_sum_per_year, 0)) %&gt;%
      group_by(id) %&gt;%
      mutate(goal = cumsum(hlp))

    # A tibble: 10 &#215; 6
    # Groups:   id [1]
          id  year variable group_sum_per_year   hlp  goal
       &lt;dbl&gt; &lt;dbl&gt;    &lt;int&gt;              &lt;int&gt; &lt;dbl&gt; &lt;dbl&gt;
     1   100  2022        0                  4     4     4
     2   100  2022        1                  4     0     4
     3   100  2022        1                  4     0     4
     4   100  2022        1                  4     0     4
     5   100  2022        1                  4     0     4
     6   100  2023        1                  3     3     7
     7   100  2023        0                  3     0     7
     8   100  2023        0                  3     0     7
     9   100  2023        1                  3     0     7
    10   100  2023        1                  3     0     7



</details>



# 答案2
**得分**: 1

更直观的方法是:

- 使用 `summarise` 而不是 `mutate`
- 使用 `right_join` 与原始数据框合并

```r
data %>% 
  group_by(id, year) %>% 
  summarise(group_sum_per_year = sum(variable)) %>% # 注意这里使用了 `summarise`
  group_by(id) %>% 
  mutate(goal = cumsum(group_sum_per_year)) %>% 
  right_join(data)
英文:

A more intuitive approach is to:

  • use summarise instead of mutate
  • merge with the original dataframe using right_join
data %&gt;% 
  group_by(id, year) %&gt;% 
  summarise(group_sum_per_year = sum(variable)) %&gt;% # Note the use of `summarise` here
  group_by(id) %&gt;% 
  mutate(goal = cumsum(group_sum_per_year)) %&gt;% 
  right_join(data)

huangapple
  • 本文由 发表于 2023年7月13日 19:35:37
  • 转载请务必保留本文链接:https://go.coder-hub.com/76678932.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定