根据条件在R中累积求和时添加0

huangapple go评论75阅读模式
英文:

Add 0 for cumulative sum based on condition in R

问题

以下是您提供的代码的翻译部分:

我有一个示例的data.table,我试图对每个“Type”按ID编码3个不同的累积和。我的数据表也按日期排序。

DT <- data.table(ID = c(A, A, A, A, B, B, B, C, C, C, C),
                 date = c("2008-01-01", "2009-01-01", "2010-01-01", "2011-01-01", "1978-01-01", "1982-01-01", "1985-01-01", "2001-01-01", "2011-01-01", "2015-01-01", "2019-01-01"),
                 Type = c(1, 2, 2, 3, 3, 1, 1, 2, 2, 1, 3),
                 Days = c(18, 333, 26, 57, 48, 10, 17, , 212, 55, 64, 18))

我尝试过:

DT[Type == 1 & order(date), cumdays1 := cumsum(Days), by = ID]
DT[Type == 2 & order(date), cumdays2 := cumsum(Days), by = ID]
DT[Type == 3 & order(date), cumdays3 := cumsum(Days), by = ID]

但是,这在计算不同于我正在计算的Type时会给我NA。

ID       date Type Days cumdays1 cumdays2 cumdays3
 1:  A 2008-01-01    1   18       18       NA       NA
 2:  A 2009-01-01    2  333       NA      333       NA
 3:  A 2010-01-01    2   26       NA      359       NA
 4:  A 2011-01-01    3   57       NA       NA       57
 5:  B 1978-01-01    3   48       NA       NA       48
 6:  B 1982-01-01    1   10       10       NA       NA
 7:  B 1985-01-01    1   17       27       NA       NA
 8:  C 2001-01-01    2  212       NA      212       NA
 9:  C 2011-01-01    2   55       NA      267       NA
10:  C 2015-01-01    1   64       64       NA       NA
11:  C 2019-01-01    3   18       NA       NA       18

相反,我希望继续使用Day=0来计算这些列。我知道我可以为每种Type编写单独的Day变量,但是否有更简单的代码一次完成这个任务?

我的期望输出:

ID       date Type Days cumdays1 cumdays2 cumdays3
 1:  A 2008-01-01    1   18       18        0        0
 2:  A 2009-01-01    2  333       18      333        0
 3:  A 2010-01-01    2   26       18      359        0
 4:  A 2011-01-01    3   57       18      359       57
 5:  B 1978-01-01    3   48        0        0       48
 6:  B 1982-01-01    1   10       10        0       48
 7:  B 1985-01-01    1   17       27        0       48
 8:  C 2001-01-01    2  212        0      212        0
 9:  C 2011-01-01    2   55        0      267        0
10:  C 2015-01-01    1   64       64      267        0
11:  C 2019-01-01    3   18       64      267       18

希望这能帮助您理解代码的翻译。如果您有其他问题,请随时提问。

英文:

I have an example data.table and I am trying to code 3 different cumulative sums by ID, one for each "Type". My data table is also sorted by date.

DT&lt;-data.table(ID=c(A,A,A,A,B,B,B,C,C,C,C),date=c(&quot;2008-01-01&quot;,&quot;2009-01-01&quot;,&quot;2010-01-01&quot;,&quot;2011-01-01&quot;,&quot;1978-01-01&quot;,&quot;1982-01-01&quot;,&quot;1985-01-01&quot;,&quot;2001-01-01&quot;,&quot;2011-01-01&quot;,&quot;2015-01-01&quot;,&quot;2019-01-01&quot;),Type=c(1,2,2,3,3,1,1,2,2,1,3),Days=c(18,333,26,57,48,10,17,,212,55,64,18))

I tried:

DT[Type==1&amp;order(date),cumdays1:=cumsum(Days),by=ID]
DT[Type==2&amp;order(date),cumdays2:=cumsum(Days),by=ID]
DT[Type==3&amp;order(date),cumdays3:=cumsum(Days),by=ID]

However, this gives me NA for when the Type is different to the one I was calculating.

    ID       date Type Days cumdays1 cumdays2 cumdays3
1:  A 2008-01-01    1   18       18       NA       NA
2:  A 2009-01-01    2  333       NA      333       NA
3:  A 2010-01-01    2   26       NA      359       NA
4:  A 2011-01-01    3   57       NA       NA       57
5:  B 1978-01-01    3   48       NA       NA       48
6:  B 1982-01-01    1   10       10       NA       NA
7:  B 1985-01-01    1   17       27       NA       NA
8:  C 2001-01-01    2  212       NA      212       NA
9:  C 2011-01-01    2   55       NA      267       NA
10:  C 2015-01-01    1   64       64       NA       NA
11:  C 2019-01-01    3   18       NA       NA       18

I would like instead to continue calculate cumulative sum using Day=0 for these columns. I know I could code a separate Day variable for each Type, but is there a simpler code to do this at once?

My desired output

    ID       date Type Days cumdays1 cumdays2 cumdays3
1:  A 2008-01-01    1   18       18       0        0
2:  A 2009-01-01    2  333       18      333       0
3:  A 2010-01-01    2   26       18      359       0
4:  A 2011-01-01    3   57       18      359       57
5:  B 1978-01-01    3   48       0        0        48
6:  B 1982-01-01    1   10       10       0        48
7:  B 1985-01-01    1   17       27       0        48
8:  C 2001-01-01    2  212       0        212      0
9:  C 2011-01-01    2   55       0        267      0
10:  C 2015-01-01    1   64       64       267      0
11:  C 2019-01-01    3   18       64       267      18

答案1

得分: 2

# 使用基础R:

transform(dt[order(dt$ID,dt$date),],
          cumdays = apply(Days * diag(max(Type))[Type,], 
                           2, \(x)ave(x,ID, FUN = cumsum)))

        ID       date Type Days cumdays.V1 cumdays.V2 cumdays.V3
     1:  A 2008-01-01    1   18         18          0          0
     2:  A 2009-01-01    2  333         18        333          0
     3:  A 2010-01-01    2   26         18        359          0
     4:  A 2011-01-01    3   57         18        359         57
     5:  B 1978-01-01    3   48          0          0         48
     6:  B 1982-01-01    1   10         10          0         48
     7:  B 1985-01-01    1   17         27          0         48
     8:  C 2001-01-01    2  212          0        212          0
     9:  C 2011-01-01    2   55          0        267          0
    10:  C 2015-01-01    1   64         64        267          0
    11:  C 2019-01-01    3   18         64        267         18

---

# 使用Tidyverse:

dt %>%
   arrange(ID, date) %>%
   mutate(name = Type) %>%
   pivot_wider(id_cols = c(ID, date, Type),
               names_prefix = 'cumdays',
               values_from = Days, values_fill = 0) %>%
   mutate(across(starts_with('cumdays'), cumsum), .by = ID)
    
# 一个 tibble: 11 × 6
   ID    date        Type cumdays1 cumdays2 cumdays3
   <chr> <chr>      <int>    <int>    <int>    <int>
 1 A     2008-01-01     1       18        0        0
 2 A     2009-01-01     2       18      333        0
 3 A     2010-01-01     2       18      359        0
 4 A     2011-01-01     3       18      359       57
 5 B     1978-01-01     3        0        0       48
 6 B     1982-01-01     1       10        0       48
 7 B     1985-01-01     1       27        0       48
 8 C     2001-01-01     2        0      212        0
 9 C     2011-01-01     2        0      267        0
10 C     2015-01-01     1       64      267        0
11 C     2019-01-01     3       64      267       18
英文:

Using Base R:

transform(dt[order(dt$ID,dt$date),],
cumdays = apply(Days * diag(max(Type))[Type,], 
2, \(x)ave(x,ID, FUN = cumsum)))
ID       date Type Days cumdays.V1 cumdays.V2 cumdays.V3
1:  A 2008-01-01    1   18         18          0          0
2:  A 2009-01-01    2  333         18        333          0
3:  A 2010-01-01    2   26         18        359          0
4:  A 2011-01-01    3   57         18        359         57
5:  B 1978-01-01    3   48          0          0         48
6:  B 1982-01-01    1   10         10          0         48
7:  B 1985-01-01    1   17         27          0         48
8:  C 2001-01-01    2  212          0        212          0
9:  C 2011-01-01    2   55          0        267          0
10:  C 2015-01-01    1   64         64        267          0
11:  C 2019-01-01    3   18         64        267         18

Tidyverse:

dt %&gt;%
arrange(ID, date) %&gt;%
mutate(name = Type) %&gt;%
pivot_wider(id_cols = c(ID, date, Type),
names_prefix = &#39;cumdays&#39;,
values_from = Days, values_fill = 0) %&gt;%
mutate(across(starts_with(&#39;cumdays&#39;), cumsum), .by = ID)
# A tibble: 11 &#215; 6
ID    date        Type cumdays1 cumdays2 cumdays3
&lt;chr&gt; &lt;chr&gt;      &lt;int&gt;    &lt;int&gt;    &lt;int&gt;    &lt;int&gt;
1 A     2008-01-01     1       18        0        0
2 A     2009-01-01     2       18      333        0
3 A     2010-01-01     2       18      359        0
4 A     2011-01-01     3       18      359       57
5 B     1978-01-01     3        0        0       48
6 B     1982-01-01     1       10        0       48
7 B     1985-01-01     1       27        0       48
8 C     2001-01-01     2        0      212        0
9 C     2011-01-01     2        0      267        0
10 C     2015-01-01     1       64      267        0
11 C     2019-01-01     3       64      267       18

答案2

得分: 1

# 一次性计算累积和,然后重新排列
DT[, cumdays := cumsum(Days), by = .(ID, Type)]
DT <- dcast(DT, ID + date + Days ~ Type, value.var = "cumdays")

cols <- 4:6

# 使用nafill函数填充缺失值
DT[, (cols) := lapply(.SD, nafill, type = "locf"), ID, .SDcols = cols]

# 使用setnafill函数以常数填充缺失值
setnafill(DT, type = "const", fill = 0L, cols = cols)

# 重命名列名
setnames(DT, 4:6, paste0("cumdays", 1:3))

# 输出结果
DT

#     ID       date Days cumdays1 cumdays2 cumdays3
# 1:   A 2008-01-01   18       18        0        0
# 2:   A 2009-01-01  333       18      333        0
# 3:   A 2010-01-01   26       18      359        0
# 4:   A 2011-01-01   57       18      359       57
# 5:   B 1978-01-01   48        0        0       48
# 6:   B 1982-01-01   10       10        0       48
# 7:   B 1985-01-01   17       27        0       48
# 8:   C 2001-01-01  212        0      212        0
# 9:   C 2011-01-01   55        0      267        0
# 10:  C 2015-01-01   64       64      267        0
# 11:  C 2019-01-01   18       64      267       18
英文:
# you can calculate the cumsums in once, then dcast them
DT[, cumdays := cumsum(Days), by = .(ID, Type)]
DT &lt;- dcast(DT, ID + date + Days ~ Type, value.var = &quot;cumdays&quot;)
cols &lt;- 4:6
DT[, (cols) := lapply(.SD, nafill, type = &quot;locf&quot;), ID, .SDcols = cols]
setnafill(DT, type = &quot;const&quot;, fill = 0L, cols = cols)
setnames(DT, 4:6, paste0(&quot;cumdays&quot;, 1:3))
DT
#     ID       date Days cumdays1 cumdays2 cumdays3
# 1:   A 2008-01-01   18       18        0        0
# 2:   A 2009-01-01  333       18      333        0
# 3:   A 2010-01-01   26       18      359        0
# 4:   A 2011-01-01   57       18      359       57
# 5:   B 1978-01-01   48        0        0       48
# 6:   B 1982-01-01   10       10        0       48
# 7:   B 1985-01-01   17       27        0       48
# 8:   C 2001-01-01  212        0      212        0
# 9:   C 2011-01-01   55        0      267        0
# 10:  C 2015-01-01   64       64      267        0
# 11:  C 2019-01-01   18       64      267       18

huangapple
  • 本文由 发表于 2023年7月6日 16:41:53
  • 转载请务必保留本文链接:https://go.coder-hub.com/76627013.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定