根据条件在R中累积求和时添加0

huangapple go评论110阅读模式
英文:

Add 0 for cumulative sum based on condition in R

问题

以下是您提供的代码的翻译部分:

  1. 我有一个示例的data.table,我试图对每个“Type”按ID编码3个不同的累积和。我的数据表也按日期排序。
  2. DT <- data.table(ID = c(A, A, A, A, B, B, B, C, C, C, C),
  3. date = c("2008-01-01", "2009-01-01", "2010-01-01", "2011-01-01", "1978-01-01", "1982-01-01", "1985-01-01", "2001-01-01", "2011-01-01", "2015-01-01", "2019-01-01"),
  4. Type = c(1, 2, 2, 3, 3, 1, 1, 2, 2, 1, 3),
  5. Days = c(18, 333, 26, 57, 48, 10, 17, , 212, 55, 64, 18))
  6. 我尝试过:
  7. DT[Type == 1 & order(date), cumdays1 := cumsum(Days), by = ID]
  8. DT[Type == 2 & order(date), cumdays2 := cumsum(Days), by = ID]
  9. DT[Type == 3 & order(date), cumdays3 := cumsum(Days), by = ID]
  10. 但是,这在计算不同于我正在计算的Type时会给我NA
  11. ID date Type Days cumdays1 cumdays2 cumdays3
  12. 1: A 2008-01-01 1 18 18 NA NA
  13. 2: A 2009-01-01 2 333 NA 333 NA
  14. 3: A 2010-01-01 2 26 NA 359 NA
  15. 4: A 2011-01-01 3 57 NA NA 57
  16. 5: B 1978-01-01 3 48 NA NA 48
  17. 6: B 1982-01-01 1 10 10 NA NA
  18. 7: B 1985-01-01 1 17 27 NA NA
  19. 8: C 2001-01-01 2 212 NA 212 NA
  20. 9: C 2011-01-01 2 55 NA 267 NA
  21. 10: C 2015-01-01 1 64 64 NA NA
  22. 11: C 2019-01-01 3 18 NA NA 18
  23. 相反,我希望继续使用Day=0来计算这些列。我知道我可以为每种Type编写单独的Day变量,但是否有更简单的代码一次完成这个任务?
  24. 我的期望输出:
  25. ID date Type Days cumdays1 cumdays2 cumdays3
  26. 1: A 2008-01-01 1 18 18 0 0
  27. 2: A 2009-01-01 2 333 18 333 0
  28. 3: A 2010-01-01 2 26 18 359 0
  29. 4: A 2011-01-01 3 57 18 359 57
  30. 5: B 1978-01-01 3 48 0 0 48
  31. 6: B 1982-01-01 1 10 10 0 48
  32. 7: B 1985-01-01 1 17 27 0 48
  33. 8: C 2001-01-01 2 212 0 212 0
  34. 9: C 2011-01-01 2 55 0 267 0
  35. 10: C 2015-01-01 1 64 64 267 0
  36. 11: C 2019-01-01 3 18 64 267 18

希望这能帮助您理解代码的翻译。如果您有其他问题,请随时提问。

英文:

I have an example data.table and I am trying to code 3 different cumulative sums by ID, one for each "Type". My data table is also sorted by date.

  1. DT&lt;-data.table(ID=c(A,A,A,A,B,B,B,C,C,C,C),date=c(&quot;2008-01-01&quot;,&quot;2009-01-01&quot;,&quot;2010-01-01&quot;,&quot;2011-01-01&quot;,&quot;1978-01-01&quot;,&quot;1982-01-01&quot;,&quot;1985-01-01&quot;,&quot;2001-01-01&quot;,&quot;2011-01-01&quot;,&quot;2015-01-01&quot;,&quot;2019-01-01&quot;),Type=c(1,2,2,3,3,1,1,2,2,1,3),Days=c(18,333,26,57,48,10,17,,212,55,64,18))

I tried:

  1. DT[Type==1&amp;order(date),cumdays1:=cumsum(Days),by=ID]
  2. DT[Type==2&amp;order(date),cumdays2:=cumsum(Days),by=ID]
  3. DT[Type==3&amp;order(date),cumdays3:=cumsum(Days),by=ID]

However, this gives me NA for when the Type is different to the one I was calculating.

  1. ID date Type Days cumdays1 cumdays2 cumdays3
  2. 1: A 2008-01-01 1 18 18 NA NA
  3. 2: A 2009-01-01 2 333 NA 333 NA
  4. 3: A 2010-01-01 2 26 NA 359 NA
  5. 4: A 2011-01-01 3 57 NA NA 57
  6. 5: B 1978-01-01 3 48 NA NA 48
  7. 6: B 1982-01-01 1 10 10 NA NA
  8. 7: B 1985-01-01 1 17 27 NA NA
  9. 8: C 2001-01-01 2 212 NA 212 NA
  10. 9: C 2011-01-01 2 55 NA 267 NA
  11. 10: C 2015-01-01 1 64 64 NA NA
  12. 11: C 2019-01-01 3 18 NA NA 18

I would like instead to continue calculate cumulative sum using Day=0 for these columns. I know I could code a separate Day variable for each Type, but is there a simpler code to do this at once?

My desired output

  1. ID date Type Days cumdays1 cumdays2 cumdays3
  2. 1: A 2008-01-01 1 18 18 0 0
  3. 2: A 2009-01-01 2 333 18 333 0
  4. 3: A 2010-01-01 2 26 18 359 0
  5. 4: A 2011-01-01 3 57 18 359 57
  6. 5: B 1978-01-01 3 48 0 0 48
  7. 6: B 1982-01-01 1 10 10 0 48
  8. 7: B 1985-01-01 1 17 27 0 48
  9. 8: C 2001-01-01 2 212 0 212 0
  10. 9: C 2011-01-01 2 55 0 267 0
  11. 10: C 2015-01-01 1 64 64 267 0
  12. 11: C 2019-01-01 3 18 64 267 18

答案1

得分: 2

  1. # 使用基础R:
  2. transform(dt[order(dt$ID,dt$date),],
  3. cumdays = apply(Days * diag(max(Type))[Type,],
  4. 2, \(x)ave(x,ID, FUN = cumsum)))
  5. ID date Type Days cumdays.V1 cumdays.V2 cumdays.V3
  6. 1: A 2008-01-01 1 18 18 0 0
  7. 2: A 2009-01-01 2 333 18 333 0
  8. 3: A 2010-01-01 2 26 18 359 0
  9. 4: A 2011-01-01 3 57 18 359 57
  10. 5: B 1978-01-01 3 48 0 0 48
  11. 6: B 1982-01-01 1 10 10 0 48
  12. 7: B 1985-01-01 1 17 27 0 48
  13. 8: C 2001-01-01 2 212 0 212 0
  14. 9: C 2011-01-01 2 55 0 267 0
  15. 10: C 2015-01-01 1 64 64 267 0
  16. 11: C 2019-01-01 3 18 64 267 18
  17. ---
  18. # 使用Tidyverse:
  19. dt %>%
  20. arrange(ID, date) %>%
  21. mutate(name = Type) %>%
  22. pivot_wider(id_cols = c(ID, date, Type),
  23. names_prefix = 'cumdays',
  24. values_from = Days, values_fill = 0) %>%
  25. mutate(across(starts_with('cumdays'), cumsum), .by = ID)
  26. # 一个 tibble: 11 × 6
  27. ID date Type cumdays1 cumdays2 cumdays3
  28. <chr> <chr> <int> <int> <int> <int>
  29. 1 A 2008-01-01 1 18 0 0
  30. 2 A 2009-01-01 2 18 333 0
  31. 3 A 2010-01-01 2 18 359 0
  32. 4 A 2011-01-01 3 18 359 57
  33. 5 B 1978-01-01 3 0 0 48
  34. 6 B 1982-01-01 1 10 0 48
  35. 7 B 1985-01-01 1 27 0 48
  36. 8 C 2001-01-01 2 0 212 0
  37. 9 C 2011-01-01 2 0 267 0
  38. 10 C 2015-01-01 1 64 267 0
  39. 11 C 2019-01-01 3 64 267 18
英文:

Using Base R:

  1. transform(dt[order(dt$ID,dt$date),],
  2. cumdays = apply(Days * diag(max(Type))[Type,],
  3. 2, \(x)ave(x,ID, FUN = cumsum)))
  4. ID date Type Days cumdays.V1 cumdays.V2 cumdays.V3
  5. 1: A 2008-01-01 1 18 18 0 0
  6. 2: A 2009-01-01 2 333 18 333 0
  7. 3: A 2010-01-01 2 26 18 359 0
  8. 4: A 2011-01-01 3 57 18 359 57
  9. 5: B 1978-01-01 3 48 0 0 48
  10. 6: B 1982-01-01 1 10 10 0 48
  11. 7: B 1985-01-01 1 17 27 0 48
  12. 8: C 2001-01-01 2 212 0 212 0
  13. 9: C 2011-01-01 2 55 0 267 0
  14. 10: C 2015-01-01 1 64 64 267 0
  15. 11: C 2019-01-01 3 18 64 267 18

Tidyverse:

  1. dt %&gt;%
  2. arrange(ID, date) %&gt;%
  3. mutate(name = Type) %&gt;%
  4. pivot_wider(id_cols = c(ID, date, Type),
  5. names_prefix = &#39;cumdays&#39;,
  6. values_from = Days, values_fill = 0) %&gt;%
  7. mutate(across(starts_with(&#39;cumdays&#39;), cumsum), .by = ID)
  8. # A tibble: 11 &#215; 6
  9. ID date Type cumdays1 cumdays2 cumdays3
  10. &lt;chr&gt; &lt;chr&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt;
  11. 1 A 2008-01-01 1 18 0 0
  12. 2 A 2009-01-01 2 18 333 0
  13. 3 A 2010-01-01 2 18 359 0
  14. 4 A 2011-01-01 3 18 359 57
  15. 5 B 1978-01-01 3 0 0 48
  16. 6 B 1982-01-01 1 10 0 48
  17. 7 B 1985-01-01 1 27 0 48
  18. 8 C 2001-01-01 2 0 212 0
  19. 9 C 2011-01-01 2 0 267 0
  20. 10 C 2015-01-01 1 64 267 0
  21. 11 C 2019-01-01 3 64 267 18

答案2

得分: 1

  1. # 一次性计算累积和,然后重新排列
  2. DT[, cumdays := cumsum(Days), by = .(ID, Type)]
  3. DT <- dcast(DT, ID + date + Days ~ Type, value.var = "cumdays")
  4. cols <- 4:6
  5. # 使用nafill函数填充缺失值
  6. DT[, (cols) := lapply(.SD, nafill, type = "locf"), ID, .SDcols = cols]
  7. # 使用setnafill函数以常数填充缺失值
  8. setnafill(DT, type = "const", fill = 0L, cols = cols)
  9. # 重命名列名
  10. setnames(DT, 4:6, paste0("cumdays", 1:3))
  11. # 输出结果
  12. DT
  13. # ID date Days cumdays1 cumdays2 cumdays3
  14. # 1: A 2008-01-01 18 18 0 0
  15. # 2: A 2009-01-01 333 18 333 0
  16. # 3: A 2010-01-01 26 18 359 0
  17. # 4: A 2011-01-01 57 18 359 57
  18. # 5: B 1978-01-01 48 0 0 48
  19. # 6: B 1982-01-01 10 10 0 48
  20. # 7: B 1985-01-01 17 27 0 48
  21. # 8: C 2001-01-01 212 0 212 0
  22. # 9: C 2011-01-01 55 0 267 0
  23. # 10: C 2015-01-01 64 64 267 0
  24. # 11: C 2019-01-01 18 64 267 18
英文:
  1. # you can calculate the cumsums in once, then dcast them
  2. DT[, cumdays := cumsum(Days), by = .(ID, Type)]
  3. DT &lt;- dcast(DT, ID + date + Days ~ Type, value.var = &quot;cumdays&quot;)
  4. cols &lt;- 4:6
  5. DT[, (cols) := lapply(.SD, nafill, type = &quot;locf&quot;), ID, .SDcols = cols]
  6. setnafill(DT, type = &quot;const&quot;, fill = 0L, cols = cols)
  7. setnames(DT, 4:6, paste0(&quot;cumdays&quot;, 1:3))
  8. DT
  9. # ID date Days cumdays1 cumdays2 cumdays3
  10. # 1: A 2008-01-01 18 18 0 0
  11. # 2: A 2009-01-01 333 18 333 0
  12. # 3: A 2010-01-01 26 18 359 0
  13. # 4: A 2011-01-01 57 18 359 57
  14. # 5: B 1978-01-01 48 0 0 48
  15. # 6: B 1982-01-01 10 10 0 48
  16. # 7: B 1985-01-01 17 27 0 48
  17. # 8: C 2001-01-01 212 0 212 0
  18. # 9: C 2011-01-01 55 0 267 0
  19. # 10: C 2015-01-01 64 64 267 0
  20. # 11: C 2019-01-01 18 64 267 18

huangapple
  • 本文由 发表于 2023年7月6日 16:41:53
  • 转载请务必保留本文链接:https://go.coder-hub.com/76627013.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定