每第n个元素的滞后

huangapple go评论106阅读模式
英文:

Lag of every nth element

问题

我有一个数据框如下:

  1. df <- data.frame(date = seq(from = as.Date("2000-01-01"),
  2. to = as.Date("2005-01-01"), by = 'month'))
  3. df <- df %>% mutate(cumsum = seq(1, length.out = length(date)))

我想创建一个新列,该列是cumsum列中每12个值(一年前)的总和。

编辑
我喜欢你的两个答案!实际上,我刚刚发现了一个问题,对于我的解决方案(抱歉,我的解释不太清楚)。你的方法给我现在和一年前的值的总和。但是我有几年的数据,需要在以前的年份中累积所有观测值的总和(所以sum(x,lag(x,12),lag(x,24),lag(x,36)))。我尝试过类似于(rep(lag(cumsum,12),nrow(df)/12))。也许你可以帮忙。谢谢!

英文:

I have data frame as :

  1. df &lt;- data.frame( date =seq(from = as.Date(&quot;2000-01-01&quot;),
  2. to = as.Date(&quot;2005-01-01&quot;),&#39;month&#39;))
  3. df &lt;- df %&gt;% mutate(cumsum = seq(1, length.out = length(date)))

I want to create a new column, which is the sum of the value in cumsum and every 12th value (one year back).

EDIT:
I like both your answers! Actually I just found a problem for the solution for me (sorry my explanation was not quite clear.) Your approach gives me the sum of the value now and one year befor. But I do have seveal years and would need the cumsum of all overervation in previous years (so sum(x, lag(x,12), lag(x,24), lag (x,36)). I tried smth. like (rep(lag(cumsu, 12), nrow(df)/12). May you can help. Thanks!

答案1

得分: 3

你是否在寻找类似这样的内容?

这返回当前值与之前12个月值的总和。

  1. library(dplyr)
  2. df %>%
  3. mutate(rolling_12 = cumsum + lag(cumsum, n = 12))

日期 | 累计 | 滚动12
2000-01-01 | 1 | NA
2000-02-01 | 2 | NA
2000-03-01 | 3 | NA
2000-04-01 | 4 | NA
2000-05-01 | 5 | NA
2000-06-01 | 6 | NA
2000-07-01 | 7 | NA
2000-08-01 | 8 | NA
2000-09-01 | 9 | NA
2000-10-01 | 10 | NA
2000-11-01 | 11 | NA
2000-12-01 | 12 | NA
2001-01-01 | 13 | 14
2001-02-01 | 14 | 16
2001-03-01 | 15 | 18
...

  1. <details>
  2. <summary>英文:</summary>
  3. Are you looking for something like this?
  4. This returns the sum of the current value and the value 12 months prior.

library(dplyr)

df %>%
mutate(rolling_12 = cumsum + lag(cumsum, n = 12))

  1. date cumsum rolling_12

1 2000-01-01 1 NA
2 2000-02-01 2 NA
3 2000-03-01 3 NA
4 2000-04-01 4 NA
5 2000-05-01 5 NA
6 2000-06-01 6 NA
7 2000-07-01 7 NA
8 2000-08-01 8 NA
9 2000-09-01 9 NA
10 2000-10-01 10 NA
11 2000-11-01 11 NA
12 2000-12-01 12 NA
13 2001-01-01 13 14
14 2001-02-01 14 16
15 2001-03-01 15 18
...

  1. </details>
  2. # 答案2
  3. **得分**: 2
  4. 文字方法是使用`lag`,如果您确保数据间隔完全相等,那么@Jamie的答案是最直接和最简单的方法。
  5. 然而,如果有可能没有所有中间月份,这可能导致滞后错误。防止这种情况的一种方法是与先前日期进行自连接。
  6. ```r
  7. df2 <- df[-20,] # 仅用于施加一些缺失值
  8. library(lubridate) # %m+%
  9. df2 %>%
  10. mutate(
  11. # 这是更直接的方法,但在有缺失值的情况下会出现问题
  12. rolling_12 = cumsum + lag(cumsum, n = 12),
  13. lastyear = date %m+% years(-1)
  14. ) %>%
  15. left_join(df2, by = c("lastyear" = "date"), suffix = c("", "_12")) %>%
  16. mutate(cumsum_12 = cumsum + cumsum_12) %>%
  17. select(-lastyear)
  18. # date cumsum rolling_12 cumsum_12
  19. # 1 2000-01-01 1 NA NA
  20. # 2 2000-02-01 2 NA NA
  21. # 3 2000-03-01 3 NA NA
  22. # 4 2000-04-01 4 NA NA
  23. # 5 2000-05-01 5 NA NA
  24. # 6 2000-06-01 6 NA NA
  25. # 7 2000-07-01 7 NA NA
  26. # 8 2000-08-01 8 NA NA
  27. # 9 2000-09-01 9 NA NA
  28. # 10 2000-10-01 10 NA NA
  29. # 11 2000-11-01 11 NA NA
  30. # 12 2000-12-01 12 NA NA
  31. # 13 2001-01-01 13 14 14
  32. # 14 2001-02-01 14 16 16
  33. # 15 2001-03-01 15 18 18
  34. # 16 2001-04-01 16 20 20
  35. # 17 2001-05-01 17 22 22
  36. # 18 2001-06-01 18 24 24
  37. # 19 2001-07-01 19 26 26
  38. # 20 2001-09-01 21 29 30 <-- 这是rolling_12出错的地方
  39. # 21 2001-10-01 22 31 32
  40. # 22 2001-11-01 23 33 34
  41. # 23 2001-12-01 24 35 36
  42. # 24 2002-01-01 25 37 38
  43. # 25 2002-02-01 26 39 40
  44. # 26 2002-03-01 27 41 42
  45. # 27 2002-04-01 28 43 44
  46. # 28 2002-05-01 29 45 46
  47. # 29 2002-06-01 30 47 48
  48. # 30 2002-07-01 31 49 50
  49. # 31 2002-08-01 32 51 NA
  50. # 32 2002-09-01 33 54 54
  51. # 33 2002-10-01 34 56 56
  52. # 34 2002-11-01 35 58 58
  53. # 35 2002-12-01 36 60 60
  54. # 36 2003-01-01 37 62 62
  55. # 37 2003-02-01 38 64 64
  56. # 38 2003-03-01 39 66 66
  57. # 39 2003-04-01 40 68 68
  58. # 40 2003-05-01 41 70 70
  59. # 41 2003-06-01 42 72 72
  60. # 42 2003-07-01 43 74 74
  61. # 43 2003-08-01 44 76 76
  62. # 44 2003-09-01 45 78 78
  63. # 45 2003-10-01 46 80 80
  64. # 46 2003-11-01 47 82 82
  65. # 47 2003-12-01 48 84 84
  66. # 48 2004-01-01 49 86 86
  67. # 49 2004-02-01 50 88 88
  68. # 50 2004-03-01 51 90 90
  69. # 51 2004-04-01 52 92 92
  70. # 52 2004-05-01 53 94 94
  71. # 53 2004-06-01 54 96 96
  72. # 54 2004-07-01 55 98 98
  73. # 55 2004-08-01 56 100 100
  74. # 56 2004-09-01 57 102 102
  75. # 57 2004-10-01 58 104 104
  76. # 58 2004-11-01 59 106 106
  77. # 59 2004-12-01 60 108 108
  78. # 60 2005-01-01 61 110 110
英文:

The literal approach is to use lag, and if you are assured of perfectly-spaced data, then @Jamie's answer is the most direct and simplest approach.

However, if there is a chance that you don't have all intermediate months, this could lag incorrectly. One way to guard against this is to self-join with the previous date.

  1. df2 &lt;- df[-20,] # just to impose some missingness
  2. library(lubridate) # %m+%
  3. df2 %&gt;%
  4. mutate(
  5. # this is the more direct route, but with missingness it glitches
  6. rolling_12 = cumsum + lag(cumsum, n = 12),
  7. lastyear = date %m+% years(-1)
  8. ) %&gt;%
  9. left_join(df2, by = c(&quot;lastyear&quot; = &quot;date&quot;), suffix = c(&quot;&quot;, &quot;_12&quot;)) %&gt;%
  10. mutate(cumsum_12 = cumsum + cumsum_12) %&gt;%
  11. select(-lastyear)
  12. # date cumsum rolling_12 cumsum_12
  13. # 1 2000-01-01 1 NA NA
  14. # 2 2000-02-01 2 NA NA
  15. # 3 2000-03-01 3 NA NA
  16. # 4 2000-04-01 4 NA NA
  17. # 5 2000-05-01 5 NA NA
  18. # 6 2000-06-01 6 NA NA
  19. # 7 2000-07-01 7 NA NA
  20. # 8 2000-08-01 8 NA NA
  21. # 9 2000-09-01 9 NA NA
  22. # 10 2000-10-01 10 NA NA
  23. # 11 2000-11-01 11 NA NA
  24. # 12 2000-12-01 12 NA NA
  25. # 13 2001-01-01 13 14 14
  26. # 14 2001-02-01 14 16 16
  27. # 15 2001-03-01 15 18 18
  28. # 16 2001-04-01 16 20 20
  29. # 17 2001-05-01 17 22 22
  30. # 18 2001-06-01 18 24 24
  31. # 19 2001-07-01 19 26 26
  32. # 20 2001-09-01 21 29 30 &lt;-- this is where rolling_12 goes wrong
  33. # 21 2001-10-01 22 31 32
  34. # 22 2001-11-01 23 33 34
  35. # 23 2001-12-01 24 35 36
  36. # 24 2002-01-01 25 37 38
  37. # 25 2002-02-01 26 39 40
  38. # 26 2002-03-01 27 41 42
  39. # 27 2002-04-01 28 43 44
  40. # 28 2002-05-01 29 45 46
  41. # 29 2002-06-01 30 47 48
  42. # 30 2002-07-01 31 49 50
  43. # 31 2002-08-01 32 51 NA
  44. # 32 2002-09-01 33 54 54
  45. # 33 2002-10-01 34 56 56
  46. # 34 2002-11-01 35 58 58
  47. # 35 2002-12-01 36 60 60
  48. # 36 2003-01-01 37 62 62
  49. # 37 2003-02-01 38 64 64
  50. # 38 2003-03-01 39 66 66
  51. # 39 2003-04-01 40 68 68
  52. # 40 2003-05-01 41 70 70
  53. # 41 2003-06-01 42 72 72
  54. # 42 2003-07-01 43 74 74
  55. # 43 2003-08-01 44 76 76
  56. # 44 2003-09-01 45 78 78
  57. # 45 2003-10-01 46 80 80
  58. # 46 2003-11-01 47 82 82
  59. # 47 2003-12-01 48 84 84
  60. # 48 2004-01-01 49 86 86
  61. # 49 2004-02-01 50 88 88
  62. # 50 2004-03-01 51 90 90
  63. # 51 2004-04-01 52 92 92
  64. # 52 2004-05-01 53 94 94
  65. # 53 2004-06-01 54 96 96
  66. # 54 2004-07-01 55 98 98
  67. # 55 2004-08-01 56 100 100
  68. # 56 2004-09-01 57 102 102
  69. # 57 2004-10-01 58 104 104
  70. # 58 2004-11-01 59 106 106
  71. # 59 2004-12-01 60 108 108
  72. # 60 2005-01-01 61 110 110

huangapple
  • 本文由 发表于 2023年6月2日 02:21:32
  • 转载请务必保留本文链接:https://go.coder-hub.com/76384686.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定