每第n个元素的滞后

huangapple go评论87阅读模式
英文:

Lag of every nth element

问题

我有一个数据框如下:

df <- data.frame(date = seq(from = as.Date("2000-01-01"), 
                            to = as.Date("2005-01-01"), by = 'month'))

df <- df %>% mutate(cumsum = seq(1, length.out = length(date)))

我想创建一个新列,该列是cumsum列中每12个值(一年前)的总和。

编辑
我喜欢你的两个答案!实际上,我刚刚发现了一个问题,对于我的解决方案(抱歉,我的解释不太清楚)。你的方法给我现在和一年前的值的总和。但是我有几年的数据,需要在以前的年份中累积所有观测值的总和(所以sum(x,lag(x,12),lag(x,24),lag(x,36)))。我尝试过类似于(rep(lag(cumsum,12),nrow(df)/12))。也许你可以帮忙。谢谢!

英文:

I have data frame as :

df &lt;- data.frame( date =seq(from = as.Date(&quot;2000-01-01&quot;), 
                              to =  as.Date(&quot;2005-01-01&quot;),&#39;month&#39;))


df &lt;-  df %&gt;% mutate(cumsum = seq(1, length.out = length(date)))

I want to create a new column, which is the sum of the value in cumsum and every 12th value (one year back).

EDIT:
I like both your answers! Actually I just found a problem for the solution for me (sorry my explanation was not quite clear.) Your approach gives me the sum of the value now and one year befor. But I do have seveal years and would need the cumsum of all overervation in previous years (so sum(x, lag(x,12), lag(x,24), lag (x,36)). I tried smth. like (rep(lag(cumsu, 12), nrow(df)/12). May you can help. Thanks!

答案1

得分: 3

你是否在寻找类似这样的内容?

这返回当前值与之前12个月值的总和。

library(dplyr)

df %>%
  mutate(rolling_12 = cumsum + lag(cumsum, n = 12))

日期 | 累计 | 滚动12
2000-01-01 | 1 | NA
2000-02-01 | 2 | NA
2000-03-01 | 3 | NA
2000-04-01 | 4 | NA
2000-05-01 | 5 | NA
2000-06-01 | 6 | NA
2000-07-01 | 7 | NA
2000-08-01 | 8 | NA
2000-09-01 | 9 | NA
2000-10-01 | 10 | NA
2000-11-01 | 11 | NA
2000-12-01 | 12 | NA
2001-01-01 | 13 | 14
2001-02-01 | 14 | 16
2001-03-01 | 15 | 18
...


<details>
<summary>英文:</summary>

Are you looking for something like this?

This returns the sum of the current value and the value 12 months prior.

library(dplyr)

df %>%
mutate(rolling_12 = cumsum + lag(cumsum, n = 12))

     date cumsum rolling_12

1 2000-01-01 1 NA
2 2000-02-01 2 NA
3 2000-03-01 3 NA
4 2000-04-01 4 NA
5 2000-05-01 5 NA
6 2000-06-01 6 NA
7 2000-07-01 7 NA
8 2000-08-01 8 NA
9 2000-09-01 9 NA
10 2000-10-01 10 NA
11 2000-11-01 11 NA
12 2000-12-01 12 NA
13 2001-01-01 13 14
14 2001-02-01 14 16
15 2001-03-01 15 18
...


</details>



# 答案2
**得分**: 2

文字方法是使用`lag`,如果您确保数据间隔完全相等,那么@Jamie的答案是最直接和最简单的方法。

然而,如果有可能没有所有中间月份,这可能导致滞后错误。防止这种情况的一种方法是与先前日期进行自连接。

```r
df2 <- df[-20,] # 仅用于施加一些缺失值
library(lubridate) # %m+%
df2 %>%
  mutate(
    # 这是更直接的方法,但在有缺失值的情况下会出现问题
    rolling_12 = cumsum + lag(cumsum, n = 12),
    lastyear = date %m+% years(-1)
  ) %>%
  left_join(df2, by = c("lastyear" = "date"), suffix = c("", "_12")) %>%
  mutate(cumsum_12 = cumsum + cumsum_12) %>%
  select(-lastyear)
#          date cumsum rolling_12 cumsum_12
# 1  2000-01-01      1         NA        NA
# 2  2000-02-01      2         NA        NA
# 3  2000-03-01      3         NA        NA
# 4  2000-04-01      4         NA        NA
# 5  2000-05-01      5         NA        NA
# 6  2000-06-01      6         NA        NA
# 7  2000-07-01      7         NA        NA
# 8  2000-08-01      8         NA        NA
# 9  2000-09-01      9         NA        NA
# 10 2000-10-01     10         NA        NA
# 11 2000-11-01     11         NA        NA
# 12 2000-12-01     12         NA        NA
# 13 2001-01-01     13         14        14
# 14 2001-02-01     14         16        16
# 15 2001-03-01     15         18        18
# 16 2001-04-01     16         20        20
# 17 2001-05-01     17         22        22
# 18 2001-06-01     18         24        24
# 19 2001-07-01     19         26        26
# 20 2001-09-01     21         29        30  <-- 这是rolling_12出错的地方
# 21 2001-10-01     22         31        32
# 22 2001-11-01     23         33        34
# 23 2001-12-01     24         35        36
# 24 2002-01-01     25         37        38
# 25 2002-02-01     26         39        40
# 26 2002-03-01     27         41        42
# 27 2002-04-01     28         43        44
# 28 2002-05-01     29         45        46
# 29 2002-06-01     30         47        48
# 30 2002-07-01     31         49        50
# 31 2002-08-01     32         51        NA
# 32 2002-09-01     33         54        54
# 33 2002-10-01     34         56        56
# 34 2002-11-01     35         58        58
# 35 2002-12-01     36         60        60
# 36 2003-01-01     37         62        62
# 37 2003-02-01     38         64        64
# 38 2003-03-01     39         66        66
# 39 2003-04-01     40         68        68
# 40 2003-05-01     41         70        70
# 41 2003-06-01     42         72        72
# 42 2003-07-01     43         74        74
# 43 2003-08-01     44         76        76
# 44 2003-09-01     45         78        78
# 45 2003-10-01     46         80        80
# 46 2003-11-01     47         82        82
# 47 2003-12-01     48         84        84
# 48 2004-01-01     49         86        86
# 49 2004-02-01     50         88        88
# 50 2004-03-01     51         90        90
# 51 2004-04-01     52         92        92
# 52 2004-05-01     53         94        94
# 53 2004-06-01     54         96        96
# 54 2004-07-01     55         98        98
# 55 2004-08-01     56        100       100
# 56 2004-09-01     57        102       102
# 57 2004-10-01     58        104       104
# 58 2004-11-01     59        106       106
# 59 2004-12-01     60        108       108
# 60 2005-01-01     61        110       110
英文:

The literal approach is to use lag, and if you are assured of perfectly-spaced data, then @Jamie's answer is the most direct and simplest approach.

However, if there is a chance that you don't have all intermediate months, this could lag incorrectly. One way to guard against this is to self-join with the previous date.

df2 &lt;- df[-20,] # just to impose some missingness
library(lubridate) # %m+%
df2 %&gt;%
  mutate(
    # this is the more direct route, but with missingness it glitches
    rolling_12 = cumsum + lag(cumsum, n = 12),
    lastyear = date %m+% years(-1)
  ) %&gt;%
  left_join(df2, by = c(&quot;lastyear&quot; = &quot;date&quot;), suffix = c(&quot;&quot;, &quot;_12&quot;)) %&gt;%
  mutate(cumsum_12 = cumsum + cumsum_12) %&gt;%
  select(-lastyear)
#          date cumsum rolling_12 cumsum_12
# 1  2000-01-01      1         NA        NA
# 2  2000-02-01      2         NA        NA
# 3  2000-03-01      3         NA        NA
# 4  2000-04-01      4         NA        NA
# 5  2000-05-01      5         NA        NA
# 6  2000-06-01      6         NA        NA
# 7  2000-07-01      7         NA        NA
# 8  2000-08-01      8         NA        NA
# 9  2000-09-01      9         NA        NA
# 10 2000-10-01     10         NA        NA
# 11 2000-11-01     11         NA        NA
# 12 2000-12-01     12         NA        NA
# 13 2001-01-01     13         14        14
# 14 2001-02-01     14         16        16
# 15 2001-03-01     15         18        18
# 16 2001-04-01     16         20        20
# 17 2001-05-01     17         22        22
# 18 2001-06-01     18         24        24
# 19 2001-07-01     19         26        26
# 20 2001-09-01     21         29        30  &lt;-- this is where rolling_12 goes wrong
# 21 2001-10-01     22         31        32
# 22 2001-11-01     23         33        34
# 23 2001-12-01     24         35        36
# 24 2002-01-01     25         37        38
# 25 2002-02-01     26         39        40
# 26 2002-03-01     27         41        42
# 27 2002-04-01     28         43        44
# 28 2002-05-01     29         45        46
# 29 2002-06-01     30         47        48
# 30 2002-07-01     31         49        50
# 31 2002-08-01     32         51        NA
# 32 2002-09-01     33         54        54
# 33 2002-10-01     34         56        56
# 34 2002-11-01     35         58        58
# 35 2002-12-01     36         60        60
# 36 2003-01-01     37         62        62
# 37 2003-02-01     38         64        64
# 38 2003-03-01     39         66        66
# 39 2003-04-01     40         68        68
# 40 2003-05-01     41         70        70
# 41 2003-06-01     42         72        72
# 42 2003-07-01     43         74        74
# 43 2003-08-01     44         76        76
# 44 2003-09-01     45         78        78
# 45 2003-10-01     46         80        80
# 46 2003-11-01     47         82        82
# 47 2003-12-01     48         84        84
# 48 2004-01-01     49         86        86
# 49 2004-02-01     50         88        88
# 50 2004-03-01     51         90        90
# 51 2004-04-01     52         92        92
# 52 2004-05-01     53         94        94
# 53 2004-06-01     54         96        96
# 54 2004-07-01     55         98        98
# 55 2004-08-01     56        100       100
# 56 2004-09-01     57        102       102
# 57 2004-10-01     58        104       104
# 58 2004-11-01     59        106       106
# 59 2004-12-01     60        108       108
# 60 2005-01-01     61        110       110

huangapple
  • 本文由 发表于 2023年6月2日 02:21:32
  • 转载请务必保留本文链接:https://go.coder-hub.com/76384686.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定