英文:
Lag of every nth element
问题
我有一个数据框如下:
df <- data.frame(date = seq(from = as.Date("2000-01-01"),
to = as.Date("2005-01-01"), by = 'month'))
df <- df %>% mutate(cumsum = seq(1, length.out = length(date)))
我想创建一个新列,该列是cumsum列中每12个值(一年前)的总和。
编辑:
我喜欢你的两个答案!实际上,我刚刚发现了一个问题,对于我的解决方案(抱歉,我的解释不太清楚)。你的方法给我现在和一年前的值的总和。但是我有几年的数据,需要在以前的年份中累积所有观测值的总和(所以sum(x,lag(x,12),lag(x,24),lag(x,36)))。我尝试过类似于(rep(lag(cumsum,12),nrow(df)/12))。也许你可以帮忙。谢谢!
英文:
I have data frame as :
df <- data.frame( date =seq(from = as.Date("2000-01-01"),
to = as.Date("2005-01-01"),'month'))
df <- df %>% mutate(cumsum = seq(1, length.out = length(date)))
I want to create a new column, which is the sum of the value in cumsum and every 12th value (one year back).
EDIT:
I like both your answers! Actually I just found a problem for the solution for me (sorry my explanation was not quite clear.) Your approach gives me the sum of the value now and one year befor. But I do have seveal years and would need the cumsum of all overervation in previous years (so sum(x, lag(x,12), lag(x,24), lag (x,36)). I tried smth. like (rep(lag(cumsu, 12), nrow(df)/12). May you can help. Thanks!
答案1
得分: 3
你是否在寻找类似这样的内容?
这返回当前值与之前12个月值的总和。
library(dplyr)
df %>%
mutate(rolling_12 = cumsum + lag(cumsum, n = 12))
日期 | 累计 | 滚动12
2000-01-01 | 1 | NA
2000-02-01 | 2 | NA
2000-03-01 | 3 | NA
2000-04-01 | 4 | NA
2000-05-01 | 5 | NA
2000-06-01 | 6 | NA
2000-07-01 | 7 | NA
2000-08-01 | 8 | NA
2000-09-01 | 9 | NA
2000-10-01 | 10 | NA
2000-11-01 | 11 | NA
2000-12-01 | 12 | NA
2001-01-01 | 13 | 14
2001-02-01 | 14 | 16
2001-03-01 | 15 | 18
...
<details>
<summary>英文:</summary>
Are you looking for something like this?
This returns the sum of the current value and the value 12 months prior.
library(dplyr)
df %>%
mutate(rolling_12 = cumsum + lag(cumsum, n = 12))
date cumsum rolling_12
1 2000-01-01 1 NA
2 2000-02-01 2 NA
3 2000-03-01 3 NA
4 2000-04-01 4 NA
5 2000-05-01 5 NA
6 2000-06-01 6 NA
7 2000-07-01 7 NA
8 2000-08-01 8 NA
9 2000-09-01 9 NA
10 2000-10-01 10 NA
11 2000-11-01 11 NA
12 2000-12-01 12 NA
13 2001-01-01 13 14
14 2001-02-01 14 16
15 2001-03-01 15 18
...
</details>
# 答案2
**得分**: 2
文字方法是使用`lag`,如果您确保数据间隔完全相等,那么@Jamie的答案是最直接和最简单的方法。
然而,如果有可能没有所有中间月份,这可能导致滞后错误。防止这种情况的一种方法是与先前日期进行自连接。
```r
df2 <- df[-20,] # 仅用于施加一些缺失值
library(lubridate) # %m+%
df2 %>%
mutate(
# 这是更直接的方法,但在有缺失值的情况下会出现问题
rolling_12 = cumsum + lag(cumsum, n = 12),
lastyear = date %m+% years(-1)
) %>%
left_join(df2, by = c("lastyear" = "date"), suffix = c("", "_12")) %>%
mutate(cumsum_12 = cumsum + cumsum_12) %>%
select(-lastyear)
# date cumsum rolling_12 cumsum_12
# 1 2000-01-01 1 NA NA
# 2 2000-02-01 2 NA NA
# 3 2000-03-01 3 NA NA
# 4 2000-04-01 4 NA NA
# 5 2000-05-01 5 NA NA
# 6 2000-06-01 6 NA NA
# 7 2000-07-01 7 NA NA
# 8 2000-08-01 8 NA NA
# 9 2000-09-01 9 NA NA
# 10 2000-10-01 10 NA NA
# 11 2000-11-01 11 NA NA
# 12 2000-12-01 12 NA NA
# 13 2001-01-01 13 14 14
# 14 2001-02-01 14 16 16
# 15 2001-03-01 15 18 18
# 16 2001-04-01 16 20 20
# 17 2001-05-01 17 22 22
# 18 2001-06-01 18 24 24
# 19 2001-07-01 19 26 26
# 20 2001-09-01 21 29 30 <-- 这是rolling_12出错的地方
# 21 2001-10-01 22 31 32
# 22 2001-11-01 23 33 34
# 23 2001-12-01 24 35 36
# 24 2002-01-01 25 37 38
# 25 2002-02-01 26 39 40
# 26 2002-03-01 27 41 42
# 27 2002-04-01 28 43 44
# 28 2002-05-01 29 45 46
# 29 2002-06-01 30 47 48
# 30 2002-07-01 31 49 50
# 31 2002-08-01 32 51 NA
# 32 2002-09-01 33 54 54
# 33 2002-10-01 34 56 56
# 34 2002-11-01 35 58 58
# 35 2002-12-01 36 60 60
# 36 2003-01-01 37 62 62
# 37 2003-02-01 38 64 64
# 38 2003-03-01 39 66 66
# 39 2003-04-01 40 68 68
# 40 2003-05-01 41 70 70
# 41 2003-06-01 42 72 72
# 42 2003-07-01 43 74 74
# 43 2003-08-01 44 76 76
# 44 2003-09-01 45 78 78
# 45 2003-10-01 46 80 80
# 46 2003-11-01 47 82 82
# 47 2003-12-01 48 84 84
# 48 2004-01-01 49 86 86
# 49 2004-02-01 50 88 88
# 50 2004-03-01 51 90 90
# 51 2004-04-01 52 92 92
# 52 2004-05-01 53 94 94
# 53 2004-06-01 54 96 96
# 54 2004-07-01 55 98 98
# 55 2004-08-01 56 100 100
# 56 2004-09-01 57 102 102
# 57 2004-10-01 58 104 104
# 58 2004-11-01 59 106 106
# 59 2004-12-01 60 108 108
# 60 2005-01-01 61 110 110
英文:
The literal approach is to use lag
, and if you are assured of perfectly-spaced data, then @Jamie's answer is the most direct and simplest approach.
However, if there is a chance that you don't have all intermediate months, this could lag incorrectly. One way to guard against this is to self-join with the previous date.
df2 <- df[-20,] # just to impose some missingness
library(lubridate) # %m+%
df2 %>%
mutate(
# this is the more direct route, but with missingness it glitches
rolling_12 = cumsum + lag(cumsum, n = 12),
lastyear = date %m+% years(-1)
) %>%
left_join(df2, by = c("lastyear" = "date"), suffix = c("", "_12")) %>%
mutate(cumsum_12 = cumsum + cumsum_12) %>%
select(-lastyear)
# date cumsum rolling_12 cumsum_12
# 1 2000-01-01 1 NA NA
# 2 2000-02-01 2 NA NA
# 3 2000-03-01 3 NA NA
# 4 2000-04-01 4 NA NA
# 5 2000-05-01 5 NA NA
# 6 2000-06-01 6 NA NA
# 7 2000-07-01 7 NA NA
# 8 2000-08-01 8 NA NA
# 9 2000-09-01 9 NA NA
# 10 2000-10-01 10 NA NA
# 11 2000-11-01 11 NA NA
# 12 2000-12-01 12 NA NA
# 13 2001-01-01 13 14 14
# 14 2001-02-01 14 16 16
# 15 2001-03-01 15 18 18
# 16 2001-04-01 16 20 20
# 17 2001-05-01 17 22 22
# 18 2001-06-01 18 24 24
# 19 2001-07-01 19 26 26
# 20 2001-09-01 21 29 30 <-- this is where rolling_12 goes wrong
# 21 2001-10-01 22 31 32
# 22 2001-11-01 23 33 34
# 23 2001-12-01 24 35 36
# 24 2002-01-01 25 37 38
# 25 2002-02-01 26 39 40
# 26 2002-03-01 27 41 42
# 27 2002-04-01 28 43 44
# 28 2002-05-01 29 45 46
# 29 2002-06-01 30 47 48
# 30 2002-07-01 31 49 50
# 31 2002-08-01 32 51 NA
# 32 2002-09-01 33 54 54
# 33 2002-10-01 34 56 56
# 34 2002-11-01 35 58 58
# 35 2002-12-01 36 60 60
# 36 2003-01-01 37 62 62
# 37 2003-02-01 38 64 64
# 38 2003-03-01 39 66 66
# 39 2003-04-01 40 68 68
# 40 2003-05-01 41 70 70
# 41 2003-06-01 42 72 72
# 42 2003-07-01 43 74 74
# 43 2003-08-01 44 76 76
# 44 2003-09-01 45 78 78
# 45 2003-10-01 46 80 80
# 46 2003-11-01 47 82 82
# 47 2003-12-01 48 84 84
# 48 2004-01-01 49 86 86
# 49 2004-02-01 50 88 88
# 50 2004-03-01 51 90 90
# 51 2004-04-01 52 92 92
# 52 2004-05-01 53 94 94
# 53 2004-06-01 54 96 96
# 54 2004-07-01 55 98 98
# 55 2004-08-01 56 100 100
# 56 2004-09-01 57 102 102
# 57 2004-10-01 58 104 104
# 58 2004-11-01 59 106 106
# 59 2004-12-01 60 108 108
# 60 2005-01-01 61 110 110
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论