Cumulative sum for specific range of dates
Question
I'm trying to calculate the row-wise cumulative sum of Rates from DATE to DATE_following.
For example:
library(tidyverse)
library(bizdays)
library(lubridate)
set.seed(1)
dat <- seq.Date(from = as.Date(as.Date("2023-04-06")- days(10)),
to = as.Date(as.Date("2023-04-06")),
by = "day") %>%
data.frame(DATE = .) %>%
mutate(Rates = sample(seq(from=1,to=10,by=1), size = length(DATE),replace=TRUE),
DATE_following = modified.following(DATE %m+% days(3)))
dat
DATE Rates DATE_following
1 2023-03-27 9 2023-03-30
2 2023-03-28 4 2023-03-31
3 2023-03-29 7 2023-04-01
4 2023-03-30 1 2023-04-02
5 2023-03-31 2 2023-04-03
6 2023-04-01 7 2023-04-04
7 2023-04-02 2 2023-04-05
8 2023-04-03 3 2023-04-06
9 2023-04-04 1 2023-04-07
10 2023-04-05 5 2023-04-08
11 2023-04-06 5 2023-04-09
The output I'm trying to get is:
- Result: 9+4+7+1 = 21 (the sum of Rates from 2023-03-27 to 2023-03-30)
- Result: 4+7+1+2 = 14 (the sum of Rates from 2023-03-28 to 2023-03-31), and so on.
DATE Rates DATE_following Results
1 2023-03-27 9 2023-03-30 21
2 2023-03-28 4 2023-03-31 14
3 2023-03-29 7 2023-04-01 17
4 2023-03-30 1 2023-04-02 12
5 2023-03-31 2 2023-04-03 14
6 2023-04-01 7 2023-04-04 13
7 2023-04-02 2 2023-04-05 11
8 2023-04-03 3 2023-04-06 14
9 2023-04-04 1 2023-04-07 NA
10 2023-04-05 5 2023-04-08 NA
11 2023-04-06 5 2023-04-09 NA
Is it possible to get this result using dplyr functions like rowwise() and cumsum()? My main problem is that I don't know how to define this condition within these functions.
Answer 1
Score: 4
If you want a rolling sum for four consecutive Rates, you could use zoo's rollsum() function:
library(dplyr)
library(zoo)
dat %>%
mutate(Result = rollsum(Rates, k = 4, fill = NA_real_, align = "left"))
This returns
# A tibble: 11 × 5
no DATE Rates DATE_following Result
<dbl> <date> <dbl> <date> <dbl>
1 1 2023-03-27 9 2023-03-30 21
2 2 2023-03-28 4 2023-03-31 14
3 3 2023-03-29 7 2023-04-01 17
4 4 2023-03-30 1 2023-04-02 12
5 5 2023-03-31 2 2023-04-03 14
6 6 2023-04-01 7 2023-04-04 13
7 7 2023-04-02 2 2023-04-05 11
8 8 2023-04-03 3 2023-04-06 14
9 9 2023-04-04 1 2023-04-07 NA
10 10 2023-04-05 5 2023-04-08 NA
11 11 2023-04-06 5 2023-04-09 NA
A slightly more general answer based on LeMarque's comment:
dat2 %>%
mutate(days = as.integer(DATE_following - DATE) + 1,
res = rollapply(data = Rates, width = days, FUN = sum, align = "left", fill = NA_real_))
This returns
# A tibble: 11 × 6
no DATE Rates DATE_following days res
<dbl> <date> <dbl> <date> <dbl> <dbl>
1 1 2023-03-27 9 2023-03-30 4 21
2 2 2023-03-28 4 2023-03-31 4 14
3 3 2023-03-29 7 2023-04-01 4 17
4 4 2023-03-30 1 2023-04-02 4 12
5 5 2023-03-31 2 2023-04-10 11 NA
6 6 2023-04-01 7 2023-04-04 4 13
7 7 2023-04-02 2 2023-04-05 4 11
8 8 2023-04-03 3 2023-04-06 4 14
9 9 2023-04-04 1 2023-04-07 4 NA
10 10 2023-04-05 5 2023-04-08 4 NA
11 11 2023-04-06 5 2023-04-09 4 NA
Since the DATE_following in row 5 isn't present in the data, this version returns NA for that row. Furthermore, this version doesn't sum four consecutive days; it calculates the number of days between DATE and DATE_following and uses that as the window width of the rolling sum.
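To make the per-row window explicit, here is a minimal base R sketch of the same idea without zoo. It assumes one row per calendar day, sorted by DATE; the helper name window_sum and the rebuilt toy data are illustrative only and not part of the original answer.

```r
# Toy data mirroring the question (Rates taken from the example output).
dat <- data.frame(
  DATE  = seq(as.Date("2023-03-27"), as.Date("2023-04-06"), by = "day"),
  Rates = c(9, 4, 7, 1, 2, 7, 2, 3, 1, 5, 5)
)
dat$DATE_following <- dat$DATE + 3

# Window width per row: days from DATE to DATE_following, both inclusive.
width <- as.integer(dat$DATE_following - dat$DATE) + 1L

# Hypothetical helper: left-aligned sum with a per-row width.
# Returns NA when the window would run past the last row.
window_sum <- function(x, width) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    end <- i + width[i] - 1L
    if (end > n) NA_real_ else sum(x[i:end])
  }, numeric(1))
}

dat$res <- window_sum(dat$Rates, width)
dat$res
```

Because the width is counted in rows, this only agrees with a date-based sum when there are no missing or duplicated dates.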
Data
dat <- structure(list(no = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), DATE = structure(c(19443,
19444, 19445, 19446, 19447, 19448, 19449, 19450, 19451, 19452,
19453), class = "Date"), Rates = c(9, 4, 7, 1, 2, 7, 2, 3, 1,
5, 5), DATE_following = structure(c(19446, 19447, 19448, 19449,
19450, 19451, 19452, 19453, 19454, 19455, 19456), class = "Date")), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -11L), spec = structure(list(
cols = list(no = structure(list(), class = c("collector_double",
"collector")), DATE = structure(list(format = ""), class = c("collector_date",
"collector")), Rates = structure(list(), class = c("collector_double",
"collector")), DATE_following = structure(list(format = ""), class = c("collector_date",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
dat2 <- structure(list(no = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), DATE = structure(c(19443,
19444, 19445, 19446, 19447, 19448, 19449, 19450, 19451, 19452,
19453), class = "Date"), Rates = c(9, 4, 7, 1, 2, 7, 2, 3, 1,
5, 5), DATE_following = structure(c(19446, 19447, 19448, 19449,
19457, 19451, 19452, 19453, 19454, 19455, 19456), class = "Date")), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -11L), spec = structure(list(
cols = list(no = structure(list(), class = c("collector_double",
"collector")), DATE = structure(list(format = ""), class = c("collector_date",
"collector")), Rates = structure(list(), class = c("collector_double",
"collector")), DATE_following = structure(list(format = ""), class = c("collector_date",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
Answer 2
Score: 2
Do you need something like this?
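library(zoo)
library(tidyverse)
library(lubridate) # days() and %m+%; only attached by tidyverse from version 2.0.0 on
set.seed(1)
# Fixed end date (the question's date) instead of today(), so the dates match the output below
end_date <- as.Date("2023-04-06")
dat <- seq.Date(from = end_date - days(10), to = end_date, by = "day") %>%
  data.frame(DATE = .) %>%
  mutate(Rates = sample(seq(from=1,to=10,by=1), size = length(DATE), replace=TRUE),
         DATE_following = DATE %m+% days(3),
         # left-aligned rolling sum over 4 rows; partial = TRUE also sums the shorter windows at the end
         Results = rollapply(data = Rates, width = 4, FUN = sum, align = "left", fill = NA, partial = TRUE)) %>%
  # keep a result only when DATE_following exists as a DATE in the data
  mutate(Results = ifelse(DATE_following %in% DATE, Results, NA))
dat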
which results in:
DATE Rates DATE_following Results
1 2023-03-27 3 2023-03-30 20
2 2023-03-28 3 2023-03-31 24
3 2023-03-29 8 2023-04-01 27
4 2023-03-30 6 2023-04-02 27
5 2023-03-31 7 2023-04-03 28
6 2023-04-01 6 2023-04-04 22
7 2023-04-02 8 2023-04-05 20
8 2023-04-03 7 2023-04-06 20
9 2023-04-04 1 2023-04-07 NA
10 2023-04-05 4 2023-04-08 NA
11 2023-04-06 8 2023-04-09 NA
Please check and let me know...
Answer 3
Score: 2
If you want a rolling sum, you can use stats::filter.
# stats::filter is called explicitly because dplyr masks filter() once it is loaded
rev(stats::filter(rev(dat$Rates), rep(1,4), sides=1))
# [1] 21 14 17 12 14 13 11 14 NA NA NA
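Since the question mentions cumsum(), the same fixed-width rolling sum can also be sketched as a difference of two cumulative sums. This is an illustrative base R variant, not part of the original answer; the names cs, start and end are chosen here for the example.

```r
rates <- c(9, 4, 7, 1, 2, 7, 2, 3, 1, 5, 5)
k <- 4L                      # window length in rows
n <- length(rates)

# cs[i + 1] holds the sum of rates[1:i]; the leading 0 stands for "nothing summed yet".
cs <- c(0, cumsum(rates))

start <- seq_len(n)          # first row of each window
end   <- start + k - 1L      # last row of each window

# sum(rates[start:end]) equals cs[end + 1] - cs[start];
# windows that run past the last row are set to NA.
res <- ifelse(end <= n, cs[pmin(end, n) + 1L] - cs[start], NA_real_)
res
```

The cumulative sum is computed once, so each window costs only one subtraction, which can help when the vector is long.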
If DATE_following has to match an existing DATE:
mapply(\(a,b) if(is.na(b)) NA else sum(dat$Rates[a:b]),
seq_len(nrow(dat)), match(dat$DATE_following, dat$DATE))
# [1] 21 14 17 12 14 13 11 14 NA NA NA
Or, if the data is not sorted and DATE_following does not need to match an existing DATE (every row whose DATE falls in the range is summed):
mapply(\(a,b) sum(dat$Rates[dat$DATE >= a & dat$DATE <= b]),
dat$DATE, dat$DATE_following)
# [1] 21 14 17 12 14 13 11 14 11 10 5
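The two variants above can be combined: sum by date range, so that row order does not matter, but return NA when DATE_following is not present among the DATE values. The following is a minimal base R sketch; the helper name range_sum and the shuffled toy data are assumptions made for illustration.

```r
# Toy data mirroring the question, deliberately shuffled to show that order does not matter.
dat <- data.frame(
  DATE  = seq(as.Date("2023-03-27"), as.Date("2023-04-06"), by = "day"),
  Rates = c(9, 4, 7, 1, 2, 7, 2, 3, 1, 5, 5)
)
dat$DATE_following <- dat$DATE + 3
dat <- dat[c(11, 3, 7, 1, 9, 5, 2, 10, 4, 8, 6), ]

# Hypothetical helper: for each row, sum Rates of all rows whose DATE lies in
# [DATE, DATE_following]; NA if DATE_following does not occur as a DATE.
range_sum <- function(d) {
  vapply(seq_len(nrow(d)), function(i) {
    if (!(d$DATE_following[i] %in% d$DATE)) return(NA_real_)
    in_range <- d$DATE >= d$DATE[i] & d$DATE <= d$DATE_following[i]
    sum(d$Rates[in_range])
  }, numeric(1))
}

dat$Results <- range_sum(dat)
dat <- dat[order(dat$DATE), ]   # sort only for display
dat$Results
```

Each row scans the whole DATE column, so this is quadratic in the number of rows; for large data the rolling or join-based approaches are likely to be faster.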
Thanks to @Martin Gal for providing the data!
Answer 4
Score: 1
Here are some data.table options.
frollsum approach
library(data.table)
setDT(dat)
dat[
,
Results := frollsum(Rates, DATE_following - DATE + 1, adaptive = TRUE)
][
,
Results := Results[order(is.na(Results))]
][]
non-equi join approach
library(data.table)
setDT(dat)
dat[
dat[
dat,
on = .(DATE <= DATE_following)
][
DATE_following >= DATE,
.(Results = sum(Rates)),
i.DATE
],
Results := Results * (match(DATE_following, DATE) > 0),
on = .(DATE = i.DATE)
][]
Output
DATE Rates DATE_following Results
1: 2023-03-27 9 2023-03-30 21
2: 2023-03-28 4 2023-03-31 14
3: 2023-03-29 7 2023-04-01 17
4: 2023-03-30 1 2023-04-02 12
5: 2023-03-31 2 2023-04-03 14
6: 2023-04-01 7 2023-04-04 13
7: 2023-04-02 2 2023-04-05 11
8: 2023-04-03 3 2023-04-06 14
9: 2023-04-04 1 2023-04-07 NA
10: 2023-04-05 5 2023-04-08 NA
11: 2023-04-06 5 2023-04-09 NA
Answer 5
Score: 1
A rolling sum would indeed be more efficient, but this could be a potential solution if your start and end points are dynamic.
# remotes::install_github("NicChr/timeplyr")
library(timeplyr)
library(dplyr)
library(lubridate)
set.seed(1)
dat <- seq.Date(from = as.Date(Sys.Date()- days(10)),
to = as.Date(Sys.Date()),
by = "day") %>%
data.frame(DATE = .) %>%
mutate(Rates = sample(seq(from=1,to=10,by=1), size = length(DATE), replace=TRUE),
DATE_following =DATE %m+% days(3))
# Vectorised seq function
x1 <- time_seq_v(dat$DATE, dat$DATE_following, by = "days")
# Label these based on the relevant rows
x2 <- rep(seq_len(nrow(dat)), time_seq_len(dat$DATE, dat$DATE_following, by = "days"))
# Add these sequences to the data
dat <- dat %>%
as_tibble() %>%
mutate(dates = split(x1, x2))
# Sum
my_sum <- numeric(nrow(dat))
for (i in seq_len(nrow(dat))){
my_sum[[i]] <- sum(dat$Rates[dat$DATE %in% dat$dates[[i]]])
}
dat$Result <- my_sum
dat
#> # A tibble: 11 x 5
#> DATE Rates DATE_following dates Result
#> <date> <dbl> <date> <named list> <dbl>
#> 1 2023-03-27 9 2023-03-30 <date [4]> 21
#> 2 2023-03-28 4 2023-03-31 <date [4]> 14
#> 3 2023-03-29 7 2023-04-01 <date [4]> 17
#> 4 2023-03-30 1 2023-04-02 <date [4]> 12
#> 5 2023-03-31 2 2023-04-03 <date [4]> 14
#> 6 2023-04-01 7 2023-04-04 <date [4]> 13
#> 7 2023-04-02 2 2023-04-05 <date [4]> 11
#> 8 2023-04-03 3 2023-04-06 <date [4]> 14
#> 9 2023-04-04 1 2023-04-07 <date [4]> 11
#> 10 2023-04-05 5 2023-04-08 <date [4]> 10
#> 11 2023-04-06 5 2023-04-09 <date [4]> 5
Created on 2023-04-06 with reprex v2.0.2