Cumulative sum for specific range of dates

Question

I'm trying to calculate the row-wise cumulative sum of Rates from DATE to DATE_following.

For example:

library(tidyverse)
library(bizdays)
library(lubridate)

set.seed(1)
dat <- seq.Date(from = as.Date(as.Date("2023-04-06")- days(10)),
                to = as.Date(as.Date("2023-04-06")),
                by = "day") %>% 
  data.frame(DATE = .) %>% 
  mutate(Rates = sample(seq(from=1,to=10,by=1), size = length(DATE),replace=TRUE),
         DATE_following = modified.following(DATE %m+% days(3)))

dat
        DATE Rates DATE_following
1  2023-03-27     9     2023-03-30
2  2023-03-28     4     2023-03-31
3  2023-03-29     7     2023-04-01
4  2023-03-30     1     2023-04-02
5  2023-03-31     2     2023-04-03
6  2023-04-01     7     2023-04-04
7  2023-04-02     2     2023-04-05
8  2023-04-03     3     2023-04-06
9  2023-04-04     1     2023-04-07
10 2023-04-05     5     2023-04-08
11 2023-04-06     5     2023-04-09

The output I'm trying to get is:

  1. Result: 9+4+7+1 = 21 (the sum of Rates from 2023-03-27 to 2023-03-30)

  2. Result: 4+7+1+2 = 14 ...

         DATE Rates DATE_following Results
1  2023-03-27     9     2023-03-30      21
2  2023-03-28     4     2023-03-31      14
3  2023-03-29     7     2023-04-01      17
4  2023-03-30     1     2023-04-02      12
5  2023-03-31     2     2023-04-03      14
6  2023-04-01     7     2023-04-04      13
7  2023-04-02     2     2023-04-05      11
8  2023-04-03     3     2023-04-06      14
9  2023-04-04     1     2023-04-07      NA
10 2023-04-05     5     2023-04-08      NA
11 2023-04-06     5     2023-04-09      NA

Is it possible to get this result using dplyr functions like rowwise() and cumsum()? My main problem is that I don't know how to define this condition within these functions.

Answer 1

Score: 4

If you want a rolling sum over four consecutive Rates, you could use zoo's rollsum() function:

library(dplyr)
library(zoo)

dat %>% 
  mutate(Result = rollsum(Rates, k = 4, fill = NA_real_, align = "left"))

This returns:

# A tibble: 11 × 5
      no DATE       Rates DATE_following Result
   <dbl> <date>     <dbl> <date>          <dbl>
 1     1 2023-03-27     9 2023-03-30         21
 2     2 2023-03-28     4 2023-03-31         14
 3     3 2023-03-29     7 2023-04-01         17
 4     4 2023-03-30     1 2023-04-02         12
 5     5 2023-03-31     2 2023-04-03         14
 6     6 2023-04-01     7 2023-04-04         13
 7     7 2023-04-02     2 2023-04-05         11
 8     8 2023-04-03     3 2023-04-06         14
 9     9 2023-04-04     1 2023-04-07         NA
10    10 2023-04-05     5 2023-04-08         NA
11    11 2023-04-06     5 2023-04-09         NA

A slightly more general answer based on LeMarque's comment:

dat2 %>% 
  mutate(days = as.integer(DATE_following - DATE) + 1,
         res = rollapply(data = Rates, width = days, FUN = sum, align = "left", fill = NA_real_))

This returns:

# A tibble: 11 × 6
      no DATE       Rates DATE_following  days   res
   <dbl> <date>     <dbl> <date>         <dbl> <dbl>
 1     1 2023-03-27     9 2023-03-30         4    21
 2     2 2023-03-28     4 2023-03-31         4    14
 3     3 2023-03-29     7 2023-04-01         4    17
 4     4 2023-03-30     1 2023-04-02         4    12
 5     5 2023-03-31     2 2023-04-10        11    NA
 6     6 2023-04-01     7 2023-04-04         4    13
 7     7 2023-04-02     2 2023-04-05         4    11
 8     8 2023-04-03     3 2023-04-06         4    14
 9     9 2023-04-04     1 2023-04-07         4    NA
10    10 2023-04-05     5 2023-04-08         4    NA
11    11 2023-04-06     5 2023-04-09         4    NA

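Since the DATE_following in row 5 (2023-04-10) isn't present in the data, the 11-day window for that row runs past the last observation and this version returns NA. Note also that this version doesn't sum a fixed four consecutive days: it calculates the number of days between DATE and DATE_following for each row and uses that as the width of the rolling sum.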
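The variable-width idea can also be sketched in base R without zoo. This is only an illustration, assuming one row per calendar day and rows sorted by DATE; the helper name window_sum is made up for this example, and the vectors are retyped from the dat2 data shown in this answer.

```r
# Example data retyped from dat2 (row 5 ends on 2023-04-10)
DATE  <- seq(as.Date("2023-03-27"), as.Date("2023-04-06"), by = "day")
Rates <- c(9, 4, 7, 1, 2, 7, 2, 3, 1, 5, 5)
DATE_following <- DATE + 3
DATE_following[5] <- as.Date("2023-04-10")

# Hypothetical helper: sum x over a forward-looking window of width[i] rows,
# returning NA when the window runs past the end of the data
window_sum <- function(x, width) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    last <- i + width[i] - 1L
    if (last > n) NA_real_ else sum(x[i:last])
  }, numeric(1))
}

days <- as.integer(DATE_following - DATE) + 1L  # inclusive number of days
res  <- window_sum(Rates, days)
res
# [1] 21 14 17 12 NA 13 11 14 NA NA NA
```

Because the window is counted in rows, this only matches the date range when no calendar day is missing from DATE.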

Data

dat <- structure(list(no = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), DATE = structure(c(19443, 
19444, 19445, 19446, 19447, 19448, 19449, 19450, 19451, 19452, 
19453), class = "Date"), Rates = c(9, 4, 7, 1, 2, 7, 2, 3, 1, 
5, 5), DATE_following = structure(c(19446, 19447, 19448, 19449, 
19450, 19451, 19452, 19453, 19454, 19455, 19456), class = "Date")), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -11L), spec = structure(list(
    cols = list(no = structure(list(), class = c("collector_double", 
    "collector")), DATE = structure(list(format = ""), class = c("collector_date", 
    "collector")), Rates = structure(list(), class = c("collector_double", 
    "collector")), DATE_following = structure(list(format = ""), class = c("collector_date", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))

dat2 <- structure(list(no = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), DATE = structure(c(19443, 
19444, 19445, 19446, 19447, 19448, 19449, 19450, 19451, 19452, 
19453), class = "Date"), Rates = c(9, 4, 7, 1, 2, 7, 2, 3, 1, 
5, 5), DATE_following = structure(c(19446, 19447, 19448, 19449, 
19457, 19451, 19452, 19453, 19454, 19455, 19456), class = "Date")), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -11L), spec = structure(list(
    cols = list(no = structure(list(), class = c("collector_double", 
    "collector")), DATE = structure(list(format = ""), class = c("collector_date", 
    "collector")), Rates = structure(list(), class = c("collector_double", 
    "collector")), DATE_following = structure(list(format = ""), class = c("collector_date", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))

Answer 2

Score: 2

Do you need something like this?

library(zoo)
library(tidyverse)
library(lubridate)  # for today(), days() and %m+%

set.seed(1)
dat <- seq.Date(from = as.Date(today() - days(10)), to = as.Date(today()), by = "day") %>%
  data.frame(DATE = .) %>%
  mutate(Rates = sample(seq(from = 1, to = 10, by = 1), size = length(DATE), replace = TRUE),
         DATE_following = DATE %m+% days(3),
         # rolling sum over the current row and the next three rows
         Results = rollapply(data = Rates, width = 4, FUN = sum, align = "left", fill = NA, partial = TRUE)) %>%
  # keep the sum only when DATE_following is itself present in DATE
  mutate(Results = ifelse(DATE_following %in% DATE, Results, NA))

dat

which results in:

          DATE Rates DATE_following Results
1  2023-03-27     3     2023-03-30      20
2  2023-03-28     3     2023-03-31      24
3  2023-03-29     8     2023-04-01      27
4  2023-03-30     6     2023-04-02      27
5  2023-03-31     7     2023-04-03      28
6  2023-04-01     6     2023-04-04      22
7  2023-04-02     8     2023-04-05      20
8  2023-04-03     7     2023-04-06      20
9  2023-04-04     1     2023-04-07      NA
10 2023-04-05     4     2023-04-08      NA
11 2023-04-06     8     2023-04-09      NA

Please check and let me know...

Answer 3

Score: 2

If you want a rolling sum, you can use filter():

rev(filter(rev(dat$Rates), rep(1, 4), sides = 1))
# rev(stats::filter(rev(dat$Rates), rep(1, 4), sides = 1))  # use this form if dplyr is attached, since it masks stats::filter
# [1] 21 14 17 12 14 13 11 14 NA NA NA

If the dates should be matched:

mapply(\(a, b) if (is.na(b)) NA else sum(dat$Rates[a:b]),
  seq_len(nrow(dat)), match(dat$DATE_following, dat$DATE))
# [1] 21 14 17 12 14 13 11 14 NA NA NA

Or, if the data are not sorted and not every date needs to be present or matched:

mapply(\(a, b) sum(dat$Rates[dat$DATE >= a & dat$DATE <= b]),
  dat$DATE, dat$DATE_following)
# [1] 21 14 17 12 14 13 11 14 11 10  5

Thanks to @Martin Gal for providing the data!
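Since the question mentions cumsum(), the date-matching variant can also be written with prefix sums. This is a minimal sketch, assuming DATE is sorted and has no duplicates; the vectors are retyped from the data above, and the names cs, end and res are illustrative only.

```r
DATE  <- seq(as.Date("2023-03-27"), as.Date("2023-04-06"), by = "day")
Rates <- c(9, 4, 7, 1, 2, 7, 2, 3, 1, 5, 5)
DATE_following <- DATE + 3

cs  <- c(0, cumsum(Rates))           # cs[k + 1] is the sum of the first k Rates
end <- match(DATE_following, DATE)   # row index of the closing date, NA if absent
res <- cs[end + 1] - cs[seq_along(Rates)]
res
# [1] 21 14 17 12 14 13 11 14 NA NA NA
```

Each window sum is the difference of two cumulative sums, so no per-row loop is needed; rows whose DATE_following is missing from DATE come out as NA because indexing with NA yields NA.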

Answer 4

Score: 1

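Here are two data.table options:

  1. frollsum approach

library(data.table)
setDT(dat)
dat[,
  # adaptive = TRUE allows a per-row window length (days from DATE to DATE_following, inclusive)
  Results := frollsum(Rates, as.integer(DATE_following - DATE + 1), adaptive = TRUE)
][,
  # the adaptive window is right-aligned, so move the leading NAs to the end
  # (valid here because every window has the same length)
  Results := Results[order(is.na(Results))]
][]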

  2. non-equi join approach

library(data.table)
setDT(dat)
dat[
  dat[
    dat,
    on = .(DATE <= DATE_following)
  ][
    DATE_following >= DATE,
    .(Results = sum(Rates)),
    i.DATE
  ],
  Results := Results * (match(DATE_following, DATE) > 0),
  on = .(DATE = i.DATE)
][]

Output

          DATE Rates DATE_following Results
 1: 2023-03-27     9     2023-03-30      21
 2: 2023-03-28     4     2023-03-31      14
 3: 2023-03-29     7     2023-04-01      17
 4: 2023-03-30     1     2023-04-02      12
 5: 2023-03-31     2     2023-04-03      14
 6: 2023-04-01     7     2023-04-04      13
 7: 2023-04-02     2     2023-04-05      11
 8: 2023-04-03     3     2023-04-06      14
 9: 2023-04-04     1     2023-04-07      NA
10: 2023-04-05     5     2023-04-08      NA
11: 2023-04-06     5     2023-04-09      NA

Answer 5

Score: 1

A rolling sum would indeed be more efficient, but this could be a potential solution if your start and end points are dynamic.

# remotes::install_github("NicChr/timeplyr")
library(timeplyr)
library(dplyr)
library(lubridate)
set.seed(1)
dat <- seq.Date(from = as.Date(Sys.Date() - days(10)),
                to = as.Date(Sys.Date()),
                by = "day") %>% 
  data.frame(DATE = .) %>% 
  mutate(Rates = sample(seq(from = 1, to = 10, by = 1), size = length(DATE), replace = TRUE),
         DATE_following = DATE %m+% days(3))
# Vectorised seq function
x1 <- time_seq_v(dat$DATE, dat$DATE_following, by = "days")
# Label these based on the relevant rows
x2 <- rep(seq_len(nrow(dat)), time_seq_len(dat$DATE, dat$DATE_following, by = "days"))

# Add these sequences to the data
dat <- dat %>%
  as_tibble() %>%
  mutate(dates = split(x1, x2))

# Sum
my_sum <- numeric(nrow(dat))
for (i in seq_len(nrow(dat))){
  my_sum[[i]] <- sum(dat$Rates[dat$DATE %in% dat$dates[[i]]])
}
dat$Result <- my_sum

dat
#> # A tibble: 11 x 5
#>    DATE       Rates DATE_following dates        Result
#>    <date>     <dbl> <date>         <named list>  <dbl>
#>  1 2023-03-27     9 2023-03-30     <date [4]>       21
#>  2 2023-03-28     4 2023-03-31     <date [4]>       14
#>  3 2023-03-29     7 2023-04-01     <date [4]>       17
#>  4 2023-03-30     1 2023-04-02     <date [4]>       12
#>  5 2023-03-31     2 2023-04-03     <date [4]>       14
#>  6 2023-04-01     7 2023-04-04     <date [4]>       13
#>  7 2023-04-02     2 2023-04-05     <date [4]>       11
#>  8 2023-04-03     3 2023-04-06     <date [4]>       14
#>  9 2023-04-04     1 2023-04-07     <date [4]>       11
#> 10 2023-04-05     5 2023-04-08     <date [4]>       10
#> 11 2023-04-06     5 2023-04-09     <date [4]>        5

Created on 2023-04-06 with reprex v2.0.2
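The last three rows in the output above are sums over partial windows (11, 10, 5) rather than NA as in the question's desired output. One way to blank them out is sketched below in base R, under the assumption that a window counts as incomplete whenever DATE_following is later than the last DATE in the data. The vectors are retyped from the question, and the names range_sum and Result are illustrative.

```r
DATE  <- seq(as.Date("2023-03-27"), as.Date("2023-04-06"), by = "day")
Rates <- c(9, 4, 7, 1, 2, 7, 2, 3, 1, 5, 5)
DATE_following <- DATE + 3

# Sum the Rates whose DATE falls inside [DATE[i], DATE_following[i]] for each row
range_sum <- vapply(seq_along(DATE), function(i) {
  sum(Rates[DATE >= DATE[i] & DATE <= DATE_following[i]])
}, numeric(1))

# Mask rows whose window extends beyond the observed dates
Result <- ifelse(DATE_following > max(DATE), NA_real_, range_sum)
Result
# [1] 21 14 17 12 14 13 11 14 NA NA NA
```

The range comparison does not require the rows to be sorted, which keeps the approach usable when start and end points vary from row to row.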

Posted by huangapple on 2023-04-06 19:15:49
Link: https://go.coder-hub.com/75948891.html