How can I get the average number of days between changes in a variable over a period of time in R?
Question
I have a dataset with a date column and one other variable (bank rate). Here is a snippet of the data:
I want to calculate the average number of days between consecutive changes in the bank rate. For example, to get output like this:
Essentially, I am trying to calculate the average number of days a rate stays in place before it changes.
I can use the usual difftime() function, but I need it to compute the difference only when the rate changes, and then average those differences. I am new to R and cannot figure out how to go about this.
Answer 1
Score: 1
I generated a random sequence of dates within the timeframe above, combined them with the bank_rate values from above, and put them in a data frame (df).
The data frame is sorted by date.
Rows that show no change in bank_rate are then removed with filter() (see the consecutive bank_rate values of 2). A new variable, days_from_before, holds the number of days between each remaining date and the one before it. The first row has no earlier change to compare with, so its days_from_before is set to 0.
The average is the mean() of days_from_before, which includes that 0.
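To make the change-detection step concrete, here is a minimal base R sketch on a made-up rate vector. The values and the names rate and is_change are illustrative assumptions, not part of the original data. A row is flagged as a change when its value differs from the one before it, and the first row is always kept, so no sentinel such as default = 0 is needed.

```r
# Hypothetical rates, already sorted by date
rate <- c(1.5, 1.5, 2, 2, 2, 0.5, 0.5, 2)

# Compare each value with its predecessor; the first value has no
# predecessor, so it is treated as a change by construction
is_change <- c(TRUE, rate[-1] != rate[-length(rate)])

which(is_change)
#> [1] 1 3 6 8
rate[is_change]
#> [1] 1.5 2.0 0.5 2.0
```

With lag(bank_rate, default = 0), a series whose first rate is exactly 0 would lose its first row; flagging the first observation explicitly avoids that edge case.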
library(dplyr)
# Reproducible example data: 14 random dates paired with the bank rates
set.seed(123)
date <- sample(seq(as.Date("2018/02/07"), as.Date("2023/01/15"), by = "day"), 14)
bank_rate <- c(1.5, 1.5, rep(2, 6), 0.5, 1.25, 4.5, 4.5, 4.75, 4.75)
df <- data.frame(date, bank_rate)
df
#>          date bank_rate
#> 1  2019-03-28      1.50
#> 2  2019-05-15      1.50
#> 3  2018-08-04      2.00
#> 4  2019-07-17      2.00
#> 5  2018-08-20      2.00
#> 6  2020-09-01      2.00
#> 7  2021-03-24      2.00
#> 8  2021-09-21      2.00
#> 9  2021-07-13      0.50
#> 10 2021-07-28      1.25
#> 11 2020-12-10      4.50
#> 12 2021-12-05      4.50
#> 13 2019-12-03      4.75
#> 14 2019-10-01      4.75
ddf <- df |>
  arrange(date) |>
  # keep only rows where the rate differs from the previous row
  filter(bank_rate != dplyr::lag(bank_rate, default = 0)) |>
  mutate(
    # days since the previous change (NA for the first row)
    days_from_before = as.numeric(difftime(date, dplyr::lag(date), units = "days")),
    # the first row has no previous change, so it is set to 0
    days_from_before = ifelse(is.na(days_from_before), 0, days_from_before)
  )
ddf
#>          date bank_rate days_from_before
#> 1  2018-08-04      2.00                0
#> 2  2019-03-28      1.50              236
#> 3  2019-07-17      2.00              111
#> 4  2019-10-01      4.75               76
#> 5  2020-09-01      2.00              336
#> 6  2020-12-10      4.50              100
#> 7  2021-03-24      2.00              104
#> 8  2021-07-13      0.50              111
#> 9  2021-07-28      1.25               15
#> 10 2021-09-21      2.00               55
#> 11 2021-12-05      4.50               75
# average over all 11 rows, including the 0 in the first row
mean(ddf$days_from_before)
#> [1] 110.8182
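One thing to watch in the result above: the placeholder 0 for the first row is averaged in, so the mean is taken over 11 values although 11 changes have only 10 intervals between them. As a hedged sketch in base R, using the change dates copied from the ddf output (the names change_dates and gaps are illustrative), the two averages compare as follows:

```r
# Change dates copied from the ddf output (rows where bank_rate changed)
change_dates <- as.Date(c(
  "2018-08-04", "2019-03-28", "2019-07-17", "2019-10-01",
  "2020-09-01", "2020-12-10", "2021-03-24", "2021-07-13",
  "2021-07-28", "2021-09-21", "2021-12-05"
))

# diff() on Dates gives a difftime in days; n changes yield n - 1 gaps
gaps <- as.numeric(diff(change_dates), units = "days")

# Average over the 10 actual intervals between changes
mean(gaps)
#> [1] 121.9

# Average with the placeholder 0 included, as in the answer's code
mean(c(0, gaps))
#> [1] 110.8182
```

Which figure is appropriate depends on whether the first observation should count as a change. If only the time between observed changes matters, the mean over the 10 gaps seems the closer match to the question.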