2023-02-24 13:24:47

How can I get the average number of days between changes in a variable over a period of time in R?

Question

I have a dataset with a date column and another variable (bank rate). Here is a snippet of the data:

I want to calculate the average number of days between consecutive changes in the bank rate. For example, to get output like this:

Essentially, I am trying to calculate the average number of days a rate stays in place before it changes.

I can use the usual difftime() function, but I need it to compute the difference only when the rate changes, and then average those differences. I am new to R and cannot figure out how to approach this.

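Answer 1

Score: 1

I generated a random sequence of dates within the timeframe above, paired them with the bank_rate values from the question, and put both in a data frame (df).

The data frame is first sorted by date. Rows that show no change in bank_rate are then removed with filter() (see the consecutive bank_rate values of 2). A new variable, days_from_before, holds the number of days between each remaining row and the one before it.

The average is then the mean of days_from_before.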

library(dplyr)
set.seed(123)
date <- sample(seq(as.Date("2018/02/07"), as.Date("2023/01/15"), by = "day"), 14)
bank_rate <- c(1.5, 1.5, rep(2, 6), 0.5, 1.25, 4.5, 4.5, 4.75, 4.75)
df <- data.frame(date, bank_rate)
df
#>          date bank_rate
#> 1  2019-03-28      1.50
#> 2  2019-05-15      1.50
#> 3  2018-08-04      2.00
#> 4  2019-07-17      2.00
#> 5  2018-08-20      2.00
#> 6  2020-09-01      2.00
#> 7  2021-03-24      2.00
#> 8  2021-09-21      2.00
#> 9  2021-07-13      0.50
#> 10 2021-07-28      1.25
#> 11 2020-12-10      4.50
#> 12 2021-12-05      4.50
#> 13 2019-12-03      4.75
#> 14 2019-10-01      4.75
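# Sort by date, keep only the rows where the rate differs from the previous
# row, then measure the days elapsed since the previous change.
ddf <- df |>
  arrange(date) |>
  # default = 0 is a sentinel so the first row is kept (assumes its rate is not 0)
  filter(bank_rate != dplyr::lag(bank_rate, default = 0)) |>
  mutate(
    days_from_before = as.numeric(difftime(date, dplyr::lag(date), units = "days")),
    # The first row has no earlier change, so its NA is replaced with 0
    days_from_before = ifelse(is.na(days_from_before), 0, days_from_before)
  )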
ddf
#>          date bank_rate days_from_before
#> 1  2018-08-04      2.00                0
#> 2  2019-03-28      1.50              236
#> 3  2019-07-17      2.00              111
#> 4  2019-10-01      4.75               76
#> 5  2020-09-01      2.00              336
#> 6  2020-12-10      4.50              100
#> 7  2021-03-24      2.00              104
#> 8  2021-07-13      0.50              111
#> 9  2021-07-28      1.25               15
#> 10 2021-09-21      2.00               55
#> 11 2021-12-05      4.50               75
mean(ddf$days_from_before)
#> [1] 110.8182
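
One thing to be aware of: the first row has no earlier change to measure from, and its placeholder 0 is included in the mean above, so 10 real intervals are averaged over 11 rows. If only the actual gaps between changes should count, a minimal base R sketch along these lines may be closer to what the question asks. It hard-codes the dates printed above rather than re-sampling them, and the names changed, change_dates and gaps are illustrative, not from the original answer.

```r
# Same data as printed above, entered directly so the result does not
# depend on the random number generator.
df <- data.frame(
  date = as.Date(c("2019-03-28", "2019-05-15", "2018-08-04", "2019-07-17",
                   "2018-08-20", "2020-09-01", "2021-03-24", "2021-09-21",
                   "2021-07-13", "2021-07-28", "2020-12-10", "2021-12-05",
                   "2019-12-03", "2019-10-01")),
  bank_rate = c(1.5, 1.5, rep(2, 6), 0.5, 1.25, 4.5, 4.5, 4.75, 4.75)
)

# Sort by date so that consecutive rows are consecutive in time.
df <- df[order(df$date), ]

# Keep the first row plus every row whose rate differs from the row before it.
changed <- c(TRUE, diff(df$bank_rate) != 0)
change_dates <- df$date[changed]

# Days between successive change dates; there is no placeholder for the first row.
gaps <- as.numeric(diff(change_dates))

mean(gaps)
#> [1] 121.9
```

Whether 110.8 or 121.9 is the right figure depends on whether the first observation should count as an interval of length zero; for "average days a rate lasts before it changes", excluding it seems the more natural reading.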

