2023年6月19日 19:15:57go评论95阅读模式

英文:

Weighted average each year - R

问题

我想要计算每年（2006年至2023年）的interest_rate的平均值。我有一系列有关新利率和日期的利率决策日期。

对于在一年内有多个interest_rates的年份，我想按照该特定利率在该年份中所占比例来加权平均值。

示例数据：

library(data.table)
df = data.table(date = c("03/08/2006", "09/11/2006", "11/01/2007", "10/05/2007", "05/07/2007", "06/12/2007", "07/02/2008", "10/04/2008", "08/10/2008", "06/11/2008", "04/12/2008", "08/01/2009", "05/02/2009", "05/03/2009", "04/08/2016", "02/11/2017", "02/08/2018", "11/03/2020", "19/03/2020", "16/12/2021", "03/02/2022", "17/03/2022", "05/05/2022", "16/06/2022", "04/08/2022", "22/09/2022", "03/11/2022", "15/12/2022", "02/02/2023", "23/03/2023", "11/05/2023"),
                    interest_rate = c(4.7500, 5.0000, 5.2500, 5.5000, 5.7500, 5.5000, 5.2500, 5.0000, 4.5000, 3.0000, 2.0000, 1.5000, 1.0000, 0.5000, 0.2500, 0.5000, 0.7500, 0.2500, 0.1000, 0.2500, 0.5000, 0.7500, 1.0000, 1.2500, 1.7500, 2.2500, 3.0000, 3.5000, 4.0000, 4.2500, 4.500))

结果数据将如下所示：

年份	加权平均利率
2006	4.7
2007	...
2008	等等

英文:

I am looking to calculate the average of the interest_rate for each year (2006 to 2023). I have a series of interest rate decision dates for the new rate and the date.

For years with multiple interest_rates during a year I would like to weight the average value by the proportion of year that at that particular rate.

Example data:

library(data.table)
df = data.table(date = c(&quot;03/08/2006&quot;, &quot;09/11/2006&quot;, &quot;11/01/2007&quot;, &quot;10/05/2007&quot;, &quot;05/07/2007&quot;, &quot;06/12/2007&quot;, &quot;07/02/2008&quot;, &quot;10/04/2008&quot;, &quot;08/10/2008&quot;, &quot;06/11/2008&quot;, &quot;04/12/2008&quot;, &quot;08/01/2009&quot;, &quot;05/02/2009&quot;, &quot;05/03/2009&quot;, &quot;04/08/2016&quot;, &quot;02/11/2017&quot;, &quot;02/08/2018&quot;, &quot;11/03/2020&quot;, &quot;19/03/2020&quot;, &quot;16/12/2021&quot;, &quot;03/02/2022&quot;, &quot;17/03/2022&quot;, &quot;05/05/2022&quot;, &quot;16/06/2022&quot;, &quot;04/08/2022&quot;, &quot;22/09/2022&quot;, &quot;03/11/2022&quot;, &quot;15/12/2022&quot;, &quot;02/02/2023&quot;, &quot;23/03/2023&quot;, &quot;11/05/2023&quot;),
                interest_rate = c(4.7500, 5.0000, 5.2500, 5.5000, 5.7500, 5.5000, 5.2500, 5.0000, 4.5000, 3.0000, 2.0000, 1.5000, 1.0000, 0.5000, 0.2500, 0.5000, 0.7500, 0.2500, 0.1000, 0.2500, 0.5000, 0.7500, 1.0000, 1.2500, 1.7500, 2.2500, 3.0000, 3.5000, 4.0000, 4.2500, 4.500))

The resultant data would look like:

year	weighted_interest_rate
2006	4.7
2007	...
2008	etc

答案1

得分: 1

一种方法是创建一个包含整个时间序列中每一天的利率的数据框，按年份分组，然后计算每年每天的平均利率。

一个重要的第一步是将日期字符串转换为实际日期。

library(tidyverse)
df %>%
  mutate(date = as.Date(date, '%d/%m/%Y'),
         days = c(as.numeric(diff(date)), 0)) %>%
  rowwise() %>%
  reframe(date = seq(date, by = 'day', length.out = days),
          interest_rate = interest_rate,
          year = lubridate::year(date)) %>%
  group_by(year) %>%
  summarize(interest_rate = mean(interest_rate))
#> # A tibble: 18 x 2
#>     year interest_rate
#>    <dbl>         <dbl>
#>  1  2006         4.84 
#>  2  2007         5.51 
#>  3  2008         4.67 
#>  4  2009         0.644
#>  5  2010         0.5  
#>  6  2011         0.5  
#>  7  2012         0.5  
#>  8  2013         0.5  
#>  9  2014         0.5  
#> 10  2015         0.5  
#> 11  2016         0.398
#> 12  2017         0.291
#> 13  2018         0.604
#> 14  2019         0.75 
#> 15  2020         0.228
#> 16  2021         0.107
#> 17  2022         1.47 
#> 18  2023         3.97

英文:

One method would be to create a data frame of the interest rates for each day in the entire series, group by year, then average the interest rate for each date in the year.

An important first step is converting the dates-as-strings to actual dates.

library(tidyverse)
df %&gt;%
  mutate(date = as.Date(date, &#39;%d/%m/%Y&#39;),
         days = c(as.numeric(diff(date)), 0)) %&gt;%
  rowwise() %&gt;%
  reframe(date = seq(date, by = &#39;day&#39;, length.out = days),
          interest_rate = interest_rate,
          year = lubridate::year(date)) %&gt;%
  group_by(year) %&gt;%
  summarize(interest_rate = mean(interest_rate))
#&gt; # A tibble: 18 x 2
#&gt;     year interest_rate
#&gt;    &lt;dbl&gt;         &lt;dbl&gt;
#&gt;  1  2006         4.84 
#&gt;  2  2007         5.51 
#&gt;  3  2008         4.67 
#&gt;  4  2009         0.644
#&gt;  5  2010         0.5  
#&gt;  6  2011         0.5  
#&gt;  7  2012         0.5  
#&gt;  8  2013         0.5  
#&gt;  9  2014         0.5  
#&gt; 10  2015         0.5  
#&gt; 11  2016         0.398
#&gt; 12  2017         0.291
#&gt; 13  2018         0.604
#&gt; 14  2019         0.75 
#&gt; 15  2020         0.228
#&gt; 16  2021         0.107
#&gt; 17  2022         1.47 
#&gt; 18  2023         3.97

答案2

得分: 1

Inelegant solution:

# 向数据框中添加每年起始日期的日期列表
df = rbind(df, data.table(date = as.Date(ISOdate(seq(2005,2023),1,1)), "interest_rate" = NA), fill=T)
# 设置排序
setorder(df, -date)
# 将新日期的值设置为当前利率
df[is.na(interest_rate), interest_rate := df[df[is.na(interest_rate)], on = .(date), interest_rate, roll=Inf]]
# 标识前一个移动日期是什么
df[, change_date := shift(date, 1L)]
# 获取下一个利率变化或年底
df[, next_change_date_or_year := min(as.Date(ISOdate(year,12,31)), change_date, na.rm=T), date]
# 获取年内相对持续时间
df[, relative_duration := as.numeric((next_change_date_or_year-date)/365)]
df[, year := year(date)]
# 每年的加权平均值
yearly_IR = df[!is.na(year), .(weight_interest_rate = weighted.mean(`interest rate`, relative_duration)), year]

英文:

Inelegant solution:

# add to the dataframe a list of dates for the start of each year
df= rbind(df, data.table(date = as.Date(ISOdate(seq(2005,2023),1,1)), &quot;interest_rate&quot; = NA), fill=T)
# set order
setorder(df, -date)
# set value for the new dates to the current interest rate
df[is.na(interest_rate), interest_rate := df[df[is.na(interest_rate)], on = .(date), interest_rate, roll=Inf]]
# identify the previous shift date was
df[, change_date := shift(date, 1L)]
# get next rate change or end of year
df[, next_change_date_or_year := min(as.Date(ISOdate(year,12,31)), change_date, na.rm=T), date]
# get relative duration within year
df[, relative_duration := as.numeric((next_change_date_or_year-date)/365)]
df[, year := year(date)]
# weighted average per year
yearly_IR = df[!is.na(year), .(weight_interest_rate = weighted.mean(`interest rate`, relative_duration)), year]

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

每年加权平均值 – R

问题

答案1

答案2

使用时间条件而不是季度/月份来改变情感指标（使用dplyr）。

R的for循环用于根据另一列的ID将函数应用于数据框向量。

当使用lm_robust时，以及texreg仅获取观察数：

获取特定变量的线性回归方程如何？

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。