每年加权平均值 – R

huangapple go评论64阅读模式
英文:

Weighted average each year - R

问题

我想要计算每年(2006年至2023年)的interest_rate的平均值。我有一系列有关新利率和日期的利率决策日期。

对于在一年内有多个interest_rates的年份,我想按照该特定利率在该年份中所占比例来加权平均值。

示例数据:

library(data.table)
df = data.table(date = c("03/08/2006", "09/11/2006", "11/01/2007", "10/05/2007", "05/07/2007", "06/12/2007", "07/02/2008", "10/04/2008", "08/10/2008", "06/11/2008", "04/12/2008", "08/01/2009", "05/02/2009", "05/03/2009", "04/08/2016", "02/11/2017", "02/08/2018", "11/03/2020", "19/03/2020", "16/12/2021", "03/02/2022", "17/03/2022", "05/05/2022", "16/06/2022", "04/08/2022", "22/09/2022", "03/11/2022", "15/12/2022", "02/02/2023", "23/03/2023", "11/05/2023"),
                    interest_rate = c(4.7500, 5.0000, 5.2500, 5.5000, 5.7500, 5.5000, 5.2500, 5.0000, 4.5000, 3.0000, 2.0000, 1.5000, 1.0000, 0.5000, 0.2500, 0.5000, 0.7500, 0.2500, 0.1000, 0.2500, 0.5000, 0.7500, 1.0000, 1.2500, 1.7500, 2.2500, 3.0000, 3.5000, 4.0000, 4.2500, 4.500))

结果数据将如下所示:

年份 加权平均利率
2006 4.7
2007 ...
2008 等等
英文:

I am looking to calculate the average of the interest_rate for each year (2006 to 2023). I have a series of interest rate decision dates for the new rate and the date.

For years with multiple interest_rates during a year I would like to weight the average value by the proportion of year that at that particular rate.

Example data:

library(data.table)
df = data.table(date = c("03/08/2006", "09/11/2006", "11/01/2007", "10/05/2007", "05/07/2007", "06/12/2007", "07/02/2008", "10/04/2008", "08/10/2008", "06/11/2008", "04/12/2008", "08/01/2009", "05/02/2009", "05/03/2009", "04/08/2016", "02/11/2017", "02/08/2018", "11/03/2020", "19/03/2020", "16/12/2021", "03/02/2022", "17/03/2022", "05/05/2022", "16/06/2022", "04/08/2022", "22/09/2022", "03/11/2022", "15/12/2022", "02/02/2023", "23/03/2023", "11/05/2023"),
                interest_rate = c(4.7500, 5.0000, 5.2500, 5.5000, 5.7500, 5.5000, 5.2500, 5.0000, 4.5000, 3.0000, 2.0000, 1.5000, 1.0000, 0.5000, 0.2500, 0.5000, 0.7500, 0.2500, 0.1000, 0.2500, 0.5000, 0.7500, 1.0000, 1.2500, 1.7500, 2.2500, 3.0000, 3.5000, 4.0000, 4.2500, 4.500))

The resultant data would look like:

year weighted_interest_rate
2006 4.7
2007 ...
2008 etc

答案1

得分: 1

一种方法是创建一个包含整个时间序列中每一天的利率的数据框,按年份分组,然后计算每年每天的平均利率。

一个重要的第一步是将日期字符串转换为实际日期。

library(tidyverse)

df %>%
  mutate(date = as.Date(date, '%d/%m/%Y'),
         days = c(as.numeric(diff(date)), 0)) %>%
  rowwise() %>%
  reframe(date = seq(date, by = 'day', length.out = days),
          interest_rate = interest_rate,
          year = lubridate::year(date)) %>%
  group_by(year) %>%
  summarize(interest_rate = mean(interest_rate))
#> # A tibble: 18 x 2
#>     year interest_rate
#>    <dbl>         <dbl>
#>  1  2006         4.84 
#>  2  2007         5.51 
#>  3  2008         4.67 
#>  4  2009         0.644
#>  5  2010         0.5  
#>  6  2011         0.5  
#>  7  2012         0.5  
#>  8  2013         0.5  
#>  9  2014         0.5  
#> 10  2015         0.5  
#> 11  2016         0.398
#> 12  2017         0.291
#> 13  2018         0.604
#> 14  2019         0.75 
#> 15  2020         0.228
#> 16  2021         0.107
#> 17  2022         1.47 
#> 18  2023         3.97 
英文:

One method would be to create a data frame of the interest rates for each day in the entire series, group by year, then average the interest rate for each date in the year.

An important first step is converting the dates-as-strings to actual dates.

library(tidyverse)

df %&gt;%
  mutate(date = as.Date(date, &#39;%d/%m/%Y&#39;),
         days = c(as.numeric(diff(date)), 0)) %&gt;%
  rowwise() %&gt;%
  reframe(date = seq(date, by = &#39;day&#39;, length.out = days),
          interest_rate = interest_rate,
          year = lubridate::year(date)) %&gt;%
  group_by(year) %&gt;%
  summarize(interest_rate = mean(interest_rate))
#&gt; # A tibble: 18 x 2
#&gt;     year interest_rate
#&gt;    &lt;dbl&gt;         &lt;dbl&gt;
#&gt;  1  2006         4.84 
#&gt;  2  2007         5.51 
#&gt;  3  2008         4.67 
#&gt;  4  2009         0.644
#&gt;  5  2010         0.5  
#&gt;  6  2011         0.5  
#&gt;  7  2012         0.5  
#&gt;  8  2013         0.5  
#&gt;  9  2014         0.5  
#&gt; 10  2015         0.5  
#&gt; 11  2016         0.398
#&gt; 12  2017         0.291
#&gt; 13  2018         0.604
#&gt; 14  2019         0.75 
#&gt; 15  2020         0.228
#&gt; 16  2021         0.107
#&gt; 17  2022         1.47 
#&gt; 18  2023         3.97 

答案2

得分: 1

Inelegant solution:

# 向数据框中添加每年起始日期的日期列表
df = rbind(df, data.table(date = as.Date(ISOdate(seq(2005,2023),1,1)), "interest_rate" = NA), fill=T)

# 设置排序
setorder(df, -date)

# 将新日期的值设置为当前利率
df[is.na(interest_rate), interest_rate := df[df[is.na(interest_rate)], on = .(date), interest_rate, roll=Inf]]

# 标识前一个移动日期是什么
df[, change_date := shift(date, 1L)]

# 获取下一个利率变化或年底
df[, next_change_date_or_year := min(as.Date(ISOdate(year,12,31)), change_date, na.rm=T), date]

# 获取年内相对持续时间
df[, relative_duration := as.numeric((next_change_date_or_year-date)/365)]

df[, year := year(date)]

# 每年的加权平均值
yearly_IR = df[!is.na(year), .(weight_interest_rate = weighted.mean(`interest rate`, relative_duration)), year]
英文:

Inelegant solution:

# add to the dataframe a list of dates for the start of each year
df= rbind(df, data.table(date = as.Date(ISOdate(seq(2005,2023),1,1)), &quot;interest_rate&quot; = NA), fill=T)

# set order
setorder(df, -date)

# set value for the new dates to the current interest rate
df[is.na(interest_rate), interest_rate := df[df[is.na(interest_rate)], on = .(date), interest_rate, roll=Inf]]

# identify the previous shift date was
df[, change_date := shift(date, 1L)]

# get next rate change or end of year
df[, next_change_date_or_year := min(as.Date(ISOdate(year,12,31)), change_date, na.rm=T), date]

# get relative duration within year
df[, relative_duration := as.numeric((next_change_date_or_year-date)/365)]

df[, year := year(date)]

# weighted average per year
yearly_IR = df[!is.na(year), .(weight_interest_rate = weighted.mean(`interest rate`, relative_duration)), year]

huangapple
  • 本文由 发表于 2023年6月19日 19:15:57
  • 转载请务必保留本文链接:https://go.coder-hub.com/76506096.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定