英文:
Weighted average each year - R
问题
我想要计算每年(2006年至2023年)的interest_rate
的平均值。我有一系列有关新利率和日期的利率决策日期。
对于在一年内有多个interest_rates
的年份,我想按照该特定利率在该年份中所占比例来加权平均值。
示例数据:
library(data.table)
df = data.table(date = c("03/08/2006", "09/11/2006", "11/01/2007", "10/05/2007", "05/07/2007", "06/12/2007", "07/02/2008", "10/04/2008", "08/10/2008", "06/11/2008", "04/12/2008", "08/01/2009", "05/02/2009", "05/03/2009", "04/08/2016", "02/11/2017", "02/08/2018", "11/03/2020", "19/03/2020", "16/12/2021", "03/02/2022", "17/03/2022", "05/05/2022", "16/06/2022", "04/08/2022", "22/09/2022", "03/11/2022", "15/12/2022", "02/02/2023", "23/03/2023", "11/05/2023"),
interest_rate = c(4.7500, 5.0000, 5.2500, 5.5000, 5.7500, 5.5000, 5.2500, 5.0000, 4.5000, 3.0000, 2.0000, 1.5000, 1.0000, 0.5000, 0.2500, 0.5000, 0.7500, 0.2500, 0.1000, 0.2500, 0.5000, 0.7500, 1.0000, 1.2500, 1.7500, 2.2500, 3.0000, 3.5000, 4.0000, 4.2500, 4.500))
结果数据将如下所示:
年份 | 加权平均利率 |
---|---|
2006 | 4.7 |
2007 | ... |
2008 | 等等 |
英文:
I am looking to calculate the average of the interest_rate
for each year
(2006 to 2023). I have a series of interest rate decision dates for the new rate and the date.
For years with multiple interest_rates
during a year I would like to weight the average value by the proportion of year that at that particular rate.
Example data:
library(data.table)
df = data.table(date = c("03/08/2006", "09/11/2006", "11/01/2007", "10/05/2007", "05/07/2007", "06/12/2007", "07/02/2008", "10/04/2008", "08/10/2008", "06/11/2008", "04/12/2008", "08/01/2009", "05/02/2009", "05/03/2009", "04/08/2016", "02/11/2017", "02/08/2018", "11/03/2020", "19/03/2020", "16/12/2021", "03/02/2022", "17/03/2022", "05/05/2022", "16/06/2022", "04/08/2022", "22/09/2022", "03/11/2022", "15/12/2022", "02/02/2023", "23/03/2023", "11/05/2023"),
interest_rate = c(4.7500, 5.0000, 5.2500, 5.5000, 5.7500, 5.5000, 5.2500, 5.0000, 4.5000, 3.0000, 2.0000, 1.5000, 1.0000, 0.5000, 0.2500, 0.5000, 0.7500, 0.2500, 0.1000, 0.2500, 0.5000, 0.7500, 1.0000, 1.2500, 1.7500, 2.2500, 3.0000, 3.5000, 4.0000, 4.2500, 4.500))
The resultant data would look like:
year | weighted_interest_rate |
---|---|
2006 | 4.7 |
2007 | ... |
2008 | etc |
答案1
得分: 1
一种方法是创建一个包含整个时间序列中每一天的利率的数据框,按年份分组,然后计算每年每天的平均利率。
一个重要的第一步是将日期字符串转换为实际日期。
library(tidyverse)
df %>%
mutate(date = as.Date(date, '%d/%m/%Y'),
days = c(as.numeric(diff(date)), 0)) %>%
rowwise() %>%
reframe(date = seq(date, by = 'day', length.out = days),
interest_rate = interest_rate,
year = lubridate::year(date)) %>%
group_by(year) %>%
summarize(interest_rate = mean(interest_rate))
#> # A tibble: 18 x 2
#> year interest_rate
#> <dbl> <dbl>
#> 1 2006 4.84
#> 2 2007 5.51
#> 3 2008 4.67
#> 4 2009 0.644
#> 5 2010 0.5
#> 6 2011 0.5
#> 7 2012 0.5
#> 8 2013 0.5
#> 9 2014 0.5
#> 10 2015 0.5
#> 11 2016 0.398
#> 12 2017 0.291
#> 13 2018 0.604
#> 14 2019 0.75
#> 15 2020 0.228
#> 16 2021 0.107
#> 17 2022 1.47
#> 18 2023 3.97
英文:
One method would be to create a data frame of the interest rates for each day in the entire series, group by year, then average the interest rate for each date in the year.
An important first step is converting the dates-as-strings to actual dates.
library(tidyverse)
df %>%
mutate(date = as.Date(date, '%d/%m/%Y'),
days = c(as.numeric(diff(date)), 0)) %>%
rowwise() %>%
reframe(date = seq(date, by = 'day', length.out = days),
interest_rate = interest_rate,
year = lubridate::year(date)) %>%
group_by(year) %>%
summarize(interest_rate = mean(interest_rate))
#> # A tibble: 18 x 2
#> year interest_rate
#> <dbl> <dbl>
#> 1 2006 4.84
#> 2 2007 5.51
#> 3 2008 4.67
#> 4 2009 0.644
#> 5 2010 0.5
#> 6 2011 0.5
#> 7 2012 0.5
#> 8 2013 0.5
#> 9 2014 0.5
#> 10 2015 0.5
#> 11 2016 0.398
#> 12 2017 0.291
#> 13 2018 0.604
#> 14 2019 0.75
#> 15 2020 0.228
#> 16 2021 0.107
#> 17 2022 1.47
#> 18 2023 3.97
答案2
得分: 1
Inelegant solution:
# 向数据框中添加每年起始日期的日期列表
df = rbind(df, data.table(date = as.Date(ISOdate(seq(2005,2023),1,1)), "interest_rate" = NA), fill=T)
# 设置排序
setorder(df, -date)
# 将新日期的值设置为当前利率
df[is.na(interest_rate), interest_rate := df[df[is.na(interest_rate)], on = .(date), interest_rate, roll=Inf]]
# 标识前一个移动日期是什么
df[, change_date := shift(date, 1L)]
# 获取下一个利率变化或年底
df[, next_change_date_or_year := min(as.Date(ISOdate(year,12,31)), change_date, na.rm=T), date]
# 获取年内相对持续时间
df[, relative_duration := as.numeric((next_change_date_or_year-date)/365)]
df[, year := year(date)]
# 每年的加权平均值
yearly_IR = df[!is.na(year), .(weight_interest_rate = weighted.mean(`interest rate`, relative_duration)), year]
英文:
Inelegant solution:
# add to the dataframe a list of dates for the start of each year
df= rbind(df, data.table(date = as.Date(ISOdate(seq(2005,2023),1,1)), "interest_rate" = NA), fill=T)
# set order
setorder(df, -date)
# set value for the new dates to the current interest rate
df[is.na(interest_rate), interest_rate := df[df[is.na(interest_rate)], on = .(date), interest_rate, roll=Inf]]
# identify the previous shift date was
df[, change_date := shift(date, 1L)]
# get next rate change or end of year
df[, next_change_date_or_year := min(as.Date(ISOdate(year,12,31)), change_date, na.rm=T), date]
# get relative duration within year
df[, relative_duration := as.numeric((next_change_date_or_year-date)/365)]
df[, year := year(date)]
# weighted average per year
yearly_IR = df[!is.na(year), .(weight_interest_rate = weighted.mean(`interest rate`, relative_duration)), year]
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论