英文:
How can I calculate multiple weighted means in each column?
问题
I am trying to find a way to calculate weighted means for multiple years for each country. My data is currently formatted like:
df <- data.frame(
YEAR_CALENDAR=c(2020, 2020, 2020, 2020, 2020, 2021, 2021, 2021, 2021, 2021),
age_group=c("15-29", "30-44", "45-59", "60-74", "Over 75", "15-29", "30-44", "45-59", "60-74", "Over 75"),
a=c(3.85, 3.66, 3.76, 2.70, 3.10, 4.32, 4.64, 3.67, 3.45, 4.56),
b=c(3.56, 3.67, 3.72, 3.89, 4.23, 4.28, 4.27, 3.12, 3.46, 3.97),
weights=rep(c(0.3333784699, 0.2890995261, 0.2161137441, 0.1203791469, 0.04150304671), 2))
I would like to have a data frame with the weighted average of Country a and Country b in each year, using the coefficients in the weights column. The resulting data frame would have columns for Year, Country a, Country b, with the values being the weighted average of that country in that year.
My approach was to multiply the weights column to the a column and b column. Then, I would try to add the values of 5 rows together (for each year) to find the weighted average. In the first step of multiplying the weights, I am getting a "non-numeric argument to binary operator" error. The code is below:
new_df <- df %>%
mutate(across(3:4, .*df$weights))
I was in need of a fix for my approach, or for someone to help me get the results through a different method.
EDIT:
Okay, the multiplication part works! Thank you for the help. I was wondering if there was a way to add the values in each age_group based on the country and year, so that I have one value for each country in each year.
Many thanks!
英文:
strong textI am trying to find a way to calculate weighted means for multiple years for each country. My data is currently formatted like:
df <- data.frame(
YEAR_CALENDAR=c(2020, 2020, 2020, 2020, 2020, 2021, 2021, 2021, 2021, 2021),
age_group=c("15-29", "30-44", "45-59", "60-74", "Over 75", "15-29", "30-44", "45-59", "60-74", "Over 75"),
a=c(3.85, 3.66, 3.76, 2.70, 3.10, 4.32, 4.64, 3.67, 3.45, 4.56),
b=c(3.56, 3.67, 3.72, 3.89, 4.23, 4.28, 4.27, 3.12, 3.46, 3.97),
weights=rep(c(0.3333784699, 0.2890995261, 0.2161137441,
0.1203791469, 0.04150304671), 2))
I would like to have a data frame with the weighted average of Country a and and Country b in each year, using the coefficients in the weights column. The resulting data frame would have columns for Year, Country a, Country b, with the values being the weighted average of that country in that year.
My approach was to multiply the weights column to the a column and b column. Then, I would try to add the values of 5 rows together (for each year) to find the weighted average. In the first step of multiplying the weights, I am getting a "non-numeric argument to binary operator" error. The code is below:
new_df <- df %>%
mutate(across(3:4, .*df$weights))
I was in need of a fix for my approach, or for someone to help me get the results through a different method.
EDIT:
Okay, the multiplication part works! Thank you for the help. I was wondering if there was a way to add the values in each age_group based on the country and year, so that I have one value for each country in each year.
Many thanks!
答案1
得分: 0
你可以通过在 summarize
内部使用基础的 R weighted.mean
函数来简化操作
df %>%
group_by(YEAR_CALENDAR) %>%
summarise(across(a:b, ~ weighted.mean(.x, weights)))
#> # A tibble: 2 x 3
#> YEAR_CALENDAR a b
#> <dbl> <dbl> <dbl>
#> 1 2020 3.61 3.69
#> 2 2021 4.18 3.92
英文:
You can simplify things by using the base R weighted.mean
function inside summarize
df %>%
group_by(YEAR_CALENDAR) %>%
summarise(across(a:b, ~ weighted.mean(.x, weights)))
#> # A tibble: 2 x 3
#> YEAR_CALENDAR a b
#> <dbl> <dbl> <dbl>
#> 1 2020 3.61 3.69
#> 2 2021 4.18 3.92
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论