Calculate Row Decile/Quantile by Column Dplyr R
Question
I have a data frame of price returns with one row per territory and one column per year. After each existing column, I'd like to insert a new column holding the decile or quantile of the value in the annual return column to its left.
I can compute a ranking for each value within its column by creating a new data frame with the following:
test <- yearRetsMSA2 %>%
  mutate(across(c(cnam_year2[1]:cnam_year2[length(cnam_year2)]), rank))
Here yearRetsMSA2 is the table with column names by year and row names by territory, and cnam_year2 is a character vector of the column names of yearRetsMSA2. Each column contains some NA values, and the script below is not working:
test2 <- yearRetsMSA2 %>%
  mutate(across(c(cnam_year2[1]:cnam_year2[length(cnam_year2)]), quantile(na.omit())))
structure(list(`1995 Return` = c(0.0151000000000001, 0.0463),
`1996 Return` = c(0.0361540734902965, 0.050750262830928),
`1997 Return` = c(0.036223616657159, 0.049208659268692),
`1998 Return` = c(0.0213781080833104, 0.0508019072388384),
`1999 Return` = c(0.0369205892921309, 0.023265407144625),
`2000 Return` = c(0.0177596811920644, 0.042892848504394),
`2001 Return` = c(0.0474123255022132, 0.0538074990336297),
`2002 Return` = c(0.0282811865095489, 0.0258968527620864),
`2003 Return` = c(-0.00505808899075322, 0.0240989702517163
), `2004 Return` = c(0.0660100087377868, 0.0309335940227635
), `2005 Return` = c(0.0777943368107303, 0.0308859387699811
), `2006 Return` = c(0.0893252212389382, -0.00683311432325884
), `2007 Return` = c(0.0338283828382837, -0.0302990209050013
), `2008 Return` = c(0.0355454601264658, -0.0375221721926593
), `2009 Return` = c(0.00361631491581682, -0.0233909838389567
), `2010 Return` = c(0.000472561876070809, -0.0121933517201336
), `2011 Return` = c(-0.0144653716714885, -0.0449669360764144
), `2012 Return` = c(0.0181524083393243, -0.012925065394676
), `2013 Return` = c(0.0614886731391586, 0.0127825409197193
), `2014 Return` = c(0.0437361419068736, 0.0333230721871633
), `2015 Return` = c(0.0364331616124065, 0.0430475906755046
), `2016 Return` = c(0.0472457084294133, 0.0165655123170296
), `2017 Return` = c(0.0218231638694526, 0.0523986794970852
), `2018 Return` = c(0.0755159699276924, 0.036975238603751
), `2019 Return` = c(0.0231967943009797, 0.0610800025744997
), `2020 Return` = c(0.0486488838605805, 0.0724857454810142
), `2021 Return` = c(0.196107722312129, 0.140093886092416
), `2022 Return` = c(0.069071986123157, 0.119059430499058
)), row.names = c("Abilene, TX", "Akron, OH"), class = "data.frame")
Additionally, given a solution that inserts a quantile column next to each existing column, how would you modify it to insert a ranking column next to each existing column in the same way? The final table should keep the same structure as the output of the quantile/decile script.
Help solving this is much appreciated!
Answer 1
Score: 0
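A generic approach with {dplyr}:
library(dplyr)

mtcars %>%
  # For every numeric column, add a "<col>_quantile" column with its quintile bin.
  # With breaks at the 20th to 100th percentiles, bins run from 0 (below the
  # 20th percentile) to 5 (the column maximum); NA inputs stay NA.
  mutate(across(where(is.numeric),
                .fns = ~ findInterval(.x, quantile(.x, c(.2 * 1:5), na.rm = TRUE)),
                .names = "{.col}_quantile")) %>%
  # Sort the column names so each bin column sits next to its source column.
  select(names(.) %>% sort)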
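The same across() pattern can also cover the ranking part of the question by passing a named list of functions. The following is only a sketch under stated assumptions: `rets` is a made-up miniature of the question's data frame, the " rank" and " decile" suffixes are arbitrary, and the decile breaks are taken at the 0th to 90th percentiles so that bins run from 1 to 10 (a different convention from the 0 to 5 bins above).

```r
library(dplyr)

# Hypothetical miniature of the question's data: two year columns, one NA.
rets <- data.frame(
  `1995 Return` = c(0.015, 0.046, NA, 0.030),
  `1996 Return` = c(0.036, 0.051, 0.020, 0.010),
  row.names = c("A", "B", "C", "D"),
  check.names = FALSE
)

# Desired layout: each year column followed by its rank and decile columns.
col_order <- unlist(lapply(names(rets), function(nm) {
  paste0(nm, c("", " rank", " decile"))
}))

out <- rets %>%
  mutate(across(
    ends_with("Return"),
    list(
      # na.last = "keep" leaves NA returns unranked instead of ranking them last
      rank = ~ rank(.x, na.last = "keep"),
      # breaks at the 0%, 10%, ..., 90% quantiles give bins 1 (lowest) to 10
      decile = ~ findInterval(.x, quantile(.x, probs = 0:9 / 10, na.rm = TRUE))
    ),
    .names = "{.col} {.fn}"
  )) %>%
  select(all_of(col_order))

print(out)
```

Building `col_order` explicitly avoids relying on alphabetical sorting of the column names, which may matter if the real names do not sort into the intended order.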
Anyhow, the tidyverse way would be to pivot_longer your dataframe, group it, apply the desired mutations and (if need be) pivot_wider it to wide format again.
Example (df being your sample dataframe):
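library(tidyr)
library(dplyr)

df_long <-
  df %>%
  # keep the territory row names as a regular column
  tibble::rownames_to_column('state') %>%
  pivot_longer(cols = ends_with('Return'),
               names_to = 'year',
               values_to = 'return'
               ) %>%
  # "1995 Return" -> "1995"
  mutate(year = gsub(' .*', '', year)) %>%
  # Breaks are computed within each territory across its years;
  # use group_by(year) instead to bin each year across territories.
  group_by(state) %>%
  mutate(quant = findInterval(return, quantile(return, 1:5 * .2, na.rm = TRUE))) %>%
  ungroup()

df_long %>%
  # names_vary = 'slowest' places each year's return and quant columns side by side
  pivot_wider(names_from = year,
              values_from = c('return', 'quant'),
              names_vary = 'slowest'
              )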
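If the bins should be computed within each year across territories, as the question describes, the long-format route can group by year and add the rank in the same step. This is a sketch only: `rets` is a made-up stand-in for the sample data frame, the column names `territory`, `ret`, `rnk` and `decile` are illustrative, and `names_vary` assumes tidyr 1.2.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature of the question's data: two year columns, one NA.
rets <- data.frame(
  `1995 Return` = c(0.015, 0.046, NA, 0.030),
  `1996 Return` = c(0.036, 0.051, 0.020, 0.010),
  row.names = c("A", "B", "C", "D"),
  check.names = FALSE
)

wide <- rets %>%
  tibble::rownames_to_column("territory") %>%
  pivot_longer(ends_with("Return"), names_to = "year", values_to = "ret") %>%
  mutate(year = sub(" Return$", "", year)) %>%
  # one group per year, so ranks and deciles compare territories within a year
  group_by(year) %>%
  mutate(
    rnk = rank(ret, na.last = "keep"),
    decile = findInterval(ret, quantile(ret, probs = 0:9 / 10, na.rm = TRUE))
  ) %>%
  ungroup() %>%
  # "slowest" keeps ret, rnk and decile for the same year next to each other
  pivot_wider(
    names_from = year,
    values_from = c(ret, rnk, decile),
    names_vary = "slowest"
  )

print(wide)
```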