Calculate Row Decile/Quantile by Column Dplyr R

huangapple go评论59阅读模式
英文:

Calculate Row Decile/Quantile by Column Dplyr R

问题

我有一个数据框,其中包含按行显示在不同年份列中的价格回报。我想要在每个现有列之后插入一个新列,其中包含左侧年度回报列的值的十分位数或分位数值。

我可以为每行的每个列值添加排名,通过创建一个新数据框,使用以下代码:

test <- yearRetsMSA2 %>%
  mutate(across(c(cnam_year2[1]:cnam_year2[length(cnam_year2)]), rank))

其中yearRetsMSA2是按年份列出的表,行名按地区列出。cnam_year2是数据框yearRetsMSA2的列名的字符向量。每个列向量中都有一些NA值,但下面的脚本不起作用:

test2 <- yearRetsMSA2 %>%
mutate(across(c(cnam_year2[1]:cnam_year2[length(cnam_year2)]), quantile(na.omit())))

此外,对于解决分位数问题的解决方案,如何修改当前脚本以类似地在每个现有列旁插入一个排名列?最终表的格式将保留包含分位数/分位数脚本的卓越结构。

感激不尽地帮助解决这个问题!

英文:

I have a data frame that contains price returns by row displayed in columns that are each a different year. I'd like to either insert a new column after each existing column with a decile or quantile value for the value in the row of the annual return column to the left.

I can add a ranking for each row's value by column by creating a new data frame using the following:

test &lt;- yearRetsMSA2 %&gt;%
  mutate(across(c(cnam_year2[1]:cnam_year2[length(cnam_year2)]), rank))

Where yearRetsMSA2 is the table with column names by year and row names by territory. cnam_year2 is a character vector with the column names for the data frame yearRetsMSA2. There are some NA values in certain spots of each column vector, but the below script is not working:

test2 &lt;- yearRetsMSA2 %&gt;%
mutate(across(c(cnam_year2[1]:cnam_year2[length(cnam_year2)]), quantile(na.omit())))

structure(list(`1995 Return` = c(0.0151000000000001, 0.0463), 
    `1996 Return` = c(0.0361540734902965, 0.050750262830928), 
    `1997 Return` = c(0.036223616657159, 0.049208659268692), 
    `1998 Return` = c(0.0213781080833104, 0.0508019072388384), 
    `1999 Return` = c(0.0369205892921309, 0.023265407144625), 
    `2000 Return` = c(0.0177596811920644, 0.042892848504394), 
    `2001 Return` = c(0.0474123255022132, 0.0538074990336297), 
    `2002 Return` = c(0.0282811865095489, 0.0258968527620864), 
    `2003 Return` = c(-0.00505808899075322, 0.0240989702517163
    ), `2004 Return` = c(0.0660100087377868, 0.0309335940227635
    ), `2005 Return` = c(0.0777943368107303, 0.0308859387699811
    ), `2006 Return` = c(0.0893252212389382, -0.00683311432325884
    ), `2007 Return` = c(0.0338283828382837, -0.0302990209050013
    ), `2008 Return` = c(0.0355454601264658, -0.0375221721926593
    ), `2009 Return` = c(0.00361631491581682, -0.0233909838389567
    ), `2010 Return` = c(0.000472561876070809, -0.0121933517201336
    ), `2011 Return` = c(-0.0144653716714885, -0.0449669360764144
    ), `2012 Return` = c(0.0181524083393243, -0.012925065394676
    ), `2013 Return` = c(0.0614886731391586, 0.0127825409197193
    ), `2014 Return` = c(0.0437361419068736, 0.0333230721871633
    ), `2015 Return` = c(0.0364331616124065, 0.0430475906755046
    ), `2016 Return` = c(0.0472457084294133, 0.0165655123170296
    ), `2017 Return` = c(0.0218231638694526, 0.0523986794970852
    ), `2018 Return` = c(0.0755159699276924, 0.036975238603751
    ), `2019 Return` = c(0.0231967943009797, 0.0610800025744997
    ), `2020 Return` = c(0.0486488838605805, 0.0724857454810142
    ), `2021 Return` = c(0.196107722312129, 0.140093886092416
    ), `2022 Return` = c(0.069071986123157, 0.119059430499058
    )), row.names = c(&quot;Abilene, TX&quot;, &quot;Akron, OH&quot;), class = &quot;data.frame&quot;)

Additionally, with the solution here that inserts a new column next to each column solving the quantile issue, how would you modify the current script to similarly insert a column with the ranking next to each existing column? The final table's format would retain the excellent structure included with the quantile/decile script.

Help solving this is much appreciated!

答案1

得分: 0

a generic approach with {dplyr}:

library(dplyr)

mtcars %>%
  mutate(across(where(is.numeric),
                .fns = ~ findInterval(.x, quantile(.x, c(.2 * 1:5), na.rm = TRUE)),
                .names = "{.col}_quantile")) %>%
  select(names(.) %>% sort)

Anyhow, the tidyverse way would be to pivot_longer your dataframe, group it, apply the desired mutateions and (if need be) pivot_wider it to wide format again.

Example (df being your sample dataframe):

library(tidyr)
library(dplyr)

df_long <- 
  df %>%
  tibble::rownames_to_column('state') %>%
  pivot_longer(cols = ends_with('Return'),
               names_to = 'year',
               values_to = 'return'
               ) %>%
  mutate(year = gsub(' .*', '', year)) %>%
  group_by(state) %>%
  mutate(quant = findInterval(return, quantile(return, 1:5 * .2, na.rm = TRUE))) %>%
  ungroup()

df_long %>%
  pivot_wider(names_from = year, 
              values_from = c('return', 'quant'),
              names_vary = 'slowest'
              )
英文:

a generic approach with {dplyr}:

library(dplyr)

mtcars %&gt;%
  mutate(across(where(is.numeric),
                .fns = ~ findInterval(.x, quantile(.x, c(.2 * 1:5), na.rm = TRUE)),
                .names = &quot;{.col}_quantile&quot;)) %&gt;%
  select(names(.) %&gt;% sort)

Anyhow, the tidyverse way would be to pivot_longer your dataframe, group it, apply the desired mutateions and (if need be) pivot_wider it to wide format again.

Example (df being your sample dataframe):

library(tidyr)
library(dplyr)

df_long &lt;- 
  df |&gt;
  tibble::rownames_to_column(&#39;state&#39;) |&gt;
  pivot_longer(cols = ends_with(&#39;Return&#39;),
               names_to = &#39;year&#39;,
               values_to = &#39;return&#39;
               ) |&gt;
  mutate(year = gsub(&#39; .*&#39;, &#39;&#39;, year)) |&gt;
  group_by(state) |&gt;
  mutate(quant = findInterval(return, quantile(return, 1:5 * .2, na.rm = TRUE))) |&gt;
  ungroup()


df_long |&gt;
  pivot_wider(names_from = year, 
              values_from = c(&#39;return&#39;, &#39;quant&#39;),
              names_vary = &#39;slowest&#39;
              )

huangapple
  • 本文由 发表于 2023年3月4日 07:17:46
  • 转载请务必保留本文链接:https://go.coder-hub.com/75632632.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定