Calculate Row Decile/Quantile by Column Dplyr R

huangapple go评论107阅读模式
英文:

Calculate Row Decile/Quantile by Column Dplyr R

问题

我有一个数据框,其中包含按行显示在不同年份列中的价格回报。我想要在每个现有列之后插入一个新列,其中包含左侧年度回报列的值的十分位数或分位数值。

我可以为每行的每个列值添加排名,通过创建一个新数据框,使用以下代码:

  1. test <- yearRetsMSA2 %>%
  2. mutate(across(c(cnam_year2[1]:cnam_year2[length(cnam_year2)]), rank))

其中yearRetsMSA2是按年份列出的表,行名按地区列出。cnam_year2是数据框yearRetsMSA2的列名的字符向量。每个列向量中都有一些NA值,但下面的脚本不起作用:

  1. test2 <- yearRetsMSA2 %>%
  2. mutate(across(c(cnam_year2[1]:cnam_year2[length(cnam_year2)]), quantile(na.omit())))

此外,对于解决分位数问题的解决方案,如何修改当前脚本以类似地在每个现有列旁插入一个排名列?最终表的格式将保留包含分位数/分位数脚本的卓越结构。

感激不尽地帮助解决这个问题!

英文:

I have a data frame that contains price returns by row displayed in columns that are each a different year. I'd like to either insert a new column after each existing column with a decile or quantile value for the value in the row of the annual return column to the left.

I can add a ranking for each row's value by column by creating a new data frame using the following:

  1. test &lt;- yearRetsMSA2 %&gt;%
  2. mutate(across(c(cnam_year2[1]:cnam_year2[length(cnam_year2)]), rank))

Where yearRetsMSA2 is the table with column names by year and row names by territory. cnam_year2 is a character vector with the column names for the data frame yearRetsMSA2. There are some NA values in certain spots of each column vector, but the below script is not working:

  1. test2 &lt;- yearRetsMSA2 %&gt;%
  2. mutate(across(c(cnam_year2[1]:cnam_year2[length(cnam_year2)]), quantile(na.omit())))
  3. structure(list(`1995 Return` = c(0.0151000000000001, 0.0463),
  4. `1996 Return` = c(0.0361540734902965, 0.050750262830928),
  5. `1997 Return` = c(0.036223616657159, 0.049208659268692),
  6. `1998 Return` = c(0.0213781080833104, 0.0508019072388384),
  7. `1999 Return` = c(0.0369205892921309, 0.023265407144625),
  8. `2000 Return` = c(0.0177596811920644, 0.042892848504394),
  9. `2001 Return` = c(0.0474123255022132, 0.0538074990336297),
  10. `2002 Return` = c(0.0282811865095489, 0.0258968527620864),
  11. `2003 Return` = c(-0.00505808899075322, 0.0240989702517163
  12. ), `2004 Return` = c(0.0660100087377868, 0.0309335940227635
  13. ), `2005 Return` = c(0.0777943368107303, 0.0308859387699811
  14. ), `2006 Return` = c(0.0893252212389382, -0.00683311432325884
  15. ), `2007 Return` = c(0.0338283828382837, -0.0302990209050013
  16. ), `2008 Return` = c(0.0355454601264658, -0.0375221721926593
  17. ), `2009 Return` = c(0.00361631491581682, -0.0233909838389567
  18. ), `2010 Return` = c(0.000472561876070809, -0.0121933517201336
  19. ), `2011 Return` = c(-0.0144653716714885, -0.0449669360764144
  20. ), `2012 Return` = c(0.0181524083393243, -0.012925065394676
  21. ), `2013 Return` = c(0.0614886731391586, 0.0127825409197193
  22. ), `2014 Return` = c(0.0437361419068736, 0.0333230721871633
  23. ), `2015 Return` = c(0.0364331616124065, 0.0430475906755046
  24. ), `2016 Return` = c(0.0472457084294133, 0.0165655123170296
  25. ), `2017 Return` = c(0.0218231638694526, 0.0523986794970852
  26. ), `2018 Return` = c(0.0755159699276924, 0.036975238603751
  27. ), `2019 Return` = c(0.0231967943009797, 0.0610800025744997
  28. ), `2020 Return` = c(0.0486488838605805, 0.0724857454810142
  29. ), `2021 Return` = c(0.196107722312129, 0.140093886092416
  30. ), `2022 Return` = c(0.069071986123157, 0.119059430499058
  31. )), row.names = c(&quot;Abilene, TX&quot;, &quot;Akron, OH&quot;), class = &quot;data.frame&quot;)

Additionally, with the solution here that inserts a new column next to each column solving the quantile issue, how would you modify the current script to similarly insert a column with the ranking next to each existing column? The final table's format would retain the excellent structure included with the quantile/decile script.

Help solving this is much appreciated!

答案1

得分: 0

a generic approach with {dplyr}:

  1. library(dplyr)
  2. mtcars %>%
  3. mutate(across(where(is.numeric),
  4. .fns = ~ findInterval(.x, quantile(.x, c(.2 * 1:5), na.rm = TRUE)),
  5. .names = "{.col}_quantile")) %>%
  6. select(names(.) %>% sort)

Anyhow, the tidyverse way would be to pivot_longer your dataframe, group it, apply the desired mutateions and (if need be) pivot_wider it to wide format again.

Example (df being your sample dataframe):

  1. library(tidyr)
  2. library(dplyr)
  3. df_long <-
  4. df %>%
  5. tibble::rownames_to_column('state') %>%
  6. pivot_longer(cols = ends_with('Return'),
  7. names_to = 'year',
  8. values_to = 'return'
  9. ) %>%
  10. mutate(year = gsub(' .*', '', year)) %>%
  11. group_by(state) %>%
  12. mutate(quant = findInterval(return, quantile(return, 1:5 * .2, na.rm = TRUE))) %>%
  13. ungroup()
  14. df_long %>%
  15. pivot_wider(names_from = year,
  16. values_from = c('return', 'quant'),
  17. names_vary = 'slowest'
  18. )
英文:

a generic approach with {dplyr}:

  1. library(dplyr)
  2. mtcars %&gt;%
  3. mutate(across(where(is.numeric),
  4. .fns = ~ findInterval(.x, quantile(.x, c(.2 * 1:5), na.rm = TRUE)),
  5. .names = &quot;{.col}_quantile&quot;)) %&gt;%
  6. select(names(.) %&gt;% sort)

Anyhow, the tidyverse way would be to pivot_longer your dataframe, group it, apply the desired mutateions and (if need be) pivot_wider it to wide format again.

Example (df being your sample dataframe):

  1. library(tidyr)
  2. library(dplyr)
  3. df_long &lt;-
  4. df |&gt;
  5. tibble::rownames_to_column(&#39;state&#39;) |&gt;
  6. pivot_longer(cols = ends_with(&#39;Return&#39;),
  7. names_to = &#39;year&#39;,
  8. values_to = &#39;return&#39;
  9. ) |&gt;
  10. mutate(year = gsub(&#39; .*&#39;, &#39;&#39;, year)) |&gt;
  11. group_by(state) |&gt;
  12. mutate(quant = findInterval(return, quantile(return, 1:5 * .2, na.rm = TRUE))) |&gt;
  13. ungroup()
  14. df_long |&gt;
  15. pivot_wider(names_from = year,
  16. values_from = c(&#39;return&#39;, &#39;quant&#39;),
  17. names_vary = &#39;slowest&#39;
  18. )

huangapple
  • 本文由 发表于 2023年3月4日 07:17:46
  • 转载请务必保留本文链接:https://go.coder-hub.com/75632632.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定