March 4, 2023, 07:17:46

Calculate Row Decile/Quantile by Column Dplyr R

Question

I have a data frame of price returns with one row per territory and one column per year. I'd like to insert a new column after each existing column that holds the decile (or other quantile) of each row's value within the annual return column to its left.

I can add a ranking for each row's value by column by creating a new data frame using the following:

test <- yearRetsMSA2 %>%
  mutate(across(c(cnam_year2[1]:cnam_year2[length(cnam_year2)]), rank))

Here, yearRetsMSA2 is the table with one column per year and row names by territory, and cnam_year2 is a character vector of its column names. Each column contains some NA values, and the script below does not work:

test2 <- yearRetsMSA2 %>%
  mutate(across(c(cnam_year2[1]:cnam_year2[length(cnam_year2)]), quantile(na.omit())))

Sample data (dput output):

structure(list(`1995 Return` = c(0.0151000000000001, 0.0463), 
    `1996 Return` = c(0.0361540734902965, 0.050750262830928), 
    `1997 Return` = c(0.036223616657159, 0.049208659268692), 
    `1998 Return` = c(0.0213781080833104, 0.0508019072388384), 
    `1999 Return` = c(0.0369205892921309, 0.023265407144625), 
    `2000 Return` = c(0.0177596811920644, 0.042892848504394), 
    `2001 Return` = c(0.0474123255022132, 0.0538074990336297), 
    `2002 Return` = c(0.0282811865095489, 0.0258968527620864), 
    `2003 Return` = c(-0.00505808899075322, 0.0240989702517163
    ), `2004 Return` = c(0.0660100087377868, 0.0309335940227635
    ), `2005 Return` = c(0.0777943368107303, 0.0308859387699811
    ), `2006 Return` = c(0.0893252212389382, -0.00683311432325884
    ), `2007 Return` = c(0.0338283828382837, -0.0302990209050013
    ), `2008 Return` = c(0.0355454601264658, -0.0375221721926593
    ), `2009 Return` = c(0.00361631491581682, -0.0233909838389567
    ), `2010 Return` = c(0.000472561876070809, -0.0121933517201336
    ), `2011 Return` = c(-0.0144653716714885, -0.0449669360764144
    ), `2012 Return` = c(0.0181524083393243, -0.012925065394676
    ), `2013 Return` = c(0.0614886731391586, 0.0127825409197193
    ), `2014 Return` = c(0.0437361419068736, 0.0333230721871633
    ), `2015 Return` = c(0.0364331616124065, 0.0430475906755046
    ), `2016 Return` = c(0.0472457084294133, 0.0165655123170296
    ), `2017 Return` = c(0.0218231638694526, 0.0523986794970852
    ), `2018 Return` = c(0.0755159699276924, 0.036975238603751
    ), `2019 Return` = c(0.0231967943009797, 0.0610800025744997
    ), `2020 Return` = c(0.0486488838605805, 0.0724857454810142
    ), `2021 Return` = c(0.196107722312129, 0.140093886092416
    ), `2022 Return` = c(0.069071986123157, 0.119059430499058
    )), row.names = c("Abilene, TX", "Akron, OH"), class = "data.frame")

Additionally, given a solution that inserts a quantile column next to each existing column, how would I modify it to insert a rank column next to each existing column in the same way? The final table should keep the same interleaved layout as the quantile/decile version.

Help solving this is much appreciated!

Answer 1

Score: 0

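A generic approach with {dplyr}:

library(dplyr)
# Add a "<column>_quantile" column for every numeric column.
# Break points are the 20%, 40%, ..., 100% quantiles, so findInterval()
# returns 0 below the 20% quantile and 5 only for the column maximum.
mtcars %>%
  mutate(across(where(is.numeric),
                .fns = ~ findInterval(.x, quantile(.x, c(.2 * 1:5), na.rm = TRUE)),
                .names = "{.col}_quantile")) %>%
  # Sort column names so each quantile column sits beside its source column
  select(names(.) %>% sort)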
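
For the follow-up about rank columns, one possible sketch is to pass across() a named list of functions, so each source column gets both a rank and a decile companion. Note that across() expects a function or lambda, whereas the attempt in the question passes quantile(na.omit()), a call that is evaluated immediately with no data. The data frame `rets` and the helper `decile()` below are hypothetical stand-ins, not part of the original post, and the break points (0%, 10%, ..., 90%) are an assumption chosen so the result runs from 1 to 10.

```r
library(dplyr)

# Hypothetical stand-in for yearRetsMSA2: two year columns, one NA
rets <- data.frame(
  `1995 Return` = c(0.015, 0.046, NA, 0.030),
  `1996 Return` = c(0.036, 0.051, 0.010, 0.022),
  row.names = c("A", "B", "C", "D"),
  check.names = FALSE
)

# Hypothetical helper: break points at the 0%, 10%, ..., 90% quantiles,
# so the column minimum lands in decile 1 and the maximum in decile 10
decile <- function(x) {
  findInterval(x, quantile(x, probs = 0:9 / 10, na.rm = TRUE))
}

out <- rets %>%
  mutate(across(
    everything(),
    list(rank = ~ rank(.x, na.last = "keep"), decile = decile),
    .names = "{.col}_{.fn}"
  ))

# Sort names so each column is followed by its _decile and _rank companions
out <- out[, sort(names(out))]
```

Both helpers leave NA entries as NA: rank() through na.last = "keep", and findInterval() because it returns NA for NA input.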

Anyhow, the tidyverse way would be to pivot_longer() your data frame, group it, apply the desired mutations and (if need be) pivot_wider() it back to wide format.

Example (df being your sample dataframe):

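library(tidyr)
library(dplyr)
# Long format: one row per (state, year) pair
df_long <-
  df %>%
  tibble::rownames_to_column('state') %>%
  pivot_longer(cols = ends_with('Return'),
               names_to = 'year',
               values_to = 'return'
               ) %>%
  mutate(year = gsub(' .*', '', year)) %>%  # "1995 Return" -> "1995"
  # Grouping by state bins each return against that territory's own history;
  # group_by(year) would instead compare territories within a year.
  group_by(state) %>%
  mutate(quant = findInterval(return, quantile(return, 1:5 * .2, na.rm = TRUE))) %>%
  ungroup()
# Back to wide format, with each year's return and quant columns side by side
df_long %>%
  pivot_wider(names_from = year, 
              values_from = c('return', 'quant'),
              names_vary = 'slowest'
              )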
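
The same long-format pattern can carry a rank column as well. The sketch below is an assumption-laden variant, not the answerer's code: it groups by year (so territories are compared within each year, matching what across() does column by column), uses a small hypothetical data frame `df` with made-up territories and values, and picks the 0%, 10%, ..., 90% break points so deciles run from 1 to 10.

```r
library(dplyr)
library(tidyr)

# Hypothetical three-territory sample shaped like the question's data
df <- data.frame(
  `1995 Return` = c(0.0151, 0.0463, 0.0300),
  `1996 Return` = c(0.0362, 0.0508, NA),
  row.names = c("Territory A", "Territory B", "Territory C"),
  check.names = FALSE
)

wide <- df %>%
  tibble::rownames_to_column("territory") %>%
  pivot_longer(ends_with("Return"), names_to = "year", values_to = "ret") %>%
  mutate(year = sub(" Return$", "", year)) %>%
  group_by(year) %>%  # compare territories within each year
  mutate(
    rank   = rank(ret, na.last = "keep"),
    decile = findInterval(ret, quantile(ret, 0:9 / 10, na.rm = TRUE))
  ) %>%
  ungroup() %>%
  pivot_wider(
    names_from  = year,
    values_from = c("ret", "rank", "decile"),
    names_vary  = "slowest"  # keeps each year's three columns together
  )

names(wide)
```

With names_vary = "slowest", each year's ret, rank and decile columns stay adjacent, which is the interleaved layout the question asks for.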
