dplyr – get certain summary statistics for multiple columns of a data frame

Question

I want to create a summary statistics table for some summary functions for multiple variables. I've managed to do it using summarise and across, but I get a wide dataframe which is hard to read. Is there a better alternative (perhaps using purrr), or is there an easy way of reshaping the data?

Here is a reproducible example (the funs list contains additional functions I've created myself):

data <- as.data.frame(cbind(estimator1 = rnorm(3), 
                            estimator2 = runif(3)))
funs <- list(mean = mean, median = median)

If I use summarise and across I obtain:

estimator1_mean estimator1_median estimator2_mean estimator2_median
      0.9506083          1.138536       0.5789924         0.7598719
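
The question does not show the exact call, so the following is only a sketch of how a wide result like the one above is typically produced. It assumes dplyr is installed; the set.seed() call and the name wide are added here for illustration, so the printed numbers will differ from those in the question.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
data <- as.data.frame(cbind(estimator1 = rnorm(3),
                            estimator2 = runif(3)))
funs <- list(mean = mean, median = median)

# across() applies every function in `funs` to every column and, by default,
# names each result "<column>_<function>", which gives one wide row.
wide <- data |>
  summarise(across(everything(), funs))

names(wide)
```

Each column/function pair becomes its own column, which is why the result grows wide as more functions are added to funs.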

What I would like to obtain is:

       estimator1 estimator2
mean    0.9506083  0.5789924
median  1.138536   0.7598719

Answer 1

Score: 2

Base R approach, using sapply():

sapply(data, \(x) sapply(funs, \(f) f(x))) nests two sapply() calls. The outer call iterates over the columns of data; for each column x, the inner call applies every function f in funs to x.

Both are anonymous functions written with the \(x) shorthand (available since R 4.1.0): the outer one takes a column x, the inner one takes a function f.

Given funs <- list(mean = mean, median = median), the call applies mean() and median() to each column of data and returns a matrix with one row per function and one column per variable:

sapply(data, \(x) sapply(funs, \(f) f(x)))
       estimator1 estimator2
mean    0.3081365  0.4251447
median  0.2159416  0.3198206
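
As a variation on the nested sapply() idea, here is a minimal sketch that swaps in vapply(), which checks that every function returns exactly one number instead of silently changing the result's shape. This is not part of the original answer; the set.seed() call and the names res and res_df are assumptions added for illustration.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
data <- as.data.frame(cbind(estimator1 = rnorm(3),
                            estimator2 = runif(3)))
funs <- list(mean = mean, median = median)

# Inner vapply(): apply each summary function to one column and require a
# single numeric value from each. Outer vapply(): repeat for every column and
# require one value per function.
res <- vapply(
  data,
  function(x) vapply(funs, function(f) f(x), numeric(1)),
  numeric(length(funs))
)

# The result is a matrix (rows = functions, columns = variables);
# convert it if a data frame is preferred.
res_df <- as.data.frame(res)
res
```

If one of the custom functions in funs returned a vector of length other than one, vapply() would stop with an error, whereas sapply() would return a list or a differently shaped object.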

Answer 2

Score: 1

You can use pivot_longer() with ".value" in names_to. According to the tidyr documentation for pivot_longer(), ".value" indicates that the corresponding component of the column name defines the name of the output column containing the cell values, overriding values_to entirely. For example:

library(dplyr)
data |>
  summarise(across(everything(), list(mean = mean, median = median, var = var))) |>
  tidyr::pivot_longer(cols = everything(), names_to = c(".value", "stats"), names_sep = "_")

Output:

# A tibble: 3 × 3
  stats  estimator1 estimator2
  <chr>       <dbl>      <dbl>
1 mean        0.221    0.448
2 median      0.110    0.429
3 var         0.770    0.00288
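
One caveat with names_sep = "_": it assumes the original column names contain no underscore themselves. The sketch below shows one possible workaround, using the .names argument of across() to join column and function names with a double underscore and then splitting on that. The data frame data2, its column names, and the name out are hypothetical and not from the original answer.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
# Hypothetical data whose column names already contain "_"
data2 <- data.frame(est_1 = rnorm(3), est_2 = runif(3))
funs <- list(mean = mean, median = median)

out <- data2 |>
  # Name each summary "<column>__<function>" instead of the default
  # "<column>_<function>", so the separator is unambiguous.
  summarise(across(everything(), funs, .names = "{.col}__{.fn}")) |>
  # Split on the double underscore: the first piece becomes the output
  # column name (".value"), the second piece goes into `stats`.
  pivot_longer(everything(),
               names_to = c(".value", "stats"),
               names_sep = "__")

out
```

With the default naming, a column such as est_1_mean would split into three pieces on "_", which does not match the two entries in names_to.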

huangapple
  • Posted on March 8, 2023 at 18:24:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75671819.html