dplyr: get certain summary statistics for multiple columns of a dataframe
Question
I want to create a summary statistics table for some summary functions for multiple variables. I've managed to do it using summarise and across, but I get a wide dataframe which is hard to read. Is there a better alternative (perhaps using purrr), or is there an easy way of reshaping the data?
Here is a reproducible example (the funs list contains additional functions I've created myself):
data <- as.data.frame(cbind(estimator1 = rnorm(3),
                            estimator2 = runif(3)))
funs <- list(mean = mean, median = median)
If I use summarise and across I obtain:
estimator1_mean estimator1_median estimator2_mean estimator2_median
      0.9506083          1.138536       0.5789924         0.7598719
What I would like to obtain is:
       estimator1 estimator2
mean    0.9506083  0.5789924
median   1.138536  0.7598719
Answer 1
Score: 2
Base R approach, using sapply:
sapply(data, \(x) sapply(funs, \(f) f(x)))
The outer sapply() iterates over the columns of data. For each column x, the inner sapply() applies every function f in funs to x.
Both \(x) and \(f) are anonymous functions written with the \(...) shorthand for function(...), available since R 4.1.0. The outer one takes a column x; the inner one takes a function f.
With funs <- list(mean = mean, median = median), the call applies mean() and median() to each column of data and returns a matrix with one row per function and one column per variable:
sapply(data, \(x) sapply(funs, \(f) f(x)))
       estimator1 estimator2
mean    0.3081365  0.4251447
median  0.2159416  0.3198206
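The nested-apply idea above can be sketched in a reproducible form. This is only an illustration under stated assumptions: the seed is arbitrary, range_width is a hypothetical custom function standing in for the asker's own functions, and vapply() is swapped in for the inner sapply() so that each function is checked to return exactly one number.

```r
# Reproducible example data (the seed value is an arbitrary choice)
set.seed(42)
data <- data.frame(estimator1 = rnorm(3), estimator2 = runif(3))

# Named list of summary functions; `range_width` is a hypothetical
# custom function, not part of the original question
funs <- list(
  mean = mean,
  median = median,
  range_width = \(x) max(x) - min(x)
)

# Outer loop over columns, inner loop over functions.
# vapply() errors if a function does not return a single numeric value.
res <- sapply(data, \(x) vapply(funs, \(f) f(x), numeric(1)))
res
```

The result is a numeric matrix whose row names come from names(funs) and whose column names come from the data; wrapping it in as.data.frame(res) gives a data frame if one is needed.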
Answer 2
Score: 1
You can use pivot_longer() with the ".value" sentinel in names_to. Per the tidyr documentation for pivot_longer(), ".value" indicates that the corresponding component of the column name defines the name of the output column containing the cell values, overriding values_to entirely. For example:
library(dplyr)
data |>
  summarise(across(everything(), list(mean = mean, median = median, var = var))) |>
  tidyr::pivot_longer(cols = everything(), names_to = c(".value", "stats"), names_sep = "_")
Output:
# A tibble: 3 × 3
  stats  estimator1 estimator2
  <chr>       <dbl>      <dbl>
1 mean        0.221    0.448
2 median      0.110    0.429
3 var         0.770    0.00288
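One caveat with splitting on "_" is that it stops working cleanly when the variable names themselves contain underscores. A possible workaround, sketched here under assumptions, is to have across() build the names with a separator that does not occur in the data. The column names est_one and est_two and the ".." separator are hypothetical choices for illustration.

```r
library(dplyr)
library(tidyr)

set.seed(42)
# Hypothetical column names that contain underscores
data <- data.frame(est_one = rnorm(3), est_two = runif(3))
funs <- list(mean = mean, median = median)

res <- data |>
  # Build names such as "est_one..mean" instead of the default "est_one_mean"
  summarise(across(everything(), funs, .names = "{.col}..{.fn}")) |>
  pivot_longer(
    cols = everything(),
    names_to = c(".value", "stats"),
    names_sep = "\\.\\."  # names_sep is a regular expression, so the dots are escaped
  )
res
```

The part of each name before ".." becomes an output column and the part after it lands in the stats column, so the result has one row per summary function.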