dplyr – get certain summary statistics for multiple columns of a dataframe
Question
I want to create a summary statistics table for some summary functions for multiple variables. I've managed to do it using summarise and across, but I get a wide dataframe which is hard to read. Is there a better alternative (perhaps using purrr), or is there an easy way of reshaping the data?

Here is a reproducible example (the funs list contains additional functions I've created myself):

data <- as.data.frame(cbind(estimator1 = rnorm(3),
                            estimator2 = runif(3)))
funs <- list(mean = mean, median = median)

If I use summarise and across I obtain:

  estimator1_mean estimator1_median estimator2_mean estimator2_median
        0.9506083          1.138536       0.5789924         0.7598719

What I would like to obtain is:

       estimator1 estimator2
mean    0.9506083  0.5789924
median   1.138536  0.7598719
Answer 1
Score: 2
Base R approach, using sapply:

sapply(data, \(x) sapply(funs, \(f) f(x)))

This applies a nested sapply() to data and funs. For each element x of data (that is, each column), the inner sapply() applies each function f in funs to x.

Both applied functions are anonymous functions written with the \(x) shorthand syntax (available since R 4.1.0): the outer one takes one argument x, the inner one takes one argument f.

Given funs <- list(mean = mean, median = median), the code applies mean() and median() to each element of data and returns a matrix with the results:

sapply(data, \(x) sapply(funs, \(f) f(x)))
       estimator1 estimator2
mean    0.3081365  0.4251447
median  0.2159416  0.3198206
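
If a data frame is more convenient than a matrix, the result of the nested sapply() call above can be converted in one extra step. The following is only a sketch: the set.seed() call, the variable names res and stats_df, and the stat column name are my own additions for illustration and are not part of the original answer.

```r
# Data and function list as in the question; set.seed() is an assumption
# added only so the example is reproducible.
set.seed(1)
data <- data.frame(estimator1 = rnorm(3), estimator2 = runif(3))
funs <- list(mean = mean, median = median)

# Outer sapply() walks the columns of data, inner sapply() walks funs.
res <- sapply(data, \(x) sapply(funs, \(f) f(x)))

# res is a numeric matrix: row names come from funs, column names from data.
# Move the row names into a regular column (hypothetically named "stat").
stats_df <- data.frame(stat = rownames(res), res)
rownames(stats_df) <- NULL

print(stats_df)
```

Keeping the statistic name as an ordinary column rather than as row names tends to make later filtering or joining easier, but the plain matrix is already enough if the table is only meant to be read.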
Answer 2
Score: 1
You can use pivot_longer() with .value (".value" indicates that the corresponding component of the column name defines the name of the output column containing the cell values, overriding values_to entirely; see the pivot_longer() documentation), e.g.

library(dplyr)
data |>
  summarise(across(everything(), list(mean = mean, median = median, var = var))) |>
  tidyr::pivot_longer(cols = everything(), names_to = c(".value", "stats"), names_sep = "_")

# A tibble: 3 × 3
  stats  estimator1 estimator2
  <chr>       <dbl>      <dbl>
1 mean        0.221    0.448
2 median      0.110    0.429
3 var         0.770    0.00288
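
One caveat with names_sep = "_" is that it splits on every underscore, so it would presumably misbehave if the original column names themselves contained one. A possible workaround, sketched below, is to pick a more distinctive separator through the .names argument of across(). The column names est_1 and est_2, the "__" separator, and set.seed() are assumptions made up for this illustration; they do not come from the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical data whose column names already contain an underscore.
set.seed(1)
data <- data.frame(est_1 = rnorm(3), est_2 = runif(3))
funs <- list(mean = mean, median = median)

out <- data |>
  # Name the summary columns "<column>__<function>" instead of the default
  # "<column>_<function>", so the separator cannot clash with "est_1".
  summarise(across(everything(), funs, .names = "{.col}__{.fn}")) |>
  # Split on the double underscore: the first part becomes the output
  # column name (.value), the second part goes into the "stats" column.
  pivot_longer(everything(), names_to = c(".value", "stats"), names_sep = "__")

print(out)
```

This keeps the same summarise() plus pivot_longer() approach as the answer and only changes how the intermediate column names are built.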