March 8, 2023, 18:24:13

dplyr: get certain summary statistics for multiple columns of a data frame

Question

I want to create a summary statistics table for some summary functions for multiple variables. I've managed to do it using summarise and across, but I get a wide dataframe which is hard to read. Is there a better alternative (perhaps using purrr), or is there an easy way of reshaping the data?

Here is a reproducible example (the funs list contains additional functions I've created myself):

data <- as.data.frame(cbind(estimator1 = rnorm(3), 
                            estimator2 = runif(3)))
funs <- list(mean = mean, median = median)

If I use summarise and across I obtain:

estimator1_mean estimator1_median estimator2_mean estimator2_median
0.9506083          1.138536       0.5789924         0.7598719
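
The question does not show the call that produced this wide layout. It was presumably something along the lines of the sketch below; the seed and the object name wide are assumptions added here so the example is reproducible, and the numbers will not match the ones quoted above.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
data <- as.data.frame(cbind(estimator1 = rnorm(3),
                            estimator2 = runif(3)))
funs <- list(mean = mean, median = median)

# across() applies every function in `funs` to every column and names each
# result <column>_<function>, so the summary is one row with four columns.
wide <- data %>%
  summarise(across(everything(), funs))

names(wide)
```

With a named list of functions, across() builds the output names from the column name and the list name joined by an underscore, which is why the result grows sideways as more columns or functions are added.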

What I would like to obtain is:

         estimator1 estimator2
mean     0.9506083  0.5789924        
median   1.138536   0.7598719

Answer 1

Score: 2

Base R approach, using sapply:

sapply(data, \(x) sapply(funs, \(f) f(x) )) nests two sapply() calls. The outer call iterates over the columns x of data; for each column, the inner call applies every function f in funs to x.

Both functions are anonymous functions written in the \(arg) shorthand (available from R 4.1.0): the outer one takes a column x, the inner one takes a function f.

With funs <- list(mean = mean, median = median), the call applies mean() and median() to each column of data and returns a matrix with one row per function and one column per variable (the values differ from those in the question because the data are random draws):

sapply(data, \(x) sapply(funs, \(f) f(x) ))

       estimator1 estimator2
mean    0.3081365  0.4251447
median  0.2159416  0.3198206
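
As a small sketch of what the nested call returns, the snippet below checks the shape of the result and, optionally, converts it to a data frame. The seed and the names res and res_df are assumptions introduced for illustration; only base R is used.

```r
set.seed(42)  # assumption: any seed works, it only fixes the random draws
data <- as.data.frame(cbind(estimator1 = rnorm(3),
                            estimator2 = runif(3)))
funs <- list(mean = mean, median = median)

res <- sapply(data, \(x) sapply(funs, \(f) f(x)))

is.matrix(res)  # the nested sapply() simplifies to a numeric matrix
dimnames(res)   # rows come from names(funs), columns from names(data)

# Optional: move the statistic names from the row names into a regular column
res_df <- data.frame(stat = rownames(res), res, row.names = NULL)
res_df
```

Because the row names are taken from names(funs), adding another named function to funs adds a row to the result without any other change to the call.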

Answer 2

Score: 1

You can use pivot_longer() with .value. As the tidyr documentation puts it, ".value" indicates that the corresponding component of the column name defines the name of the output column containing the cell values, overriding values_to entirely. For example:

library(dplyr)
data |>
  summarise(across(everything(), list(mean = mean, median = median, var = var))) |>
  tidyr::pivot_longer(cols = everything(), names_to = c(".value", "stats"), names_sep = "_")

Output:

# A tibble: 3 × 3
  stats  estimator1 estimator2
  <chr>       <dbl>      <dbl>
1 mean        0.221    0.448
2 median      0.110    0.429
3 var         0.770    0.00288
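
One caveat with names_sep = "_": it assumes each summarised name contains exactly one underscore. The sketch below is a variation for columns whose own names contain underscores, using names_pattern to split at the last underscore instead. The data frame df, its column names, and the object long are hypothetical and chosen only for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical data whose column names already contain an underscore
df <- data.frame(est_1 = c(1, 2, 6), est_2 = c(10, 20, 60))

long <- df %>%
  summarise(across(everything(), list(mean = mean, median = median))) %>%
  # The first group is greedy, so each name is split at its LAST underscore:
  # "est_1_mean" becomes "est_1" (the .value part) and "mean" (the stats part)
  pivot_longer(cols = everything(),
               names_to = c(".value", "stats"),
               names_pattern = "(.*)_(.*)")

long
```

This relies on the function names in the list (here mean and median) not containing underscores themselves; if they do, a different separator would be needed.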
