2023-02-07 00:51:07

Calculate the average of specific vectors of various elements of a list() in R and convert to data.frame

Question

I have a very large list() with more than 2,000 elements. Each element is a data frame with two columns (x and y), and the number of rows differs from element to element.

Example:

new_list <- list(data.frame(x = c(1, 2, 3),
                            y = c(3, 4, 5)),
                 data.frame(x = c(3, 2, 2, 2, 3, 8),
                            y = c(5, 2, 3, 5, 6, 7)),
                 data.frame(x = c(3, 2, 2, 1, 1),
                            y = c(5, 2, 3, 3, 2)))

I would like to average only the x vectors in this list to get something like this:

df_mean <- data.frame(x = c(2, 3.333, 1.8))

Answer 1

Score: 5

You can use sapply to calculate colMeans over the columns of each data frame whose names contain 'x':

data.frame(x = sapply(new_list, \(x) colMeans(x[grepl('x', names(x))])))
#>          x
#> 1 2.000000
#> 2 3.333333
#> 3 1.800000

@nicola suggested a better option like this (thanks!):

data.frame(x = sapply(new_list, \(x) mean(x$x)))
#>          x
#> 1 2.000000
#> 2 3.333333
#> 3 1.800000

Created on 2023-02-06 with reprex v2.0.2
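
For a list with thousands of elements, a type-checked variant of the same idea may be worth considering. The following is a minimal sketch, not part of the original answer, that swaps sapply for vapply and assumes every element has a numeric column named x. The names x_means and d are chosen only for this illustration.

```r
# Example data from the question
new_list <- list(
  data.frame(x = c(1, 2, 3), y = c(3, 4, 5)),
  data.frame(x = c(3, 2, 2, 2, 3, 8), y = c(5, 2, 3, 5, 6, 7)),
  data.frame(x = c(3, 2, 2, 1, 1), y = c(5, 2, 3, 3, 2))
)

# vapply() checks that each call returns exactly one numeric value
# and always returns a numeric vector, even for an empty list
x_means <- vapply(new_list, function(d) mean(d$x), numeric(1))

# Wrap the vector in a data frame, as requested in the question
df_mean <- data.frame(x = x_means)
df_mean
```

Unlike sapply, which returns an empty list when given an empty input, vapply returns numeric(0) here, so the result type stays predictable.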

Answer 2

Score: 2

Using map_dfr from purrr:

library(purrr)
library(dplyr)
map_dfr(new_list, ~ .x %>% 
    summarise(x = mean(x)))
#>          x
#> 1 2.000000
#> 2 3.333333
#> 3 1.800000
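
As of purrr 1.0.0, map_dfr() is marked as superseded in favor of map() followed by list_rbind(). Here is a minimal sketch of the same calculation in that style, assuming purrr 1.0.0 or later and R 4.1 or later (for the \(d) lambda and the native pipe). The names res and d are chosen only for this illustration.

```r
library(purrr)  # list_rbind() assumes purrr >= 1.0.0
library(dplyr)

# Example data from the question
new_list <- list(
  data.frame(x = c(1, 2, 3), y = c(3, 4, 5)),
  data.frame(x = c(3, 2, 2, 2, 3, 8), y = c(5, 2, 3, 5, 6, 7)),
  data.frame(x = c(3, 2, 2, 1, 1), y = c(5, 2, 3, 3, 2))
)

# Summarise each data frame to a one-row data frame,
# then bind the rows together
res <- map(new_list, \(d) summarise(d, x = mean(x))) |>
  list_rbind()
res
```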

Answer 3

Score: 2

Good answer by Quinten. I usually prefer to follow the KISS principle. Here is a format that I find syntactically simpler:

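len <- length(new_list)
# [[1]] selects the first column of each data frame, which is x here;
# seq_len() also handles an empty list safely, unlike 1:len
sapply(seq_len(len), function(z) mean(new_list[[z]][[1]]))
#> [1] 2.000000 3.333333 1.800000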

Answer 4

Score: 2

In this relatively simple case, I think I would prefer the more concise solution suggested by @quinten. However, if you need to calculate more statistics on the nested data frames, you could consider something like this:

library(tidyverse)
tibble(data = new_list) |> 
  rowwise() |> 
  summarise(
    x = mean(data$x)
  )
#> # A tibble: 3 × 1
#>       x
#>   <dbl>
#> 1  2   
#> 2  3.33
#> 3  1.8

Or alternatively:

tibble(data = new_list) |> 
  rowwise() |> 
  summarise(
    data |> 
      summarise(x = mean(x))
  )
#> # A tibble: 3 × 1
#>       x
#>   <dbl>
#> 1  2   
#> 2  3.33
#> 3  1.8
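
To illustrate the "more statistics" case, here is a minimal sketch of the rowwise approach returning several summaries per data frame. It is not from the original answer; the output column names mean_x, mean_y, and n are made up for this example.

```r
library(dplyr)
library(tibble)

# Example data from the question
new_list <- list(
  data.frame(x = c(1, 2, 3), y = c(3, 4, 5)),
  data.frame(x = c(3, 2, 2, 2, 3, 8), y = c(5, 2, 3, 5, 6, 7)),
  data.frame(x = c(3, 2, 2, 1, 1), y = c(5, 2, 3, 3, 2))
)

# Store each data frame in a list column, then summarise row by row;
# inside rowwise(), `data` refers to the single data frame in that row
stats <- tibble(data = new_list) |>
  rowwise() |>
  summarise(
    mean_x = mean(data$x),
    mean_y = mean(data$y),
    n      = nrow(data)
  )
stats
```

Each additional statistic is just one more argument to summarise, which is where this form starts to pay off compared with the shorter sapply one-liner.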

Answer 5

Score: 1

You can also enframe the list and do a mean by group:

library(dplyr) # 1.1.0 or higher, for the .by argument
library(tibble)
library(tidyr) # provides unnest()
enframe(new_list) %>% 
  unnest(value) %>% 
  summarise(x = mean(x), .by = name)
#   name     x
#1     1  2   
#2     2  3.33
#3     3  1.8

Answer 6

Score: 0

Using data.table's rbindlist:

data.table::rbindlist(new_list, idcol = TRUE)[, .(x = mean(x)), .id][, 2]
#>           x
#> 1: 2.000000
#> 2: 3.333333
#> 3: 1.800000
