Calculate the average of specific vectors of various elements of a list() in R and convert to data.frame
Question
I have a very large list() with more than 2000 elements. Each element is a data frame with two vectors (x and y), and the length of those vectors differs from one element to the next.
Example:
new_list <- list(data.frame(x = c(1, 2, 3),
                            y = c(3, 4, 5)),
                 data.frame(x = c(3, 2, 2, 2, 3, 8),
                            y = c(5, 2, 3, 5, 6, 7)),
                 data.frame(x = c(3, 2, 2, 1, 1),
                            y = c(5, 2, 3, 3, 2)))
I would like to average only the x vectors in this list to get something like this:
df_mean <- data.frame(x = c(2, 3.333, 1.8))
Answer 1
Score: 5
You could use sapply to calculate the colMeans of every column whose name contains "x", like this:
data.frame(x = sapply(new_list, \(x) colMeans(x[grepl('x', names(x))])))
#>          x
#> 1 2.000000
#> 2 3.333333
#> 3 1.800000
@nicola suggested a better option like this (thanks!):
data.frame(x = sapply(new_list, \(x) mean(x$x)))
#>          x
#> 1 2.000000
#> 2 3.333333
#> 3 1.800000
Created on 2023-02-06 with reprex v2.0.2
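A possible variation, not part of the original answer: sapply() returns an empty list rather than an empty numeric vector when the input list is empty, so for a long list built programmatically it may be safer to use vapply(), which always returns a numeric vector here. A minimal sketch, using the example data from the question (the names `x_means` and `d` are only illustrative):

```r
# Example data copied from the question.
new_list <- list(
  data.frame(x = c(1, 2, 3), y = c(3, 4, 5)),
  data.frame(x = c(3, 2, 2, 2, 3, 8), y = c(5, 2, 3, 5, 6, 7)),
  data.frame(x = c(3, 2, 2, 1, 1), y = c(5, 2, 3, 3, 2))
)

# vapply() checks that every call returns exactly one numeric value,
# and yields numeric(0) instead of list() for an empty input.
x_means <- vapply(new_list, function(d) mean(d$x), numeric(1))

df_mean <- data.frame(x = x_means)
df_mean
```

The result has the same shape as the sapply() version above; only the return-type guarantee differs.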
Answer 2
Score: 2
Using map_dfr() from purrr:
library(purrr)
library(dplyr)
map_dfr(new_list, ~ .x %>%
          summarise(x = mean(x)))
         x
1 2.000000
2 3.333333
3 1.800000
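As a side note that is not from the original answer: map_dfr() was superseded in purrr 1.0.0 in favour of map() followed by list_rbind(). Assuming purrr 1.0.0 or later and R 4.1 or later (for `|>` and `\(d)`), an equivalent sketch could look like this:

```r
library(purrr)  # assumes purrr >= 1.0.0 for list_rbind()
library(dplyr)

# Example data copied from the question.
new_list <- list(
  data.frame(x = c(1, 2, 3), y = c(3, 4, 5)),
  data.frame(x = c(3, 2, 2, 2, 3, 8), y = c(5, 2, 3, 5, 6, 7)),
  data.frame(x = c(3, 2, 2, 1, 1), y = c(5, 2, 3, 3, 2))
)

# Summarise each data frame to one row, then bind the rows together.
df_mean <- new_list |>
  map(\(d) summarise(d, x = mean(x))) |>
  list_rbind()

df_mean
```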
Answer 3
Score: 2
Good answer by Quinten. I usually prefer to follow the KISS principle. Here is a format that I find syntactically simpler:
len <- length(new_list)
sapply(1:len, function(z) mean(new_list[[z]][[1]]))
[1] 2.000000 3.333333 1.800000
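Two small caveats about the snippet above, offered as a suggestion rather than a correction: `1:len` evaluates to `c(1, 0)` when the list is empty, and `[[1]]` assumes that x is always the first column. A sketch that avoids both assumptions, using seq_along() and selecting the column by name (the variable name `x_means` is only illustrative):

```r
# Example data copied from the question.
new_list <- list(
  data.frame(x = c(1, 2, 3), y = c(3, 4, 5)),
  data.frame(x = c(3, 2, 2, 2, 3, 8), y = c(5, 2, 3, 5, 6, 7)),
  data.frame(x = c(3, 2, 2, 1, 1), y = c(5, 2, 3, 3, 2))
)

# seq_along(list()) is integer(0), whereas 1:length(list()) is c(1, 0).
# Indexing with [["x"]] does not depend on the column order.
x_means <- sapply(seq_along(new_list), function(i) mean(new_list[[i]][["x"]]))
x_means
```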
Answer 4
Score: 2
In this relatively simple case, I think I would prefer the more concise solution suggested by @quinten. However, if you need to calculate more statistics on the nested data frames, you could consider something like this:
library(tidyverse)
tibble(data = new_list) |>
  rowwise() |>
  summarise(
    x = mean(data$x)
  )
#> # A tibble: 3 × 1
#>       x
#>   <dbl>
#> 1  2
#> 2  3.33
#> 3  1.8
Or alternatively:
tibble(data = new_list) |>
  rowwise() |>
  summarise(
    data |>
      summarise(x = mean(x))
  )
#> # A tibble: 3 × 1
#>       x
#>   <dbl>
#> 1  2
#> 2  3.33
#> 3  1.8
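To illustrate the "more statistics" case mentioned in this answer, here is a rough sketch that extends the first rowwise() variant. The extra columns `n`, `mean_x` and `mean_y` and the name `stats_df` are my own choices for illustration, not something from the original answer:

```r
library(dplyr)
library(tibble)

# Example data copied from the question.
new_list <- list(
  data.frame(x = c(1, 2, 3), y = c(3, 4, 5)),
  data.frame(x = c(3, 2, 2, 2, 3, 8), y = c(5, 2, 3, 5, 6, 7)),
  data.frame(x = c(3, 2, 2, 1, 1), y = c(5, 2, 3, 3, 2))
)

# Under rowwise(), `data` refers to the single data frame in the current row,
# so several statistics can be computed side by side.
stats_df <- tibble(data = new_list) |>
  rowwise() |>
  summarise(
    n      = nrow(data),
    mean_x = mean(data$x),
    mean_y = mean(data$y)
  )

stats_df
```

Each list element still produces one output row, so the result lines up with the order of the original list.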
Answer 5
Score: 1
You can also enframe the list, unnest it, and compute the mean by group:
library(dplyr)   # 1.1.0 or higher, for the .by argument
library(tibble)  # provides enframe()
library(tidyr)   # provides unnest()
enframe(new_list) %>%
  unnest(value) %>%
  summarise(x = mean(x), .by = name)
#   name     x
#1     1  2
#2     2  3.33
#3     3  1.8
Answer 6
Score: 0
Using data.table's rbindlist:
data.table::rbindlist(new_list, idcol = TRUE)[, .(x = mean(x)), .id][, 2]
#>           x
#> 1: 2.000000
#> 2: 3.333333
#> 3: 1.800000