Calculate the mean of a specific column for each element of a list in R and convert the result to a data.frame

Question

I have a very large list with more than 2,000 elements. Each element is a data frame with two columns (x and y), and the number of rows differs between elements.

Example:

    new_list <- list(
      data.frame(x = c(1, 2, 3), y = c(3, 4, 5)),
      data.frame(x = c(3, 2, 2, 2, 3, 8), y = c(5, 2, 3, 5, 6, 7)),
      data.frame(x = c(3, 2, 2, 1, 1), y = c(5, 2, 3, 3, 2))
    )

I would like to compute the mean of only the x column in each element of this list, to get something like this:

    df_mean <- data.frame(x = c(2, 3.333, 1.8))

Answer 1

Score: 5

You can use sapply to compute colMeans over the columns whose names contain "x":

    # \(x) is the anonymous-function shorthand available from R 4.1
    data.frame(x = sapply(new_list, \(x) colMeans(x[grepl('x', names(x))])))
    #>          x
    #> 1 2.000000
    #> 2 3.333333
    #> 3 1.800000

@nicola suggested a better option (thanks!):

    data.frame(x = sapply(new_list, \(x) mean(x$x)))
    #>          x
    #> 1 2.000000
    #> 2 3.333333
    #> 3 1.800000

Created on 2023-02-06 with reprex v2.0.2
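For a list of this size it may be worth making the return type explicit. The following is a sketch rather than part of the answer: it swaps sapply for base R's vapply, which requires each call to return exactly one numeric value, and it assumes every element has a numeric column named x.

```r
# Example data from the question
new_list <- list(
  data.frame(x = c(1, 2, 3), y = c(3, 4, 5)),
  data.frame(x = c(3, 2, 2, 2, 3, 8), y = c(5, 2, 3, 5, 6, 7)),
  data.frame(x = c(3, 2, 2, 1, 1), y = c(5, 2, 3, 3, 2))
)

# vapply() checks that every call returns a single numeric value,
# so the result is always a plain numeric vector.
x_means <- vapply(new_list, function(d) mean(d[["x"]]), numeric(1))

# Wrap the vector in a one-column data frame, as asked in the question
df_mean <- data.frame(x = x_means)
df_mean
```

One practical difference: for an empty input list, sapply returns an empty list, whereas vapply returns numeric(0), so the data.frame step keeps a numeric column either way.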

Answer 2

Score: 2

Using purrr's map_dfr:

    library(purrr)
    library(dplyr)

    # Summarise each data frame to one row, then row-bind the results
    map_dfr(new_list, ~ .x %>%
      summarise(x = mean(x)))
    #>          x
    #> 1 2.000000
    #> 2 3.333333
    #> 3 1.800000

Answer 3

Score: 2

Good answer by Quinten. I usually prefer to follow the KISS principle. Here is a form that I find syntactically simpler:

    len <- length(new_list)
    # [[1]] takes the first column of each data frame, which is x here
    sapply(1:len, function(z) mean(new_list[[z]][[1]]))
    #> [1] 2.000000 3.333333 1.800000
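The index-based loop can be made slightly more defensive. This is a variant sketch, not the answer's own code: it uses seq_along so that an empty list produces zero iterations (1:len counts 1, 0 when len is 0), and it selects the column by name instead of by position, assuming the column is always called x.

```r
# Example data from the question
new_list <- list(
  data.frame(x = c(1, 2, 3), y = c(3, 4, 5)),
  data.frame(x = c(3, 2, 2, 2, 3, 8), y = c(5, 2, 3, 5, 6, 7)),
  data.frame(x = c(3, 2, 2, 1, 1), y = c(5, 2, 3, 3, 2))
)

# seq_along() yields 1, 2, ..., length(new_list), or integer(0) for an empty list
idx <- seq_along(new_list)

# Select the column by name so a change in column order does not matter
x_means <- sapply(idx, function(z) mean(new_list[[z]][["x"]]))
x_means
```

Selecting by name costs a few more characters but keeps the result correct if some data frames store y before x.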

Answer 4

Score: 2

In this relatively simple case, I think I would prefer the more concise solution suggested by @quinten. However, if you need to calculate more statistics on the nested data frames, you could consider something like this:

    library(tidyverse)

    # Store each data frame in a list-column and process it row by row
    tibble(data = new_list) |>
      rowwise() |>
      summarise(
        x = mean(data$x)
      )
    #> # A tibble: 3 × 1
    #>       x
    #>   <dbl>
    #> 1  2
    #> 2  3.33
    #> 3  1.8

or alternatively:

    # An unnamed data frame returned inside summarise() is unpacked into columns
    tibble(data = new_list) |>
      rowwise() |>
      summarise(
        data |>
          summarise(x = mean(x))
      )
    #> # A tibble: 3 × 1
    #>       x
    #>   <dbl>
    #> 1  2
    #> 2  3.33
    #> 3  1.8
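To illustrate the "more statistics" case mentioned in this answer, here is a minimal sketch that extends the rowwise approach to several summaries per element. The output column names (n_rows, mean_x, sd_x, mean_y) are made up for this example, and the choice of statistics is only an assumption about what might be needed.

```r
library(dplyr)
library(tibble)

# Example data from the question
new_list <- list(
  data.frame(x = c(1, 2, 3), y = c(3, 4, 5)),
  data.frame(x = c(3, 2, 2, 2, 3, 8), y = c(5, 2, 3, 5, 6, 7)),
  data.frame(x = c(3, 2, 2, 1, 1), y = c(5, 2, 3, 3, 2))
)

# One output row per list element, with several statistics side by side.
# Under rowwise(), `data` refers to the single data frame in the current row.
res <- tibble(data = new_list) |>
  rowwise() |>
  summarise(
    n_rows = nrow(data),
    mean_x = mean(data$x),
    sd_x   = sd(data$x),
    mean_y = mean(data$y)
  )
res
```

Each further statistic is one more argument to summarise, which is where this layout starts to pay off compared with repeated sapply calls.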

Answer 5

Score: 1

You can also enframe the list and compute the mean by group:

    library(dplyr)   # 1.1.0 or higher, for the .by argument
    library(tibble)  # enframe()
    library(tidyr)   # unnest()

    enframe(new_list) %>%
      unnest(value) %>%
      summarise(x = mean(x), .by = name)
    #   name    x
    # 1    1 2
    # 2    2 3.33
    # 3    3 1.8

Answer 6

Score: 0

Using data.table's rbindlist:

    # idcol = TRUE adds an .id column recording which list element each row came from
    data.table::rbindlist(new_list, idcol = TRUE)[, .(x = mean(x)), .id][, 2]
    #>           x
    #> 1: 2.000000
    #> 2: 3.333333
    #> 3: 1.800000

huangapple
  • Published on 2023-02-07 00:51:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75364264.html