Building a new data frame with a vectorized function
Question
I am trying to build a new data frame from an existing data frame by performing a repetitive calculation across a few columns. Currently I have a solution that looks something like this:
library(tidyverse)
iris_avg <- data.frame(SLength = colMeans(matrix(iris$Sepal.Length, nrow = 10)),
                       SWidth = colMeans(matrix(iris$Sepal.Width, nrow = 10)),
                       PLength = colMeans(matrix(iris$Petal.Length, nrow = 10)),
                       PWidth = colMeans(matrix(iris$Petal.Width, nrow = 10)))
> iris_avg
SLength SWidth PLength PWidth
1 4.86 3.31 1.45 0.22
2 5.21 3.65 1.42 0.25
3 5.01 3.39 1.55 0.27
4 5.07 3.46 1.42 0.20
5 4.88 3.33 1.47 0.29
6 6.10 2.87 4.37 1.38
7 5.85 2.65 4.14 1.27
8 6.26 2.85 4.49 1.41
9 5.83 2.75 4.27 1.34
10 5.64 2.73 4.03 1.23
11 6.57 2.94 5.77 2.04
12 6.55 2.90 5.54 2.05
13 6.63 2.96 5.50 1.93
14 6.74 3.04 5.62 1.94
15 6.45 3.03 5.33 2.17
I feel like there should be a simple way to use something like lapply or map, but I have struggled to get it to work, as I am new to R. Any advice would be greatly appreciated!
Answer 1
Score: 2
You could squeeze this into a single line of code in base R:
as.data.frame(lapply(iris[1:4], \(x) sapply(split(x, 0:149 %/% 10), mean)))
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 0 4.86 3.31 1.45 0.22
#> 1 5.21 3.65 1.42 0.25
#> 2 5.01 3.39 1.55 0.27
#> 3 5.07 3.46 1.42 0.20
#> 4 4.88 3.33 1.47 0.29
#> 5 6.10 2.87 4.37 1.38
#> 6 5.85 2.65 4.14 1.27
#> 7 6.26 2.85 4.49 1.41
#> 8 5.83 2.75 4.27 1.34
#> 9 5.64 2.73 4.03 1.23
#> 10 6.57 2.94 5.77 2.04
#> 11 6.55 2.90 5.54 2.05
#> 12 6.63 2.96 5.50 1.93
#> 13 6.74 3.04 5.62 1.94
#> 14 6.45 3.03 5.33 2.17
The following explanation should make this a bit clearer. Instead of using an anonymous function, we can define a function that takes a vector (or data frame column), splits it into chunks of length 10, and uses sapply to get the average of each chunk in a single vector:
mean_of_every_10_items <- function(x) {
  # Group label for each element: 0 for items 1-10, 1 for items 11-20, ...
  group_numbers <- (seq_along(x) - 1) %/% 10
  groups_of_10 <- split(x, group_numbers)
  return(sapply(groups_of_10, mean))
}
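As a quick sanity check, here is a minimal sketch that runs the helper on a toy vector. The function is restated so the block runs on its own, and `toy` is a hypothetical input that is not part of the original answer.

```r
# Restated helper: label every element with its chunk number, split, then average
mean_of_every_10_items <- function(x) {
  group_numbers <- (seq_along(x) - 1) %/% 10
  sapply(split(x, group_numbers), mean)
}

toy <- 1:30                              # hypothetical input: three chunks of ten
toy_means <- mean_of_every_10_items(toy)
print(toy_means)                         # one mean per chunk, named "0", "1", "2"

# A length that is not a multiple of 10 simply yields a shorter final chunk
ragged_means <- mean_of_every_10_items(1:25)
print(ragged_means)
```

Unlike the `matrix(x, nrow = 10)` approach in the question, this version does not require the length of the vector to be a multiple of 10.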
We can apply this function to each numeric column of iris using lapply, to get a list containing the result of our function on each column. As a last step, we turn this list back into a data frame:
iris[1:4] |>
lapply(mean_of_every_10_items) |>
as.data.frame()
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 0 4.86 3.31 1.45 0.22
#> 1 5.21 3.65 1.42 0.25
#> 2 5.01 3.39 1.55 0.27
#> 3 5.07 3.46 1.42 0.20
#> 4 4.88 3.33 1.47 0.29
#> 5 6.10 2.87 4.37 1.38
#> 6 5.85 2.65 4.14 1.27
#> 7 6.26 2.85 4.49 1.41
#> 8 5.83 2.75 4.27 1.34
#> 9 5.64 2.73 4.03 1.23
#> 10 6.57 2.94 5.77 2.04
#> 11 6.55 2.90 5.54 2.05
#> 12 6.63 2.96 5.50 1.93
#> 13 6.74 3.04 5.62 1.94
#> 14 6.45 3.03 5.33 2.17
The "one-liner" version does the same thing, but it's a lot harder to see what it's doing.
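To confirm that the one-liner agrees with the matrix-based calculation from the question, a sketch along the following lines could be used. The names `by_split` and `by_matrix` are illustrative only. The second expression also shows how the question's own `colMeans(matrix(...))` step can be wrapped in lapply.

```r
# The one-liner from this answer: split each column into chunks of 10 and average
by_split <- as.data.frame(
  lapply(iris[1:4], \(x) sapply(split(x, 0:149 %/% 10), mean))
)

# The question's calculation, applied to every numeric column with lapply
by_matrix <- as.data.frame(
  lapply(iris[1:4], \(x) colMeans(matrix(x, nrow = 10)))
)

# Compare the numbers only; the row names differ ("0".."14" versus 1..15)
same_values <- isTRUE(all.equal(unname(as.matrix(by_split)),
                                unname(as.matrix(by_matrix))))
print(same_values)
```

The two results differ only in their row names, which is why the comparison strips the dimnames first.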
Created on 2023-07-17 with reprex v2.0.2
Answer 2
Score: 2
In the tidyverse:
library(tidyverse)
iris %>%
reframe(across(-Species, ~colMeans(matrix(.x, 10))))
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 4.86 3.31 1.45 0.22
2 5.21 3.65 1.42 0.25
3 5.01 3.39 1.55 0.27
4 5.07 3.46 1.42 0.20
5 4.88 3.33 1.47 0.29
6 6.10 2.87 4.37 1.38
7 5.85 2.65 4.14 1.27
8 6.26 2.85 4.49 1.41
9 5.83 2.75 4.27 1.34
10 5.64 2.73 4.03 1.23
11 6.57 2.94 5.77 2.04
12 6.55 2.90 5.54 2.05
13 6.63 2.96 5.50 1.93
14 6.74 3.04 5.62 1.94
15 6.45 3.03 5.33 2.17
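Since the question mentions map, a purrr-based variant might look like the sketch below. This is not part of the original answer. It assumes the purrr and tibble packages are installed, and `iris_avg_map` is an illustrative name.

```r
library(purrr)
library(tibble)

# Apply the same chunked mean to each numeric column, then bind the
# resulting named list of equal-length vectors into a tibble
iris_avg_map <- iris[1:4] |>
  map(\(x) colMeans(matrix(x, nrow = 10))) |>
  as_tibble()

print(iris_avg_map)
```

As with the reframe version, `matrix(x, nrow = 10)` relies on the number of rows being a multiple of 10.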
Another base R approach is to find the correct groupings and use them within tapply:
x <- data.matrix(iris[-5])
tapply(x, list((row(x) - 1) %/% 10, col(x)), mean)
1 2 3 4
0 4.86 3.31 1.45 0.22
1 5.21 3.65 1.42 0.25
2 5.01 3.39 1.55 0.27
3 5.07 3.46 1.42 0.20
4 4.88 3.33 1.47 0.29
5 6.10 2.87 4.37 1.38
6 5.85 2.65 4.14 1.27
7 6.26 2.85 4.49 1.41
8 5.83 2.75 4.27 1.34
9 5.64 2.73 4.03 1.23
10 6.57 2.94 5.77 2.04
11 6.55 2.90 5.54 2.05
12 6.63 2.96 5.50 1.93
13 6.74 3.04 5.62 1.94
14 6.45 3.03 5.33 2.17
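To see how the row and column groupings work, here is a small sketch on a hypothetical 4 x 2 matrix, using chunks of 2 rows instead of 10. It also restores the column names, which tapply replaces with the column indices.

```r
# Hypothetical toy matrix: column "a" holds 1..4, column "b" holds 5..8
x_toy <- matrix(1:8, nrow = 4, dimnames = list(NULL, c("a", "b")))

# row(x_toy) and col(x_toy) give each cell's row and column index, so every
# cell is labelled with a (row chunk, column) pair
row_group <- (row(x_toy) - 1) %/% 2
res <- tapply(x_toy, list(row_group, col(x_toy)), mean)

# tapply names the columns "1", "2"; put the original names back
colnames(res) <- colnames(x_toy)
print(res)
```

Each cell of the result is the mean of one chunk of rows within one column, which is the calculation performed on iris above.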
Another way, using sapply + split:
t(sapply(split(iris[-5], gl(nrow(iris)/10, 10)), colMeans))
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 4.86 3.31 1.45 0.22
2 5.21 3.65 1.42 0.25
3 5.01 3.39 1.55 0.27
4 5.07 3.46 1.42 0.20
5 4.88 3.33 1.47 0.29
6 6.10 2.87 4.37 1.38
7 5.85 2.65 4.14 1.27
8 6.26 2.85 4.49 1.41
9 5.83 2.75 4.27 1.34
10 5.64 2.73 4.03 1.23
11 6.57 2.94 5.77 2.04
12 6.55 2.90 5.54 2.05
13 6.63 2.96 5.50 1.93
14 6.74 3.04 5.62 1.94
15 6.45 3.03 5.33 2.17
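The result of this last approach is a matrix rather than a data frame. A minimal sketch of how the grouping factor is built and how the result could be converted back follows. The names `groups`, `avg_matrix` and `avg_df` are illustrative only.

```r
# gl(n, k) builds a factor with n levels, each repeated k times;
# for example, as.integer(gl(3, 2)) is 1 1 2 2 3 3
groups <- gl(nrow(iris) / 10, 10)

# split() cuts the data frame into 15 pieces of 10 rows, colMeans() averages
# each piece, and t() turns the 4 x 15 result into 15 rows by 4 columns
avg_matrix <- t(sapply(split(iris[-5], groups), colMeans))

# Convert the matrix back into a data frame if one is needed
avg_df <- as.data.frame(avg_matrix)
print(head(avg_df, 3))
```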