

Building a new data frame with a vectorized function







I am trying to build a new data frame from an existing data frame by performing a repetitive calculation across a few columns. Currently I have a solution that looks something like this:


iris_avg <- data.frame(SLength = colMeans(matrix(iris$Sepal.Length, nrow = 10)),
                       SWidth = colMeans(matrix(iris$Sepal.Width, nrow = 10)),
                       PLength = colMeans(matrix(iris$Petal.Length, nrow = 10)),
                       PWidth = colMeans(matrix(iris$Petal.Width, nrow = 10)))
> iris_avg
   SLength SWidth PLength PWidth
1     4.86   3.31    1.45   0.22
2     5.21   3.65    1.42   0.25
3     5.01   3.39    1.55   0.27
4     5.07   3.46    1.42   0.20
5     4.88   3.33    1.47   0.29
6     6.10   2.87    4.37   1.38
7     5.85   2.65    4.14   1.27
8     6.26   2.85    4.49   1.41
9     5.83   2.75    4.27   1.34
10    5.64   2.73    4.03   1.23
11    6.57   2.94    5.77   2.04
12    6.55   2.90    5.54   2.05
13    6.63   2.96    5.50   1.93
14    6.74   3.04    5.62   1.94
15    6.45   3.03    5.33   2.17

I feel like there should be a simple way to use something like lapply or map, but I have struggled to get it to work, as I am new to R. Any advice would be greatly appreciated!


Score: 2





You could squeeze this into a single line of code in base R:

as.data.frame(lapply(iris[1:4], \(x) sapply(split(x, 0:149 %/% 10), mean)))
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 0          4.86        3.31         1.45        0.22
#> 1          5.21        3.65         1.42        0.25
#> 2          5.01        3.39         1.55        0.27
#> 3          5.07        3.46         1.42        0.20
#> 4          4.88        3.33         1.47        0.29
#> 5          6.10        2.87         4.37        1.38
#> 6          5.85        2.65         4.14        1.27
#> 7          6.26        2.85         4.49        1.41
#> 8          5.83        2.75         4.27        1.34
#> 9          5.64        2.73         4.03        1.23
#> 10         6.57        2.94         5.77        2.04
#> 11         6.55        2.90         5.54        2.05
#> 12         6.63        2.96         5.50        1.93
#> 13         6.74        3.04         5.62        1.94
#> 14         6.45        3.03         5.33        2.17

The following explanation should make this a bit clearer. Instead of using an anonymous function, we can define a function that takes a vector (or data frame column), splits it into chunks of length 10, and uses sapply to get the average of each chunk in a single vector:

mean_of_every_10_items <- function(x) {
  # 0-based position %/% 10 gives the same group number to each run of 10 items
  group_numbers <- (seq_along(x) - 1) %/% 10
  groups_of_10 <- split(x, group_numbers)
  return(sapply(groups_of_10, mean))
}

We can apply this function to each numeric column of iris using lapply, to get a list containing the result of our function on each column. As a last step, we turn this list back into a data frame:

iris[1:4] |>
  lapply(mean_of_every_10_items) |>
  as.data.frame()
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 0          4.86        3.31         1.45        0.22
#> 1          5.21        3.65         1.42        0.25
#> 2          5.01        3.39         1.55        0.27
#> 3          5.07        3.46         1.42        0.20
#> 4          4.88        3.33         1.47        0.29
#> 5          6.10        2.87         4.37        1.38
#> 6          5.85        2.65         4.14        1.27
#> 7          6.26        2.85         4.49        1.41
#> 8          5.83        2.75         4.27        1.34
#> 9          5.64        2.73         4.03        1.23
#> 10         6.57        2.94         5.77        2.04
#> 11         6.55        2.90         5.54        2.05
#> 12         6.63        2.96         5.50        1.93
#> 13         6.74        3.04         5.62        1.94
#> 14         6.45        3.03         5.33        2.17

The "one-liner" version does the same thing, but it's a lot harder to see what it's doing.

Created on 2023-07-17 with reprex v2.0.2
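
As a rough sketch of how the helper above could be generalised and checked, the block below uses a hypothetical function `chunk_means()` with an assumed `size` parameter (neither name appears in the original answer). It swaps `sapply` for `vapply`, which is a stricter variant that insists on one number per chunk, and compares the result with the matrix/colMeans approach from the question.

```r
# Hypothetical generalisation; `chunk_means` and `size` are illustrative names,
# not part of the original answer.
chunk_means <- function(x, size = 10) {
  # 0-based position divided by `size` gives one group id per chunk
  group_ids <- (seq_along(x) - 1) %/% size
  # vapply is a stricter sapply: every chunk must yield exactly one number
  vapply(split(x, group_ids), mean, numeric(1))
}

# The grouping index on a small input: three chunks of length 2
(seq_along(1:6) - 1) %/% 2
#> [1] 0 0 1 1 2 2

result <- as.data.frame(lapply(iris[1:4], chunk_means))

# Compare one column against the approach used in the question
reference <- colMeans(matrix(iris$Sepal.Length, nrow = 10))
all.equal(unname(result$Sepal.Length), reference)
```

Unlike the matrix approach, splitting by a group index should also cope with a vector whose length is not a multiple of the chunk size: the last chunk is simply shorter.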


Score: 2






In the tidyverse:


 library(dplyr)  # reframe() requires dplyr 1.1.0 or later

 iris %>%
   reframe(across(-Species, ~colMeans(matrix(.x, 10))))
   Sepal.Length Sepal.Width Petal.Length Petal.Width
1          4.86        3.31         1.45        0.22
2          5.21        3.65         1.42        0.25
3          5.01        3.39         1.55        0.27
4          5.07        3.46         1.42        0.20
5          4.88        3.33         1.47        0.29
6          6.10        2.87         4.37        1.38
7          5.85        2.65         4.14        1.27
8          6.26        2.85         4.49        1.41
9          5.83        2.75         4.27        1.34
10         5.64        2.73         4.03        1.23
11         6.57        2.94         5.77        2.04
12         6.55        2.90         5.54        2.05
13         6.63        2.96         5.50        1.93
14         6.74        3.04         5.62        1.94
15         6.45        3.03         5.33        2.17
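
The reason `colMeans(matrix(.x, 10))` yields one mean per block of 10 is that `matrix()` fills column by column, so each column holds one consecutive chunk. A small illustration on a made-up vector (chunks of 2 rather than 10):

```r
x <- 1:6
m <- matrix(x, nrow = 2)  # filled column by column: (1, 2), (3, 4), (5, 6)
m
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6

colMeans(m)               # one mean per chunk of 2
#> [1] 1.5 3.5 5.5
```

This relies on the column length being a multiple of the chunk size, which holds for iris (150 rows, chunks of 10).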

Another base R approach is to find correct groupings and use that within tapply:

x <- data.matrix(iris[-5])
tapply(x, list((row(x) - 1) %/% 10, col(x)), mean)

      1    2    3    4
0  4.86 3.31 1.45 0.22
1  5.21 3.65 1.42 0.25
2  5.01 3.39 1.55 0.27
3  5.07 3.46 1.42 0.20
4  4.88 3.33 1.47 0.29
5  6.10 2.87 4.37 1.38
6  5.85 2.65 4.14 1.27
7  6.26 2.85 4.49 1.41
8  5.83 2.75 4.27 1.34
9  5.64 2.73 4.03 1.23
10 6.57 2.94 5.77 2.04
11 6.55 2.90 5.54 2.05
12 6.63 2.96 5.50 1.93
13 6.74 3.04 5.62 1.94
14 6.45 3.03 5.33 2.17
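
To see how the two grouping indices are built, here is a small sketch on a made-up 4 x 2 matrix (the column names `a` and `b` are illustrative only). `row(x)` and `col(x)` return matrices holding each cell's row and column number, so every cell gets a (row block, column) pair, and `tapply` averages the cells that share a pair.

```r
x <- matrix(1:8, nrow = 4, dimnames = list(NULL, c("a", "b")))

row(x)                        # row number of every cell
col(x)                        # column number of every cell

groups <- (row(x) - 1) %/% 2  # rows 1-2 -> block 0, rows 3-4 -> block 1
res <- tapply(x, list(groups, col(x)), mean)

# tapply labels the columns 1, 2, ...; restore the original names if wanted
colnames(res) <- colnames(x)
res
#>     a   b
#> 0 1.5 5.5
#> 1 3.5 7.5
```

The same relabelling step would presumably restore the iris column names in the output above, where the columns appear as 1 to 4.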

Another way, using sapply + split:

t(sapply(split(iris[-5], gl(nrow(iris)/10, 10)), colMeans))

   Sepal.Length Sepal.Width Petal.Length Petal.Width
1          4.86        3.31         1.45        0.22
2          5.21        3.65         1.42        0.25
3          5.01        3.39         1.55        0.27
4          5.07        3.46         1.42        0.20
5          4.88        3.33         1.47        0.29
6          6.10        2.87         4.37        1.38
7          5.85        2.65         4.14        1.27
8          6.26        2.85         4.49        1.41
9          5.83        2.75         4.27        1.34
10         5.64        2.73         4.03        1.23
11         6.57        2.94         5.77        2.04
12         6.55        2.90         5.54        2.05
13         6.63        2.96         5.50        1.93
14         6.74        3.04         5.62        1.94
15         6.45        3.03         5.33        2.17
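
For readers unfamiliar with `gl()`, it generates a factor with a given number of levels, each repeated a given number of times, which is what produces the 15 groups of 10 here. Note also that `t(sapply(...))` returns a matrix; if a data frame is needed, as in the question, wrapping it in `as.data.frame()` is one option. A brief sketch (the variable names `res` and `iris_avg` are only for illustration):

```r
gl(3, 2)  # 3 levels, each repeated twice
#> [1] 1 1 2 2 3 3
#> Levels: 1 2 3

res <- t(sapply(split(iris[-5], gl(nrow(iris) / 10, 10)), colMeans))
is.matrix(res)
#> [1] TRUE

# Convert to a data frame if that is the required result type
iris_avg <- as.data.frame(res)
```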

  • Posted on July 18, 2023, 01:50:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76706963.html



