2023-07-18 01:50:00

Building a new data frame with a vectorized function

Question

I am trying to build a new data frame from an existing data frame by performing a repetitive calculation across a few columns. Currently I have a solution that looks something like this:

library(tidyverse)
iris_avg <- data.frame(SLength = colMeans(matrix(iris$Sepal.Length, nrow = 10)),
                       SWidth = colMeans(matrix(iris$Sepal.Width, nrow = 10)),
                       PLength = colMeans(matrix(iris$Petal.Length, nrow = 10)),
                       PWidth = colMeans(matrix(iris$Petal.Width, nrow = 10)))
> iris_avg
   SLength SWidth PLength PWidth
1     4.86   3.31    1.45   0.22
2     5.21   3.65    1.42   0.25
3     5.01   3.39    1.55   0.27
4     5.07   3.46    1.42   0.20
5     4.88   3.33    1.47   0.29
6     6.10   2.87    4.37   1.38
7     5.85   2.65    4.14   1.27
8     6.26   2.85    4.49   1.41
9     5.83   2.75    4.27   1.34
10    5.64   2.73    4.03   1.23
11    6.57   2.94    5.77   2.04
12    6.55   2.90    5.54   2.05
13    6.63   2.96    5.50   1.93
14    6.74   3.04    5.62   1.94
15    6.45   3.03    5.33   2.17

I feel like there should be a simple way to use something like lapply or map, but I have struggled to get it to work, as I am new to R. Any advice would be greatly appreciated!

Answer 1

Score: 2

You could squeeze this into a single line of code in base R:

as.data.frame(lapply(iris[1:4], \(x) sapply(split(x, 0:149 %/% 10), mean)))
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 0          4.86        3.31         1.45        0.22
#> 1          5.21        3.65         1.42        0.25
#> 2          5.01        3.39         1.55        0.27
#> 3          5.07        3.46         1.42        0.20
#> 4          4.88        3.33         1.47        0.29
#> 5          6.10        2.87         4.37        1.38
#> 6          5.85        2.65         4.14        1.27
#> 7          6.26        2.85         4.49        1.41
#> 8          5.83        2.75         4.27        1.34
#> 9          5.64        2.73         4.03        1.23
#> 10         6.57        2.94         5.77        2.04
#> 11         6.55        2.90         5.54        2.05
#> 12         6.63        2.96         5.50        1.93
#> 13         6.74        3.04         5.62        1.94
#> 14         6.45        3.03         5.33        2.17

The following explanation should make this a bit clearer. Instead of using an anonymous function, we can define a function that takes a vector (or data frame column), splits it into chunks of length 10, and uses sapply to get the average of each chunk in a single vector:

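mean_of_every_10_items <- function(x) {
  # Chunk index for each element: 0 for items 1-10, 1 for items 11-20, ...
  group_numbers <- (seq_along(x) - 1) %/% 10

  # Split the vector into a list of chunks of (up to) 10 values
  groups_of_10 <- split(x, group_numbers)

  # Mean of each chunk, returned as a single named vector
  return(sapply(groups_of_10, mean))
}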
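
As a quick sanity check of the chunking logic described above, here is a minimal sketch on a toy vector. The helper is redefined so the snippet runs on its own, and the chunk-size argument `n` is a hypothetical addition for illustration; the answer itself hard-codes 10.

```r
# Hypothetical generalisation of the helper: `n` (chunk size) is an
# assumption added here; the original function always uses 10.
mean_of_every_n_items <- function(x, n = 10) {
  group_numbers <- (seq_along(x) - 1) %/% n  # 0, 0, ..., 1, 1, ...
  sapply(split(x, group_numbers), mean)
}

# 30 values -> 3 chunks of 10
toy_means <- mean_of_every_n_items(1:30)
toy_means  # named vector: "0" = 5.5, "1" = 15.5, "2" = 25.5

# A trailing partial chunk is simply averaged over the values it has
partial_means <- mean_of_every_n_items(1:25)
partial_means  # last chunk is 21:25, whose mean is 23
```

Unlike the matrix-based approach in the question, this grouping does not need the length of the vector to be a multiple of the chunk size.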

We can apply this function to each numeric column of iris using lapply, to get a list containing the result of our function on each column. As a last step, we turn this list back into a data frame:

iris[1:4] |>
  lapply(mean_of_every_10_items) |>
  as.data.frame()
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 0          4.86        3.31         1.45        0.22
#> 1          5.21        3.65         1.42        0.25
#> 2          5.01        3.39         1.55        0.27
#> 3          5.07        3.46         1.42        0.20
#> 4          4.88        3.33         1.47        0.29
#> 5          6.10        2.87         4.37        1.38
#> 6          5.85        2.65         4.14        1.27
#> 7          6.26        2.85         4.49        1.41
#> 8          5.83        2.75         4.27        1.34
#> 9          5.64        2.73         4.03        1.23
#> 10         6.57        2.94         5.77        2.04
#> 11         6.55        2.90         5.54        2.05
#> 12         6.63        2.96         5.50        1.93
#> 13         6.74        3.04         5.62        1.94
#> 14         6.45        3.03         5.33        2.17

The "one-liner" version does the same thing, but it's a lot harder to see what it's doing.
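
To confirm that the two versions agree, the following sketch computes both and compares them. The helper is repeated so the snippet is self-contained, and the variable names `one_liner`, `step_by_step`, and `same_result` are illustrative only. Note that the `\(x)` shorthand and the `|>` pipe assume R 4.1 or later.

```r
# Helper from the explanation above, repeated so this snippet runs on its own
mean_of_every_10_items <- function(x) {
  group_numbers <- (seq_along(x) - 1) %/% 10
  sapply(split(x, group_numbers), mean)
}

# One-liner: group index hard-coded for the 150 rows of iris
one_liner <- as.data.frame(
  lapply(iris[1:4], \(x) sapply(split(x, 0:149 %/% 10), mean))
)

# Step-by-step version: group index derived from the column length
step_by_step <- as.data.frame(lapply(iris[1:4], mean_of_every_10_items))

# all.equal() compares values with a small numeric tolerance
same_result <- isTRUE(all.equal(one_liner, step_by_step))
same_result
#> [1] TRUE
```

One practical difference: `0:149` only works for a 150-row input, whereas `seq_along(x)` adapts to whatever length it is given.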

Created on 2023-07-17 with reprex v2.0.2

Answer 2

Score: 2

In the tidyverse (reframe() requires dplyr 1.1.0 or later):

library(tidyverse)
iris %>%
  reframe(across(-Species, ~colMeans(matrix(.x, 10))))
   Sepal.Length Sepal.Width Petal.Length Petal.Width
1          4.86        3.31         1.45        0.22
2          5.21        3.65         1.42        0.25
3          5.01        3.39         1.55        0.27
4          5.07        3.46         1.42        0.20
5          4.88        3.33         1.47        0.29
6          6.10        2.87         4.37        1.38
7          5.85        2.65         4.14        1.27
8          6.26        2.85         4.49        1.41
9          5.83        2.75         4.27        1.34
10         5.64        2.73         4.03        1.23
11         6.57        2.94         5.77        2.04
12         6.55        2.90         5.54        2.05
13         6.63        2.96         5.50        1.93
14         6.74        3.04         5.62        1.94
15         6.45        3.03         5.33        2.17
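
The `colMeans(matrix(.x, 10))` trick works because `matrix()` fills column by column, so each column ends up holding 10 consecutive values. A minimal base R sketch on a toy vector (chunks of 2 rather than 10, with illustrative variable names) shows the idea:

```r
x <- 1:6

# matrix() fills column-wise: column 1 gets x[1:2], column 2 gets x[3:4], ...
m <- matrix(x, nrow = 2)
m
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6

# The column means are therefore the means of consecutive chunks
chunk_means <- colMeans(m)
chunk_means
#> [1] 1.5 3.5 5.5
```

This relies on the length of the vector being a multiple of the chunk size; otherwise `matrix()` will typically warn and recycle values, which would distort the last mean.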

Another base R approach is to find the correct groupings and use them within tapply:

x <- data.matrix(iris[-5])
tapply(x, list((row(x) - 1) %/% 10, col(x)), mean)
      1    2    3    4
0  4.86 3.31 1.45 0.22
1  5.21 3.65 1.42 0.25
2  5.01 3.39 1.55 0.27
3  5.07 3.46 1.42 0.20
4  4.88 3.33 1.47 0.29
5  6.10 2.87 4.37 1.38
6  5.85 2.65 4.14 1.27
7  6.26 2.85 4.49 1.41
8  5.83 2.75 4.27 1.34
9  5.64 2.73 4.03 1.23
10 6.57 2.94 5.77 2.04
11 6.55 2.90 5.54 2.05
12 6.63 2.96 5.50 1.93
13 6.74 3.04 5.62 1.94
14 6.45 3.03 5.33 2.17
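
The tapply result is a matrix whose columns are labelled 1 to 4, because `col(x)` only carries column positions. A small sketch, assuming you want the original column names back (the names `chunk_id` and `res` are illustrative):

```r
x <- data.matrix(iris[-5])

# Row-wise chunk index (0 for rows 1-10, 1 for rows 11-20, ...),
# with the same shape as x so it lines up with every cell
chunk_id <- (row(x) - 1) %/% 10

# Mean per (chunk, column) cell; the result is a 15 x 4 matrix
res <- tapply(x, list(chunk_id, col(x)), mean)

# Restore the original column names, which col(x) does not carry
colnames(res) <- colnames(x)
head(res, 3)
```

If a data frame is needed rather than a matrix, `as.data.frame(res)` converts it while keeping those column names.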

Another way, using sapply + split:

t(sapply(split(iris[-5], gl(nrow(iris)/10, 10)), colMeans))
   Sepal.Length Sepal.Width Petal.Length Petal.Width
1          4.86        3.31         1.45        0.22
2          5.21        3.65         1.42        0.25
3          5.01        3.39         1.55        0.27
4          5.07        3.46         1.42        0.20
5          4.88        3.33         1.47        0.29
6          6.10        2.87         4.37        1.38
7          5.85        2.65         4.14        1.27
8          6.26        2.85         4.49        1.41
9          5.83        2.75         4.27        1.34
10         5.64        2.73         4.03        1.23
11         6.57        2.94         5.77        2.04
12         6.55        2.90         5.54        2.05
13         6.63        2.96         5.50        1.93
14         6.74        3.04         5.62        1.94
15         6.45        3.03         5.33        2.17
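
Here `gl(n, k)` generates a factor with `n` levels, each repeated `k` times, which is what defines the chunks of 10 rows. The sketch below shows it on a small case and then wraps the same call in `as.data.frame()`, assuming a data frame is wanted (as in the question) rather than the matrix that `t(sapply(...))` returns. The names `g`, `chunks`, and `avg` are illustrative.

```r
# gl(3, 2): 3 levels, each repeated 2 times
g <- gl(3, 2)
g
#> [1] 1 1 2 2 3 3
#> Levels: 1 2 3

# Same idea for iris: 15 levels, each covering 10 consecutive rows
chunks <- gl(nrow(iris) / 10, 10)

# split() cuts the data frame into 15 pieces, colMeans() averages each piece,
# and t() puts chunks in rows; as.data.frame() converts the matrix
avg <- as.data.frame(t(sapply(split(iris[-5], chunks), colMeans)))
head(avg, 3)
```

As written, this assumes the number of rows is an exact multiple of 10.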
