June 12, 2023, 01:58:11


Is there a downside to using multiple nested for-loops in R for running intensive matrix calculations?

Question


In the example code presented below, I use 3 nested for-loops to expand a list of elements (from the matList list object) into a series of matrices, with example simple calculations between the below listed vectors and the code-generated/expanded matrices. This is done by executing function createBucket(). The code works as intended, with the output of running the function createBucket(3) shown in the image at the bottom.

This is a simple extraction from larger code. The calculations in the larger code are intensive with many matrices and vectors "speaking to each other" with the transfer of data between them. The larger code works with large datasets. I also understand that for-loops are not necessarily slower than apply functions.

Is there a downside to nesting multiple for-loops like this? There are more loops and nesting in the larger code this example derives from. I tried lapply() but the advantage of a for-loop is that it is easier to develop algorithms and troubleshoot by executing each line of code, line by line, to get it working right. It is easier to visualize and understand.

Also, secondarily, are there other more streamlined, yet comprehensible, ways to get at what I'm doing with createBucket()?

Code:

seriesVector <- function() {c("mat_One", "mat_Two")}
matList <- list(mat_One = c("Boy", "Cat"),mat_Two = c("Boy", "Bat"))
allocate <- list(mat_One = c(0.6,0.5,0.4),mat_Two = c(0.4,0.5,0.6))
flowVector <- c(6,5,4)
createBucket <- function(nbr_rows) {
  series <- seriesVector()
  flowMat <- vector("list", length = length(series))

  # sequences through each of the series listed in "seriesVector"
  for (i in seq_along(series)) {
    series_name <- series[i]
    mat_elements <- matList[[series_name]]
    mat_list <- vector("list", length = length(mat_elements))

    # sequences through each vector element (Boy/Cat/etc.) for each series listed in "matList"
    for (j in seq_along(mat_elements)) {
      element <- mat_elements[j]
      mat <- matrix(0, nbr_rows, 4)
      colnames(mat) <- c('Inflow','Due','Cover_due','Outflow')

      # sequences through each element of "allocate" and "flowVector" by the number of rows in
      # the matrices, and for each series and each element in the matrices calculates columns
      for (k in 1:nbr_rows) {
        allocate_value <- allocate[[series_name]][k]
        flow_value <- flowVector[k]
        mat[k, "Inflow"] <- allocate_value * flow_value
      }

      mat_list[[j]] <- mat
    }

    names(mat_list) <- mat_elements
    flowMat[[i]] <- setNames(mat_list, mat_elements)
  }

  names(flowMat) <- series
  return(flowMat)
}
createBucket(3)

Answer 1

Score: 2

I'm not sure if it's faster, but mapping over everything instead of writing a bunch of loops is definitely more compact. Here is a pseudo-one-liner that recreates your function. I recommend trying it out and running some speed tests yourself.

``` r
library(tidyverse)
seriesVector <- function() {c("mat_One", "mat_Two")}
matList <- list(mat_One = c("Boy", "Cat"),mat_Two = c("Boy", "Bat"))
allocate <- list(mat_One = c(0.6,0.5,0.4),mat_Two = c(0.4,0.5,0.6))
flowVector <- c(6,5,4)
createBucket <- function(){
map(allocate, \(alc) alc*flowVector) |>
  map2(matList, \(alc, matL) expand.grid(V1 = matL, 
                                         Inflow = alc,
                                         Due = 0,
                                         Cover_due = 0,
                                         Outflow = 0)) |>
  map(\(dfs) group_split(dfs, V1, .keep = FALSE) |>
        map(as.matrix))  |>
  map2(matList, ~set_names(.x, .y))
}
createBucket()
#> $mat_One
#> $mat_One$Boy
#>      Inflow Due Cover_due Outflow
#> [1,]    3.6   0         0       0
#> [2,]    2.5   0         0       0
#> [3,]    1.6   0         0       0
#> 
#> $mat_One$Cat
#>      Inflow Due Cover_due Outflow
#> [1,]    3.6   0         0       0
#> [2,]    2.5   0         0       0
#> [3,]    1.6   0         0       0
#> 
#> 
#> $mat_Two
#> $mat_Two$Boy
#>      Inflow Due Cover_due Outflow
#> [1,]    2.4   0         0       0
#> [2,]    2.5   0         0       0
#> [3,]    2.4   0         0       0
#> 
#> $mat_Two$Bat
#> [1,]    2.4   0         0       0
#> [2,]    2.5   0         0       0
#> [3,]    2.4   0         0       0
```
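
For comparison, here is a minimal base-R sketch of the same idea without the tidyverse. It is not from the original answer: the function name `createBucketVec` is hypothetical, and it assumes the series names are simply `names(matList)` (which is what `seriesVector()` returns in the question). The innermost loop over `k` is an elementwise product, so it can be written as one vectorized multiplication, leaving only the two outer levels to iterate over.

``` r
# Inputs copied from the question.
matList    <- list(mat_One = c("Boy", "Cat"), mat_Two = c("Boy", "Bat"))
allocate   <- list(mat_One = c(0.6, 0.5, 0.4), mat_Two = c(0.4, 0.5, 0.6))
flowVector <- c(6, 5, 4)

# Hypothetical helper (name not from the original post).
createBucketVec <- function(nbr_rows) {
  rows <- seq_len(nbr_rows)
  cols <- c("Inflow", "Due", "Cover_due", "Outflow")

  # Outer level: one list entry per series. Assumes the series names
  # are names(matList), as in the question's seriesVector().
  series <- setNames(names(matList), names(matList))
  lapply(series, function(series_name) {
    # Whole "Inflow" column in one vectorized multiply; this replaces
    # the innermost for-loop over k.
    inflow <- allocate[[series_name]][rows] * flowVector[rows]

    mat <- matrix(0, nbr_rows, length(cols), dimnames = list(NULL, cols))
    mat[, "Inflow"] <- inflow

    # Each element of the series (Boy/Cat/...) gets a copy of the matrix.
    elements <- matList[[series_name]]
    setNames(rep(list(mat), length(elements)), elements)
  })
}

str(createBucketVec(3))
```

If the loop version from the question is also defined in the session, `identical(createBucket(3), createBucketVec(3))` is one way to check that the two agree before timing them with `system.time()` on realistic data sizes.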
