Is there a downside to using multiple nested for-loops in R for running intensive matrix calculations?

Question

In the example code presented below, I use 3 nested for-loops to expand a list of elements (from the matList list object) into a series of matrices, with simple example calculations between the vectors listed below and the code-generated/expanded matrices. This is done by executing the function createBucket(). The code works as intended, with the output of running createBucket(3) shown in the image at the bottom.

This is a simple extraction from larger code. The calculations in the larger code are intensive with many matrices and vectors "speaking to each other" with the transfer of data between them. The larger code works with large datasets. I also understand that for-loops are not necessarily slower than apply functions.

Is there a downside to nesting multiple for-loops like this? There are more loops and nesting in the larger code this example derives from. I tried lapply() but the advantage of a for-loop is that it is easier to develop algorithms and troubleshoot by executing each line of code, line by line, to get it working right. It is easier to visualize and understand.

Also, secondarily, are there other more streamlined, yet comprehensible, ways to get at what I'm doing with createBucket()?

Code:

seriesVector <- function() {c("mat_One", "mat_Two")}
matList <- list(mat_One = c("Boy", "Cat"),mat_Two = c("Boy", "Bat"))
allocate <- list(mat_One = c(0.6,0.5,0.4),mat_Two = c(0.4,0.5,0.6))
flowVector <- c(6,5,4)

createBucket <- function(nbr_rows) {
  series <- seriesVector()
  flowMat <- vector("list", length = length(series))

  # sequences through each of the series listed in "seriesVector"
  for (i in seq_along(series)) {
    series_name <- series[i]
    mat_elements <- matList[[series_name]]
    mat_list <- vector("list", length = length(mat_elements))

    # sequences through each vector element (Boy/Cat/etc.) for each series listed in "matList"
    for (j in seq_along(mat_elements)) {
      element <- mat_elements[j]
      mat <- matrix(0, nbr_rows, 4)
      colnames(mat) <- c('Inflow','Due','Cover_due','Outflow')

      # sequences through each element of "allocate" and "flowVector" by the number of rows in
      # the matrices, and for each series and each element in the matrices calculates columns
      for (k in 1:nbr_rows) {
        allocate_value <- allocate[[series_name]][k]
        flow_value <- flowVector[k]
        mat[k, "Inflow"] <- allocate_value * flow_value
      }

      mat_list[[j]] <- mat
    }

    names(mat_list) <- mat_elements
    flowMat[[i]] <- setNames(mat_list, mat_elements)
  }

  names(flowMat) <- series
  return(flowMat)
}
createBucket(3)

Answer 1

Score: 2

I'm not sure if it's faster, but mapping everything out instead of writing a bunch of loops is definitely more compact. Here is a pseudo-one-liner that recreates your function. I recommend trying it out and running some speed tests yourself.

``` r
library(tidyverse)

seriesVector <- function() {c("mat_One", "mat_Two")}
matList <- list(mat_One = c("Boy", "Cat"),mat_Two = c("Boy", "Bat"))
allocate <- list(mat_One = c(0.6,0.5,0.4),mat_Two = c(0.4,0.5,0.6))
flowVector <- c(6,5,4)

createBucket <- function(){
map(allocate, \(alc) alc*flowVector) |>
  map2(matList, \(alc, matL) expand.grid(V1 = matL, 
                                         Inflow = alc,
                                         Due = 0,
                                         Cover_due = 0,
                                         Outflow = 0)) |>
  map(\(dfs) group_split(dfs, V1, .keep = FALSE) |>
        map(as.matrix))  |>
  map2(matList, ~set_names(.x, .y))
}

createBucket()
#> $mat_One
#> $mat_One$Boy
#>      Inflow Due Cover_due Outflow
#> [1,]    3.6   0         0       0
#> [2,]    2.5   0         0       0
#> [3,]    1.6   0         0       0
#> 
#> $mat_One$Cat
#>      Inflow Due Cover_due Outflow
#> [1,]    3.6   0         0       0
#> [2,]    2.5   0         0       0
#> [3,]    1.6   0         0       0
#> 
#> 
#> $mat_Two
#> $mat_Two$Boy
#>      Inflow Due Cover_due Outflow
#> [1,]    2.4   0         0       0
#> [2,]    2.5   0         0       0
#> [3,]    2.4   0         0       0
#> 
#> $mat_Two$Bat
#>      Inflow Due Cover_due Outflow
#> [1,]    2.4   0         0       0
#> [2,]    2.5   0         0       0
#> [3,]    2.4   0         0       0
```
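
If you would rather stay in base R, here is a rough sketch of another option, not a definitive rewrite. The innermost loop over rows is the part most likely to cost time, since it works on one scalar at a time, and it can be replaced by a single vectorized multiplication. The names `makeMat` and `createBucketVec` are made up for this illustration and do not appear in the question.

``` r
# Inputs copied from the question
seriesVector <- function() c("mat_One", "mat_Two")
matList <- list(mat_One = c("Boy", "Cat"), mat_Two = c("Boy", "Bat"))
allocate <- list(mat_One = c(0.6, 0.5, 0.4), mat_Two = c(0.4, 0.5, 0.6))
flowVector <- c(6, 5, 4)

# Hypothetical helper: builds one matrix and fills the Inflow column
# with one vectorized multiplication instead of a loop over rows.
makeMat <- function(series_name, nbr_rows) {
  rows <- seq_len(nbr_rows)  # unlike 1:nbr_rows, this is safe when nbr_rows is 0
  mat <- matrix(0, nbr_rows, 4,
                dimnames = list(NULL, c("Inflow", "Due", "Cover_due", "Outflow")))
  mat[, "Inflow"] <- allocate[[series_name]][rows] * flowVector[rows]
  mat
}

# Hypothetical name: returns the same nested-list shape as createBucket(nbr_rows)
createBucketVec <- function(nbr_rows) {
  series <- seriesVector()
  out <- lapply(series, function(s) {
    mat <- makeMat(s, nbr_rows)
    # In this example every element of a series gets the same matrix,
    # so it is built once and repeated under each element name.
    setNames(rep(list(mat), length(matList[[s]])), matList[[s]])
  })
  setNames(out, series)
}

res <- createBucketVec(3)
res$mat_One$Boy[, "Inflow"]  # Inflow values 3.6, 2.5, 1.6, as in the output above
```

For a rough speed comparison, wrap repeated calls of each version in `system.time()`, for example `system.time(for (i in 1:1000) createBucketVec(3))`, and run the same thing against the loop version from the question. The outer two loops only run over a handful of series and elements, so I would expect most of any difference to come from the row-level loop once `nbr_rows` gets large, but measure it on your real data before restructuring anything.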

huangapple
  • Posted on June 12, 2023, 01:58:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76451792.html