June 9, 2023 01:11:44


Is using the lapply function worthwhile in lieu of a for-loop when building complex lists with multiple conditionals?

Question

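In the example code below I create a function, createBucket, that reads through a vector (returned by dfVector()) and a list (dfList) made up of two sublists, "DFOne" and "DFTwo". For each dfList sublist in which it finds the element "Boy", the function creates a dummy data frame and adds it to a new list. This example code works as intended.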

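This is a simplification of the code I am working on. In the actual code, the equivalents of dfVector and dfList are reactive, expanding and contracting depending on Shiny inputs. The function also reads other lists and imposes further conditionals as it works through the vectors and lists. There are also calculations that feed from one sublist to another, instead of the zero-filled data frames this example uses for the sake of simplicity.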
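lapply() calls its function once per element, and no call can see what another call returned. When calculations feed from one sublist to the next, a for-loop or base R's Reduce() with accumulate = TRUE is usually the more natural fit. A minimal sketch, assuming hypothetical series names and a hypothetical update rule (column A doubles at each step, column B carries the previous A):

```r
# Sketch: dependent steps with Reduce(accumulate = TRUE).
# The series names and the update rule below are hypothetical.
series <- c("DFOne", "DFTwo", "DFThree")
nbr_rows <- 3

# Seed bucket for the first series
first <- data.frame(A = rep(1, nbr_rows), B = rep(0, nbr_rows))

# next_bucket(prev, name) receives the previous bucket, so each step can
# build on it. Hypothetical rule: A doubles, B takes the previous A.
# `name` is available for per-series conditionals but unused here.
next_bucket <- function(prev, name) {
  data.frame(A = prev$A * 2, B = prev$A)
}

# accumulate = TRUE keeps every intermediate result:
# the seed plus one bucket per remaining series
buckets <- Reduce(next_bucket, series[-1], accumulate = TRUE, init = first)
names(buckets) <- paste0("bucket", series)

buckets$bucketDFThree  # A is 4 and B is 2 in every row
```

Because the result holds the seed plus one bucket per remaining series, it lines up with series and can be named the same way createBucket names its buckets.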

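Given how much is going on in this function, is using lapply() or another apply-family function advisable? Speed is important, but the final data frame generated by this and related functions won't qualify as "big data" (120 rows by 100+ columns). How could I use lapply() in the code below? I could then run speed tests comparing the for-loop with lapply().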

Code:

dfVector <- function(){c("DF One","DF Two")}
dfList <- list(DFOne = c("Boy","Cat","Dog"),DFTwo = c("Boy","Rat","Bat"))
createBucket <- function(nbr_rows) {
  series <- gsub("\\s+", "", dfVector())
  buckets <- list()
  
  for (i in seq_along(series)) {
    series_name <- series[i]
    dfListOrder <- dfList[[series_name]]
    
    if ("Boy" %in% dfListOrder) {
      df_name <- paste0("bucket", gsub("\\s+", "", series_name))
      bucket <- data.frame(
        A = rep(0, nbr_rows),
        B = rep(0, nbr_rows),
        check.names = FALSE
      )
      buckets[[df_name]] <- bucket
    }
  }
  if (length(buckets) > 0) {return(buckets)} else {return(NULL)}
}
result <- createBucket(10)
result



Answer 1

Score: 3


One approach:

## requires R >= 4.1 (native pipe |> and \(x) lambda syntax)
createBucket2 <- function(nbr_rows){
  series <- gsub("\\s+", "", dfVector())
  series |>
    lapply(FUN = \(series_name){
      if('Boy' %in% dfList[[series_name]]){
        ## here's the actual performance boost:
        as.data.frame(matrix(0, nbr_rows, 2)) |>
          setNames(nm = c('A', 'B'))
      }
    }) |>
    setNames(nm = paste0('bucket', series)) |>
    (\(.) list(NULL, .)[[1 + (length(.) > 0)]])()
}

> identical(createBucket(10), createBucket2(10))
[1] TRUE
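The last pipe step in createBucket2, list(NULL, .)[[1 + (length(.) > 0)]], picks element 1 (NULL) when the list is empty and element 2 (the list itself) otherwise, so it is a compact form of if (length(.) > 0) . else NULL. One behavioural difference from the loop is worth noting: lapply() always returns one element per input, so a series without "Boy" ends up as a NULL entry (e.g. bucketDFTwo = NULL) instead of being skipped, and the final step checks only the list's length, so those NULL entries are kept. A sketch that restores the loop's behaviour with Filter(), assuming the labels and lists are passed in as arguments (createBucket3 and its parameter names are hypothetical):

```r
# Sketch: lapply() + Filter() so that series without "Boy" are skipped,
# matching the for-loop. Requires R >= 4.1 for \(x) and |>.
# Passing labels and lists as arguments is an assumption; the original
# reads the globals dfVector() and dfList.
createBucket3 <- function(nbr_rows, series_labels, lists) {
  series <- gsub("\\s+", "", series_labels)
  buckets <- lapply(series, \(series_name) {
    if ("Boy" %in% lists[[series_name]]) {
      as.data.frame(matrix(0, nbr_rows, 2)) |> setNames(nm = c("A", "B"))
    } # an if without else returns NULL when the condition is FALSE
  })
  # sprintf() returns character(0) for an empty series vector,
  # whereas paste0("bucket", character(0)) would return "bucket"
  names(buckets) <- sprintf("bucket%s", series)
  # lapply() returns one element per input, so drop the NULL entries
  buckets <- Filter(Negate(is.null), buckets)
  if (length(buckets) > 0) buckets else NULL
}

# Usage: "Boy" is deliberately missing from DFTwo
lbls <- c("DF One", "DF Two")
lsts <- list(DFOne = c("Boy", "Cat", "Dog"), DFTwo = c("Girl", "Rat", "Bat"))
res <- createBucket3(10, lbls, lsts)
names(res)
```

Passing the inputs as arguments also makes the function easier to call with reactive values inside a Shiny server, though that is a design preference rather than a requirement.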

Edit: as for speed differences, the lapply variant is about 10% faster than the loop variant (not shown), but the real performance boost (three times as fast) comes from creating the bucket data frame via as.data.frame(matrix(...)) rather than via data.frame(...).

loop variant: 314.8 µs
lapply variant: 77.2 µs
(median of 5000 runs using {microbenchmark})
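The timings above come from {microbenchmark}. To try the two construction routes at the size mentioned in the question (120 rows by 100 columns) without an extra dependency, a rough base-R comparison could look like the sketch below. The column names and repetition count are assumptions, and absolute timings vary by machine, so treat the printout as indicative only.

```r
# Hypothetical column names standing in for the app's 100+ columns
cols <- paste0("col", seq_len(100))
nbr_rows <- 120

# Route 1: data.frame() called with one zero vector per column
via_data_frame <- function() {
  zero_cols <- setNames(rep(list(rep(0, nbr_rows)), length(cols)), cols)
  do.call(data.frame, c(zero_cols, check.names = FALSE))
}

# Route 2: build a zero matrix with column names, then convert it once
via_matrix <- function() {
  as.data.frame(matrix(0, nbr_rows, length(cols), dimnames = list(NULL, cols)))
}

# Both routes should yield the same data frame
stopifnot(isTRUE(all.equal(via_data_frame(), via_matrix())))

# Crude timing: elapsed seconds for n_reps repetitions of each route
n_reps <- 200
t_df  <- system.time(for (k in seq_len(n_reps)) via_data_frame())[["elapsed"]]
t_mat <- system.time(for (k in seq_len(n_reps)) via_matrix())[["elapsed"]]
cat(sprintf("data.frame(): %.3f s, matrix route: %.3f s (%d reps)\n",
            t_df, t_mat, n_reps))
```

system.time() has coarse resolution, so increase n_reps if both numbers come out near zero; {microbenchmark}, as used in the answer, gives finer-grained distributions.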

