

Is using the lapply function worthwhile in lieu of a for-loop when building complex lists with multiple conditionals?









In the example code below I create a function createBucket that reads through a vector (dfVector) and a list (dfList) comprised of two sublist dataframes, "DFOne" and "DFTwo". The function creates another list of dummy dataframes for each dfList sublist dataframe where it finds the element "Boy". This example code works as intended.

This is a simplification of the code I am working on. In the actual code, the equivalents of dfVector and dfList are reactive, expanding and contracting depending on Shiny inputs. There are other lists that the function reads, and there are other conditionals imposed as the vectors and lists are read through by the function. There are also calculations that feed from one sublist to another, instead of filling the sublist dataframes with zeroes as this example does for the sake of simplicity.

Given how much is going on in this function, is using lapply() or another apply-family function advisable? Speed is important, but the final dataframe generated by this and related functions won't qualify as "big data" (120 rows by 100+ columns). How could I use lapply() in the code below? I could run speed tests comparing the for-loop with lapply().


dfVector <- function(){c("DF One","DF Two")}

dfList <- list(DFOne = c("Boy","Cat","Dog"),DFTwo = c("Boy","Rat","Bat"))

createBucket <- function(nbr_rows) {
  series <- gsub("\\s+", "", dfVector())
  buckets <- list()
  for (i in seq_along(series)) {
    series_name <- series[i]
    dfListOrder <- dfList[[series_name]]
    if ("Boy" %in% dfListOrder) {
      df_name <- paste0("bucket", gsub("\\s+", "", series_name))
      bucket <- data.frame(
        A = rep(0, nbr_rows),
        B = rep(0, nbr_rows),
        check.names = FALSE
      )
      buckets[[df_name]] <- bucket
    }
  }
  if (length(buckets) > 0) {return(buckets)} else {return(NULL)}
}

result <- createBucket(10)


Score: 3



One approach:

createBucket2 <- function(nbr_rows){
  series <- gsub("\\s+", "", dfVector())
  series |>
    lapply(FUN = \(series_name){
      if('Boy' %in% dfList[[series_name]]){
        ## here's the actual performance boost:
        as.data.frame(matrix(0, nbr_rows, 2)) |>
          setNames(nm = c('A', 'B'))
      }
    }) |>
    setNames(nm = paste0('bucket', series)) |>
    (\(.) list(NULL, .)[[1 + (length(.) > 0)]])()
}

> identical(createBucket(10), createBucket2(10))
[1] TRUE
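One caveat with the lapply() version above: when a series does not contain "Boy", the anonymous function returns NULL and lapply() keeps that NULL as a list element, whereas the for-loop skips the series entirely. The two functions agree on the example data only because both series contain "Boy". Below is a minimal sketch of a variant that drops the empty entries. It reuses the question's dfVector and dfList shapes; the name createBucket3 and the third series "DF Three" are hypothetical additions for illustration only.

```r
# Example inputs; "DF Three" is a made-up series without "Boy"
dfVector <- function() c("DF One", "DF Two", "DF Three")
dfList <- list(
  DFOne   = c("Boy", "Cat", "Dog"),
  DFTwo   = c("Boy", "Rat", "Bat"),
  DFThree = c("Cat", "Rat", "Bat")
)

# Hypothetical variant of createBucket2 that skips series lacking "Boy"
createBucket3 <- function(nbr_rows) {
  series <- gsub("\\s+", "", dfVector())
  names(series) <- paste0("bucket", series)  # lapply() carries these names over

  buckets <- lapply(series, function(series_name) {
    if ("Boy" %in% dfList[[series_name]]) {
      setNames(as.data.frame(matrix(0, nbr_rows, 2)), c("A", "B"))
    }  # implicit NULL when "Boy" is absent
  })

  # Drop the NULL placeholders so skipped series do not appear in the result
  buckets <- Filter(Negate(is.null), buckets)
  if (length(buckets) > 0) buckets else NULL
}

result3 <- createBucket3(10)
names(result3)
# "bucketDFOne" "bucketDFTwo"
```

Filtering after the lapply() call keeps the per-series function simple; any extra conditionals can live inside that function as long as it returns NULL for series that should be skipped.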

Edit: as for speed differences, the lapply variant would be only about 10% faster than the loop variant (not shown). The real boost in performance, three times as fast, comes from creating the bucket dataframe via as.data.frame(matrix(...)) rather than via data.frame(...).

loop variant: 314.8 µs

lapply variant: 77.2 µs

(in microseconds, median of 5000 runs using {microbenchmark})
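The comparison between the two ways of building a bucket can be reproduced roughly with base R alone. The sketch below times each constructor with system.time() over repeated calls. The helper names make_df and make_mat and the repetition count are assumptions chosen for illustration, and absolute timings will differ by machine and R version. The figures above came from {microbenchmark}, which reports per-call medians and is the better tool if it is installed.

```r
nbr_rows <- 10
n_reps   <- 2000  # arbitrary repetition count for this sketch

# Constructor used in the question's for-loop
make_df <- function() {
  data.frame(A = rep(0, nbr_rows), B = rep(0, nbr_rows), check.names = FALSE)
}

# Constructor used in the answer's lapply() version
make_mat <- function() {
  setNames(as.data.frame(matrix(0, nbr_rows, 2)), c("A", "B"))
}

# Sanity check that both constructors build equivalent data frames
same_result <- isTRUE(all.equal(make_df(), make_mat()))

# Total elapsed seconds for n_reps calls of each constructor
t_df  <- system.time(for (i in seq_len(n_reps)) make_df())[["elapsed"]]
t_mat <- system.time(for (i in seq_len(n_reps)) make_mat())[["elapsed"]]

print(c(data_frame_based = t_df, matrix_based = t_mat))
```

Timing the constructors in isolation separates the cost of building the dataframe from the cost of the iteration construct, which is the distinction the edit above draws.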

  • Posted on June 9, 2023, 01:11:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76434238.html



