Shared library is not recognized when using parallel foreach in R

Question

I am using a shared library written in C in my R code. I load the compiled shared library with dyn.load, and I want to call one of its functions inside a parallel foreach loop. Here is my code:

library(foreach)
library(doParallel)

totalCores = detectCores()
cluster <- makeCluster(totalCores[1]-1)
registerDoParallel(cluster)

dyn.load("package.so")

run <- function(i) {
    row <- data[i,]
    res <- .Call("c_function", as.double(row))
    return(res)
}

result <- foreach(i=1:nrow(data), .combine = rbind) %dopar% {
  run(i)
}

I get the following error:

Error in { :
  task 1 failed - "C symbol name "c_function" not in load table"

Although I have loaded the shared library, c_function does not seem to be recognized in the parallel tasks. When I put the dyn.load call inside the foreach loop, the problem goes away:

result <- foreach(i=1:nrow(data), .combine = rbind) %dopar% {
  dyn.load("package.so")
  run(i)
}

But I am not sure this is best practice, since the shared library (package.so) is loaded again on every iteration, which may be inefficient. Any ideas?
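One way to avoid reloading on every iteration is to load the library once in each worker before the loop starts. By default makeCluster() starts separate R processes, so a dyn.load() in the main session does not reach them. The following is a minimal sketch using clusterCall() from the parallel package, not a definitive fix. package.so and c_function are the names from the question; the worker count of 2, the so_path variable, and the file.exists() guard are assumptions added for illustration.

```r
library(parallel)

# Two workers keep the sketch small; the question uses detectCores() - 1.
cluster <- makeCluster(2)

# Resolve the path in the main session so the workers do not depend on
# their own working directory. "package.so" is the name from the question.
so_path <- normalizePath("package.so", mustWork = FALSE)

if (file.exists(so_path)) {
  # Runs once in every worker process, not once per iteration.
  invisible(clusterCall(cluster, function(path) {
    dyn.load(path)
    TRUE
  }, so_path))
}

# One logical value per worker: TRUE only where the symbol is available.
loaded <- unlist(clusterEvalQ(cluster, is.loaded("c_function")))
print(loaded)

stopCluster(cluster)
```

In real use, registerDoParallel(cluster) and the foreach loop would sit between the clusterCall() step and stopCluster(), and the loop body could then call run(i) without any dyn.load.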

Edit:

Regarding r2even's answer, I tested the following code:

foreach(i=1:50,.packages='rootSolve') %dopar% {
  print(is.loaded("c_function"))
}

My PC has 10 CPU cores (20 threads), so totalCores was 20 when I ran this code. Here is the result:

[[1]]
[1] TRUE

[[2]]
[1] TRUE

[[3]]
[1] TRUE

[[4]]
[1] TRUE

[[5]]
[1] TRUE

[[6]]
[1] TRUE

[[7]]
[1] TRUE

[[8]]
[1] TRUE

[[9]]
[1] TRUE

[[10]]
[1] TRUE

[[11]]
[1] FALSE

[[12]]
[1] FALSE

[[13]]
[1] FALSE

[[14]]
[1] FALSE

[[15]]
[1] FALSE

[[16]]
[1] FALSE

[[17]]
[1] FALSE

[[18]]
[1] FALSE

[[19]]
[1] FALSE

[[20]]
[1] TRUE

[[21]]
[1] TRUE

[[22]]
[1] TRUE

[[23]]
[1] TRUE

[[24]]
[1] TRUE

[[25]]
[1] TRUE

[[26]]
[1] TRUE

[[27]]
[1] TRUE

[[28]]
[1] TRUE

[[29]]
[1] TRUE

[[30]]
[1] TRUE

[[31]]
[1] TRUE

[[32]]
[1] TRUE

[[33]]
[1] TRUE

[[34]]
[1] TRUE

[[35]]
[1] TRUE

[[36]]
[1] TRUE

[[37]]
[1] TRUE

[[38]]
[1] TRUE

[[39]]
[1] TRUE

[[40]]
[1] TRUE

[[41]]
[1] TRUE

[[42]]
[1] TRUE

[[43]]
[1] TRUE

[[44]]
[1] TRUE

[[45]]
[1] TRUE

[[46]]
[1] TRUE

[[47]]
[1] TRUE

[[48]]
[1] TRUE

[[49]]
[1] TRUE

[[50]]
[1] TRUE

This raises several questions. In the output above, is.loaded("c_function") is FALSE only in iterations 11 to 19. Will it always be FALSE in exactly those iterations? Is it guaranteed to be TRUE from iteration 20 onward?
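The load state belongs to each worker process, not to an iteration number, so a pattern like the one above most likely reflects which workers had already run dyn.load earlier and which worker happened to receive each task. The sketch below is one way to check this under stated assumptions: it records the worker's process ID next to the is.loaded() result for each task. It uses parLapply() from the parallel package rather than foreach to stay self-contained; the worker count of 2, the six tasks, and the report data frame are illustrative choices.

```r
library(parallel)

cluster <- makeCluster(2)

# For every task, record which worker process ran it and whether that
# process currently has the symbol from the question in its load table.
report <- parLapply(cluster, 1:6, function(i) {
  data.frame(
    iteration = i,
    pid = Sys.getpid(),
    loaded = is.loaded("c_function")
  )
})
report <- do.call(rbind, report)
print(report)

stopCluster(cluster)
```

Rows that share a pid always agree on loaded, because they ran in the same process. The same data.frame() call should work unchanged inside a %dopar% body.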

Answer 1

Score: 1

Try this hack, which is almost certainly less inefficient than calling dyn.load on every iteration:

result <- foreach(i=1:nrow(data), .combine = rbind) %dopar% {
  if (!is.loaded("c_function")) dyn.load("package.so")
  run(i)
}

(Untested.)
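The guard can also be wrapped in a small function so the check and the load stay together. This is only a sketch: ensure_loaded is a hypothetical helper name, not part of R or foreach, and the temporary path below is a deliberately missing file used to show what a failed load looks like.

```r
# Hypothetical helper: loads the library only if the symbol is missing.
# Returns TRUE when a load was performed, FALSE when it was skipped.
ensure_loaded <- function(symbol, so_path) {
  if (is.loaded(symbol)) {
    return(FALSE)
  }
  dyn.load(so_path)
  TRUE
}

# A missing file surfaces as an ordinary R error, caught here so the
# sketch runs without the real package.so.
outcome <- tryCatch(
  ensure_loaded("c_function", file.path(tempdir(), "no_such_library.so")),
  error = function(e) "load failed"
)
print(outcome)
```

Inside the loop this would replace the inline if statement, for example ensure_loaded("c_function", so_path) with so_path set to an absolute path, since a relative path is resolved against each worker's own working directory.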

huangapple
  • Published on May 25, 2023, 16:58:32
  • Please keep this link when reposting: https://go.coder-hub.com/76330516.html