How strongly can system software interfere with R package functionality?

Question

Background:

I am trying to establish a setup where I can create a persistent build for a Docker image, which will in turn power the core of an R statistics process. At this point I've figured out how to install exactly the R packages I am requesting; however, I wonder how relevant the software supplied by the underlying system (in my case Ubuntu 20.04) is for reproducibility in R. I install that system software via apt-get install, without specifying versions.

Questions:

  1. What can happen to R package functionality when I rebuild the image later with all the same R packages specified but potentially different system libraries?
  2. How big can the influence be, and what are the remedies?

Any guidance is appreciated.

Answer 1

Score: 2

This question is quite broad/vague, but you should probably worry first about:

  • linear algebra libraries (BLAS/LAPACK)
  • compiler versions
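One way to make these two items visible is to record them when the image is built and compare the record after a rebuild. The sketch below is a hypothetical helper (record_numeric_stack and or_na are not part of any package). It uses base R's sessionInfo(), La_version() and extSoftVersion(), plus R CMD config CC for the C compiler command; it assumes a recent R in which sessionInfo() reports the BLAS and LAPACK library paths, and falls back to NA where a piece is unavailable.

```r
# Hypothetical helper (not from any package): capture the parts of the
# numeric/compiler stack that an unpinned apt-get install can change.
or_na <- function(x) {
  if (is.null(x) || length(x) == 0) NA_character_ else as.character(x[[1]])
}

record_numeric_stack <- function() {
  si <- utils::sessionInfo()
  # Compiler command R uses when building packages from source (e.g. "gcc").
  cc <- tryCatch(
    suppressWarnings(system2(file.path(R.home("bin"), "R"),
                             c("CMD", "config", "CC"), stdout = TRUE)),
    error = function(e) character(0)
  )
  c(
    R              = R.version.string,
    running        = or_na(si$running),
    BLAS_library   = or_na(si$BLAS),
    LAPACK_library = or_na(si$LAPACK),
    LAPACK_version = La_version(),
    C_compiler     = or_na(cc),
    extSoftVersion()  # zlib, PCRE, ICU, iconv, BLAS, ... as seen by R
  )
}

# At image build time; the output location here is only illustrative.
stack <- record_numeric_stack()
out <- file.path(tempdir(), "numeric-stack.txt")
writeLines(paste(names(stack), stack, sep = ": "), out)
```

Writing the same file from a rebuilt image and diffing the two shows whether the BLAS/LAPACK libraries or the compiler changed underneath otherwise identical R packages.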

Beyond that, it will depend on whether the packages you load use additional system libraries (see the SystemRequirements: field in the package's DESCRIPTION file, or on its CRAN web page). For example, sf (a package for spatial data processing) lists

C++11, GDAL (>= 2.0.1), GEOS (>= 3.4.0), PROJ (>= 4.8.0), sqlite3
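To see which of the installed packages declare such requirements at all, installed.packages() can be asked for the SystemRequirements field. The helper name system_dependent_packages is hypothetical. sf::sf_extSoftVersion() is sf's own function for reporting the GEOS, GDAL and PROJ versions it was linked against; the sketch only calls it if sf happens to be installed.

```r
# Hypothetical helper: installed packages whose DESCRIPTION declares
# SystemRequirements, i.e. candidates for behaviour changes when the
# apt-installed libraries underneath them change.
system_dependent_packages <- function(lib.loc = NULL) {
  ip <- utils::installed.packages(lib.loc = lib.loc,
                                  fields = "SystemRequirements")
  cols <- c("Package", "Version", "SystemRequirements")
  df <- as.data.frame(ip[, cols, drop = FALSE], stringsAsFactors = FALSE)
  df <- df[!is.na(df$SystemRequirements), , drop = FALSE]
  rownames(df) <- NULL
  df
}

deps <- system_dependent_packages()
print(deps)

# For sf specifically, report the library versions it was built against
# (only if sf is installed in this image).
if (requireNamespace("sf", quietly = TRUE)) {
  print(sf::sf_extSoftVersion())
}
```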

Speaking only for the first two (compiler and linear algebra), the differences will be at the level of floating-point precision. To the extent that the numerical methods you use are robust and the statistical problems you're working with are stable and well-posed, the differences will only be at a level you can handle with standard best practices for floating-point comparison (e.g., using all.equal() rather than == or identical()).
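A minimal illustration of that comparison advice. Here two mathematically equivalent routes to the same least-squares coefficients (normal equations via solve() and a QR decomposition via qr.coef()) stand in for "the same R code on a different BLAS/LAPACK or compiler": the results can differ in the last few bits, so exact comparison is fragile, while all.equal() with its default tolerance (sqrt(.Machine$double.eps), about 1.5e-8) is not. The data are simulated and same_up_to_rounding is a hypothetical helper name.

```r
# Exact comparison fails even for textbook floating-point arithmetic.
0.1 + 0.2 == 0.3                  # FALSE
identical(0.1 + 0.2, 0.3)         # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3)) # TRUE

# Hypothetical helper: tolerance-based check for regression tests, e.g.
# comparing results saved before a rebuild with results computed after it.
same_up_to_rounding <- function(old, new,
                                tolerance = sqrt(.Machine$double.eps)) {
  isTRUE(all.equal(old, new, tolerance = tolerance))
}

# Two numerically different routes to the same OLS coefficients,
# standing in for "same code, different BLAS/LAPACK".
set.seed(1)
X <- cbind(1, matrix(rnorm(300), ncol = 3))
y <- drop(X %*% c(2, -1, 0.5, 3) + rnorm(100))
b_normal <- as.vector(solve(crossprod(X), crossprod(X, y)))
b_qr     <- as.vector(qr.coef(qr(X), y))

same_up_to_rounding(b_normal, b_qr)  # TRUE: they agree up to rounding
max(abs(b_normal - b_qr))            # tiny, but not necessarily zero
```

In the Docker setup from the question, one could saveRDS() reference results from the original image and check them with such a helper after each rebuild; how tight the tolerance can be is a judgement call that depends on how well-conditioned the problem is.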

huangapple
  • Posted on March 3, 2023, 23:10:16
  • If you repost this article, please keep its link: https://go.coder-hub.com/75628770.html