How strongly can system software interfere with R package functionality?
Question
Background:
I am trying to establish a setup where I can create a persistent build of a Docker image, which will in turn power the core of an R statistics process. At this point I've figured out how to install exactly the R packages I request. However, I wonder how relevant the software supplied by the underlying system (Ubuntu 20.04 in my case) is to reproducibility in R. I install that software via apt-get install, but without specifying versions.
Questions:
- What can happen to R package functionality when I later rebuild the image with exactly the same R packages but potentially different system libraries?
- How big can the influence be, and what are the remedies?
Any guidance is appreciated.
Answer 1
Score: 2
This question is quite broad/vague, but the first things you should probably worry about are:
- linear algebra libraries (BLAS/LAPACK)
- compiler versions
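As a rough sketch (base R only, not part of the original answer), one way to keep track of these two factors is to record them inside the image at build time and compare that record after a rebuild. The helper names r_config() and record_numeric_env() are hypothetical. sessionInfo()$BLAS and sessionInfo()$LAPACK report library paths (R >= 3.4.0), and the compiler is queried through R CMD config, which assumes a standard R installation such as the one in an Ubuntu image.

```r
# Sketch (base R only): snapshot the parts of the system that most often
# change numerical results, so a later rebuild can be compared against it.
r_config <- function(var) {
  # Ask R which compiler command it uses for building packages
  # (the same as running `R CMD config CC` in a shell).
  tryCatch(
    system2(file.path(R.home("bin"), "R"), c("CMD", "config", var),
            stdout = TRUE),
    error = function(e) NA_character_,
    warning = function(w) NA_character_
  )
}

record_numeric_env <- function() {
  si <- utils::sessionInfo()
  list(
    r_version  = R.version.string,
    blas       = si$BLAS,          # path of the BLAS library in use
    lapack     = si$LAPACK,        # path of the LAPACK library in use
    lapack_ver = La_version(),     # LAPACK version string
    ext_soft   = extSoftVersion(), # versions of zlib, PCRE, ICU, ...
    cc         = r_config("CC"),   # C compiler used for package builds
    cxx        = r_config("CXX")   # C++ compiler used for package builds
  )
}

env <- record_numeric_env()
str(env)
```

During the image build you could, for example, store the result with saveRDS(record_numeric_env(), "numeric-env.rds") (file name hypothetical) and, after a rebuild, check whether anything changed with identical(readRDS("numeric-env.rds"), record_numeric_env()).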
Beyond that, it will depend on whether the packages you are loading use additional system libraries (see the SystemRequirements: field in the package's DESCRIPTION file, or the package's CRAN web page). For example, sf (a package for spatial data processing) lists:
C++11, GDAL (>= 2.0.1), GEOS (>= 3.4.0), PROJ (>= 4.8.0), sqlite3
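To see which of your installed packages declare such system dependencies, the SystemRequirements field can also be read programmatically. The following is a minimal sketch using only base R. The name system_requirements() is a hypothetical helper, not an existing function.

```r
# List the SystemRequirements field of installed packages, i.e. the
# system libraries (typically installed via apt-get) they depend on.
system_requirements <- function(pkgs = NULL) {
  ip <- utils::installed.packages(fields = "SystemRequirements")
  # Optionally restrict to the packages you actually use
  if (!is.null(pkgs)) ip <- ip[ip[, "Package"] %in% pkgs, , drop = FALSE]
  has_req <- !is.na(ip[, "SystemRequirements"])
  ip[has_req, c("Package", "SystemRequirements"), drop = FALSE]
}

reqs <- system_requirements()
print(reqs)
```

For example, system_requirements("sf") restricts the output to sf. If sf is installed, sf::sf_extSoftVersion() additionally reports the GEOS, GDAL and PROJ versions it is actually linked against, which is worth recording alongside the image.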
Speaking only for the first two (compiler and linear algebra libraries), the differences will be at the level of floating-point precision. To the extent that the numerical methods you use are robust and the statistical problems you are working with are stable and well-posed, the differences will be small enough to handle with standard best practices for floating-point comparison (e.g., using all.equal() rather than == or identical()).
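The following is a minimal illustration of why exact comparison is fragile and how all.equal() tolerates last-bit differences of the kind a different BLAS or compiler can introduce. The old_fit/new_fit values are made up. The second one merely simulates a tiny relative perturbation.

```r
# Two doubles that are mathematically equal but differ in their last bits
a <- 0.1 + 0.2
b <- 0.3

a == b                  # FALSE: exact comparison sees the rounding error
identical(a, b)         # FALSE for the same reason
isTRUE(all.equal(a, b)) # TRUE: equal within the default relative tolerance

# Simulated coefficients from an old and a rebuilt image (made-up numbers)
old_fit <- c(intercept = 2.5, slope = 0.75)
new_fit <- old_fit * (1 + 1e-12)  # tiny relative perturbation

isTRUE(all.equal(old_fit, new_fit))  # TRUE
# A real discrepancy is reported as a character string, not as FALSE
all.equal(1, 1.1)
```

Because all.equal() returns a character description rather than FALSE when values differ, wrap it in isTRUE() inside if() conditions. Its tolerance argument (default sqrt(.Machine$double.eps), about 1.5e-8) can be loosened if your results are expected to differ more.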