March 3, 2023 23:10:16


How strongly can system software interfere with R package functionality?

Question

Background:

I am trying to establish a setup where I can create a persistent build for a Docker image, which will in turn power the core of an R statistics process. At this point I've figured out how to install exactly the R packages I request. However, I wonder how relevant the software supplied by the underlying system (in my case Ubuntu 20.04) is for reproducibility in R. I install that software via apt-get install, but without specifying versions.

Questions:

What can happen to R package functionality when I rebuild the image later with all the same R packages specified but potentially different system libraries?
How large can the influence be, and what are the remedies?

Any guidance is appreciated.

Answer 1

Score: 2

This question is quite broad/vague, but you should probably worry first about

linear algebra libraries (BLAS/LAPACK)
compiler versions
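As a rough sketch (not part of the original answer), the R code below records which BLAS/LAPACK libraries and which compilers a given image uses, so that the printed output of two builds can be diffed. It assumes R >= 3.4.0, where `sessionInfo()` reports the BLAS and LAPACK libraries, and that `R CMD config` works inside the image; the helper name `r_config()` is hypothetical.

```r
# Snapshot the numerical/compiler environment of this R installation so that
# two image builds can be compared (e.g. by diffing this script's output).
si <- sessionInfo()

# Shared libraries R uses for linear algebra (components exist in R >= 3.4.0)
cat("BLAS:          ", si$BLAS, "\n")
cat("LAPACK:        ", si$LAPACK, "\n")
cat("LAPACK version:", La_version(), "\n")

# Versions of other external libraries R itself was built against
# (zlib, bzlib, xz, PCRE, ICU, ...)
print(extSoftVersion())

# Hypothetical helper: ask R which compiler it uses when building packages
# from source, via the real `R CMD config <VAR>` command.
r_config <- function(var) {
  r_bin <- file.path(R.home("bin"), "R")
  system2(r_bin, c("CMD", "config", var), stdout = TRUE)
}
cat("CC: ", r_config("CC"), "\n")
cat("CXX:", r_config("CXX"), "\n")
```

Running this at image build time and storing the output next to the image makes it easy to see later whether a rebuild picked up a different BLAS or compiler.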

Beyond that, it will depend on whether the packages you are loading use additional system libraries (see the SystemRequirements: field in the package's DESCRIPTION file, or the package's CRAN web page). For example, sf (a package for spatial data processing) lists

C++11, GDAL (>= 2.0.1), GEOS (>= 3.4.0), PROJ (>= 4.8.0), sqlite3
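A minimal sketch for auditing an image along these lines: the helper below (the name `sys_requirements()` is hypothetical) calls `installed.packages()` with `fields = "SystemRequirements"` to list every installed package that declares system dependencies, i.e. the packages whose behaviour could change when the Ubuntu libraries underneath them change.

```r
# Hypothetical helper: return Package, Version and SystemRequirements for
# every installed package whose DESCRIPTION declares system dependencies.
sys_requirements <- function(lib.loc = NULL) {
  # `fields` adds the SystemRequirements column (NA where it is absent)
  ip <- installed.packages(lib.loc = lib.loc, fields = "SystemRequirements")
  keep <- !is.na(ip[, "SystemRequirements"])
  ip[keep, c("Package", "Version", "SystemRequirements"), drop = FALSE]
}

res <- sys_requirements()
print(res, quote = FALSE)
```

For sf in particular, `sf::sf_extSoftVersion()` reports the GEOS, GDAL and PROJ versions the installed package is actually linked against, which is the more direct thing to compare between two builds.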

Speaking only for the first two (compiler and linear algebra libraries), the differences will be at the level of floating-point precision. To the extent that the numerical methods you use are robust and the statistical problems you're working with are stable and well-posed, the differences will stay small enough to handle with standard best practices for floating-point comparison (e.g., using all.equal() rather than == or identical()).
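A small illustration of that comparison advice. It assumes results that differ only at floating-point precision; here the difference is simulated with 0.1 + 0.2 rather than produced by an actual BLAS or compiler change, and `ref`/`new` are illustrative values standing in for results from an old and a rebuilt image.

```r
# Two values that are mathematically equal but differ in the last bits,
# as can happen when the same code runs against a different BLAS/compiler.
x <- 0.1 + 0.2
y <- 0.3

x == y                    # FALSE: exact comparison
identical(x, y)           # FALSE: also exact
isTRUE(all.equal(x, y))   # TRUE: within the default tolerance (about 1.5e-8)

# all.equal() returns a character description instead of FALSE when values
# differ beyond the tolerance, so wrap it in isTRUE() inside if()/stopifnot().
all.equal(1, 1 + 1e-6)    # a "Mean relative difference: ..." string

# Comparing stored reference results with a rebuilt image's results,
# using an explicit tolerance.
ref <- c(1.234567890123, 2.5)
new <- ref * (1 + 1e-12)
stopifnot(isTRUE(all.equal(ref, new, tolerance = 1e-10)))
```

Choosing the tolerance explicitly documents how much numerical drift between builds you are prepared to accept.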
