英文:
How to ensure reproducibility of nlme() function calls?
问题
以下是你要的内容的翻译:
我最近开始使用R包{nlme}来拟合具有随机效应的非线性混合效应模型。我发现多次运行完全相同的nlme()函数调用(即相同的代码行:相同的数据,模型和起始参数)可能会产生不同的结果,即一些尝试会收敛,而其他尝试则不会,即使收敛的模型在参数估计上也可能不同(尽管略有不同)。
即使在调用nlme()之前立即调用set.seed()时,这似乎仍然成立。以下是一个可重现的示例(在此问题的范围内!),它调用set.seed(10),然后调用nlme(),重复此过程100次(在我的笔记本电脑上大约需要2秒/迭代)。(它在tryCatch()语句中执行此操作,将警告提升为错误,因为根据我的有限经验,最终导致不收敛的警告会导致nlme() hang(挂起)很长时间;这将其分类为错误并跳过到下一个循环迭代。)当我在R的新实例中运行此代码时,5/100个模型收敛。
library(nlme)
# 为分析创建数据框
reprex_data <- structure(list(id = c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,
4L, 5L, 5L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 8L, 9L, 9L, 9L, 9L, 10L,
10L, 10L, 10L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L,
13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 15L, 15L, 15L,
15L, 16L, 17L, 17L, 18L, 18L, 18L, 18L, 18L, 19L, 19L, 19L, 20L,
20L, 21L, 22L, 22L, 22L, 23L, 23L, 23L, 24L, 24L, 24L, 24L, 24L,
25L, 25L, 26L, 26L, 26L, 27L, 28L, 28L, 29L, 30L, 30L, 30L, 30L,
30L, 31L, 31L, 31L, 31L, 31L, 31L, 32L, 33L, 33L, 34L, 34L, 34L,
34L, 34L, 34L, 34L, 34L, 35L, 35L, 36L, 36L, 37L, 37L, 38L, 38L,
38L, 38L, 39L, 39L, 40L, 40L, 40L, 40L, 40L, 40L, 41L, 41L, 42L,
43L, 43L, 43L, 43L, 43L, 43L, 44L, 44L, 44L, 44L, 44L, 44L, 45L,
45L, 45L, 45L, 45L, 46L, 46L, 46L, 46L, 46L, 47L, 47L, 47L, 47L,
47L, 47L, 48L, 48L, 49L, 49L, 50L, 50L, 51L, 51L, 51L, 51L, 51L,
51L, 52L, 52L, 53L, 53L, 53L, 53L, 53L, 53L, 54L, 54L, 54L, 54L,
55L, 55L, 56L, 57L, 57L, 57L, 58L, 58L, 58L, 58L, 58L, 59L, 60L,
61L, 62L, 62L, 62L, 63L, 64L, 64L, 64L, 64L, 65L, 65L, 65L, 65L,
66L, 66L, 66L, 66L, 66L, 67L, 67L, 67L, 67L, 68L, 68L, 68L, 68L,
68L, 69L, 70L, 70L, 70L, 71L, 71L, 71L, 71L, 71L, 71L, 72L, 72L,
73L, 73L, 74L, 74L, 74L, 74L, 74L, 75L, 76L, 77L, 77L, 77L, 77L,
77L, 77L, 78L, 78L, 78L, 79L, 79L, 79L, 79L, 80L, 80L, 80L, 80L,
80L, 81L, 81L, 82L, 82L, 82L, 83L, 83L, 83L, 83L, 83L, 83L, 83L,
84L, 84L, 85L, 85L, 85L, 85L, 86L, 86L, 87L, 87L, 87L, 87L, 87L,
88L, 88L,
<details>
<summary>英文:</summary>
I recently began using the R package {nlme} to fit non-linear mixed-effects models with random effects. I have discovered that running *exactly the same* nlme() function call (i.e., the same line of code: same data, model, and starting parameters) multiple times may yield different results, in that some attempts will converge while others will not, and even converging models may differ (albeit slightly) in parameter estimates.
This appears to be true even when I call set.seed() immediately before the nlme() call. Here is a reproducible example (within the boundaries of this question!) that calls set.seed(10) and then calls nlme(), repeating this process 100 times (~2 seconds/iteration on my laptop). (It does this within a tryCatch() statement with warnings promoted to errors because, in my limited experience, warnings that ultimately lead to non-convergence cause nlme() to hang for extended periods of time; this classifies them as errors and skips to the next loop iteration.) When I ran this code in a fresh instance of R, 5/100 models converged.
library(nlme)
create df for analysis
reprex_data <- structure(list(id = c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,
4L, 5L, 5L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 8L, 9L, 9L, 9L, 9L, 10L,
10L, 10L, 10L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L,
13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 15L, 15L, 15L,
15L, 16L, 17L, 17L, 18L, 18L, 18L, 18L, 18L, 19L, 19L, 19L, 20L,
20L, 21L, 22L, 22L, 22L, 23L, 23L, 23L, 24L, 24L, 24L, 24L, 24L,
25L, 25L, 26L, 26L, 26L, 27L, 28L, 28L, 29L, 30L, 30L, 30L, 30L,
30L, 31L, 31L, 31L, 31L, 31L, 31L, 32L, 33L, 33L, 34L, 34L, 34L,
34L, 34L, 34L, 34L, 34L, 35L, 35L, 36L, 36L, 37L, 37L, 38L, 38L,
38L, 38L, 39L, 39L, 40L, 40L, 40L, 40L, 40L, 40L, 41L, 41L, 42L,
43L, 43L, 43L, 43L, 43L, 43L, 44L, 44L, 44L, 44L, 44L, 44L, 45L,
45L, 45L, 45L, 45L, 46L, 46L, 46L, 46L, 46L, 47L, 47L, 47L, 47L,
47L, 47L, 48L, 48L, 49L, 49L, 50L, 50L, 51L, 51L, 51L, 51L, 51L,
51L, 52L, 52L, 53L, 53L, 53L, 53L, 53L, 53L, 54L, 54L, 54L, 54L,
55L, 55L, 56L, 57L, 57L, 57L, 58L, 58L, 58L, 58L, 58L, 59L, 60L,
61L, 62L, 62L, 62L, 63L, 64L, 64L, 64L, 64L, 65L, 65L, 65L, 65L,
66L, 66L, 66L, 66L, 66L, 67L, 67L, 67L, 67L, 68L, 68L, 68L, 68L,
68L, 69L, 70L, 70L, 70L, 71L, 71L, 71L, 71L, 71L, 71L, 72L, 72L,
73L, 73L, 74L, 74L, 74L, 74L, 74L, 75L, 76L, 77L, 77L, 77L, 77L,
77L, 77L, 78L, 78L, 78L, 79L, 79L, 79L, 79L, 80L, 80L, 80L, 80L,
80L, 81L, 81L, 82L, 82L, 82L, 83L, 83L, 83L, 83L, 83L, 83L, 83L,
84L, 84L, 85L, 85L, 85L, 85L, 86L, 86L, 87L, 87L, 87L, 87L, 87L,
88L, 88L, 89L, 89L, 89L, 89L, 89L), x = c(7, 7.33, 2.83, 3.33,
4.66, 5.25, 5.58, 6.08, 6.5, 7.08, 4.41, 5.41, 6.25, 0, 0.5,
1.66, 2.33, 2.75, 3.66, 4.16, 1.66, 2.75, 3.41, 3.75, 4.33, 4.91,
5.5, 6.83, 7.33, 2, 2.33, 2.75, 3.75, 2, 2.58, 3, 3.5, 4, 4.41,
2.41, 2.66, 3.08, 3.75, 5, 5.58, 6, 6.41, 6.91, 7.41, 5.33, 5.66,
6, 6.66, 3.75, 3.66, 4.33, 1.91, 2.5, 2.91, 3.83, 4.33, 1.25,
1.91, 2.25, 3.08, 3.75, 2.91, 7.25, 7.91, 8.33, 5, 5.33, 5.66,
0.83, 1.5, 1.83, 2.66, 3.16, 6.91, 7.66, 1.41, 2.08, 2.91, 3.5,
3.25, 3.5, 7.16, 5.66, 6.33, 6.58, 7.08, 7.58, 3.83, 4.41, 4.83,
5.33, 5.75, 6.25, 2.16, 5.08, 5.25, 5.58, 5.83, 6.25, 6.91, 7.25,
7.75, 8.16, 8.66, 6.33, 6.66, 2.16, 2.75, 5.41, 5.66, 4.41, 4.75,
5.16, 5.75, 6.83, 7.25, 2.75, 3.41, 3.75, 4.25, 4.75, 5.25, 2.33,
2.83, 1.5, 4.33, 4.66, 5.08, 5.75, 6.16, 6.66, 2.33, 2.91, 3.25,
3.75, 4.25, 4.75, 3.66, 3.91, 4.33, 5.16, 5.5, 1.75, 2.33, 2.66,
3.66, 4.16, 0.58, 1.16, 1.58, 2, 2.58, 3.08, 5.66, 6.41, 4.25,
4.5, 4.25, 4.41, 3.75, 4.33, 4.83, 5.33, 5.75, 6.16, 9.16, 9.5,
4.66, 5.25, 5.66, 6.08, 6.58, 7.16, 3.58, 4.16, 4.5, 5.16, 6.16,
6.5, 6.5, 6.08, 6.41, 6.75, 2.16, 2.75, 3.16, 3.58, 4.08, 3.41,
5.91, 0.91, 8, 8.33, 8.66, 3.33, 3.25, 3.91, 4.16, 4.66, 5, 5.66,
6, 6.5, 0.83, 1.83, 2.25, 2.75, 3.25, 2.25, 2.83, 3.16, 3.66,
3.41, 3.75, 4.16, 4.75, 5.08, 8.16, 4.41, 4.66, 5.08, 6.5, 6.75,
7.16, 7.83, 8.25, 8.66, 2.66, 3.33, 6.5, 6.75, 3.66, 4.25, 4.66,
5.16, 5.58, 2.41, 1.91, 2.08, 2.66, 3.16, 3.58, 4, 4.5, 5.25,
5.5, 5.91, 5.41, 6, 6.5, 7, 4.08, 4.66, 5.16, 5.58, 5.91, 6,
6.33, 1.58, 2.16, 2.58, 4.33, 4.58, 5, 5.58, 6, 6.41, 6.91, 6.58,
7.33, 5.41, 6.08, 6.33, 6.83, 2.83, 3, 1.08, 1.66, 2.16, 3, 3.5,
4.58, 5.16, 3.75, 4.33, 4.66, 5.16, 5.66), y = c(68, 71, 51,
61, 63, 72, 69, 69, 72, 71, 46, 49, 64, 24, 31, 33, 39, 43, 57,
58, 38, 59, 63, 64, 64, 50, 65, 66, 71, 55, 63, 56, 62, 26, 30,
39, 46, 39, 48, 49, 50, 57, 63, 67, 73, 78, 78, 91, 83, 74, 78,
71, 77, 76, 71, 63, 56, 62, 66, 62, 69, 17, 29, 42, 63, 67, 66,
71, 71, 70, 71, 77, 79, 19, 34, 38, 40, 50, 72, 78, 33, 44, 55,
67, 53, 66, 70, 75, 71, 72, 81, 81, 68, 63, 70, 58, 77, 79, 39,
81, 71, 67, 49, 54, 65, 73, 72, 78, 79, 77, 84, 60, 63, 62, 61,
67, 62, 68, 77, 65, 65, 53, 68, 68, 70, 59, 76, 50, 62, 48, 58,
51, 63, 62, 62, 67, 56, 65, 65, 71, 70, 71, 74, 76, 79, 70, 62,
59, 64, 64, 68, 77, 13, 25, 31, 31, 34, 57, 67, 76, 80, 82, 54,
62, 56, 56, 58, 64, 71, 75, 92, 92, 61, 66, 53, 72, 81, 81, 59,
59, 65, 69, 71, 68, 65, 33, 36, 34, 37, 37, 51, 53, 58, 60, 58,
22, 91, 90, 88, 61, 59, 73, 76, 81, 67, 72, 64, 77, 31, 48, 57,
61, 72, 61, 64, 68, 82, 61, 65, 63, 59, 66, 90, 62, 63, 67, 56,
58, 67, 62, 69, 60, 61, 61, 67, 84, 56, 58, 63, 62, 72, 56, 53,
47, 57, 58, 49, 63, 63, 50, 68, 71, 46, 55, 59, 62, 59, 56, 54,
69, 67, 62, 66, 60, 65, 70, 41, 51, 48, 63, 61, 60, 59, 74, 76,
81, 84, 86, 84, 57, 58, 30, 38, 48, 48, 61, 60, 43, 70, 70, 74,
74, 81)), row.names = c(NA, -293L), class = "data.frame")
define model and loop parameters
model_formula <- formula(y ~ int + (capac - int) * (1 - exp(-x * grow)))
model_starting_params <- c(int=10.876, capac=75.119, grow=0.402)
num_tries <- 100
rng_seed <- 10
initialize vector in which to record convergence success/failure
attempt_successes <- rep(NA, num_tries)
options(warn=2) # promote warnings to errors
for(next_attempt_num in 1:num_tries) {
print(paste0(next_attempt_num, " / ", num_tries, "..."))
set.seed(rng_seed) # set random seed
model_success <- 0
tryCatch( # attempt to fit model
{
nlme_model <- nlme(
model_formula,
data = reprex_data,
fixed = list(int + capac + grow ~ 1),
random = int + capac ~ 1,
groups = ~ id,
start = model_starting_params,
control = nlmeControl(maxIter=200, msMaxIter=200)
)
model_success <- 1
}, error=function(cond) {} # catch error and keep going
)
attempt_successes[next_attempt_num] <- model_success # record success/failure
}
options(warn=0) # revert to normal warning behavior
mean(attempt_successes) # if all values are the same, will equal either 0 or 1
In case it's useful, here are my R/package versions:
> R version 4.1.2 (2021-11-01) Platform: x86_64-apple-darwin17.0
> (64-bit) Running under: macOS Big Sur 11.6
>
> Matrix products: default LAPACK:
> /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
>
> locale: [1]
> en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages: [1] stats graphics grDevices utils
> datasets methods base
>
> other attached packages: [1] nlme_3.1-162
>
> loaded via a namespace (and not attached): [1] compiler_4.1.2
> tools_4.1.2 grid_4.1.2 lattice_0.20-45
My question is: Is it possible to ensure that fitting the same nlme() model multiple times will yield identical results? If so, how?
</details>
# 答案1
**得分**: 0
根据 [@IRTFM][1] 的建议,我将 R 更新到最新版本(4.2.2),问题得以解决。nlme() 模型现在在使用特定随机种子时,要么始终收敛,要么始终不收敛,并且在不同种子下也更加一致。这是一个不错的提醒,要保持更新!
[1]: https://stackoverflow.com/users/1855677/irtfm
<details>
<summary>英文:</summary>
As per [@IRTFM][1]'s suggestion, I updated R to the most current version (4.2.2), which solved the problem. nlme() models now consistently converge, or consistently fail to converge, with a given random seed (and are also more consistent across seeds). A good reminder to stay updated!
[1]: https://stackoverflow.com/users/1855677/irtfm
</details>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论