2023-03-09 22:16:10

Why is the nls function returning such different values, for the same model, with similar datasets?

Question

I have two sets of age and length data for the same fish species, both provided in the following link.

I would like to fit a growth model in R that allows growth to change at a specific point in the lifespan.

I tried using the nls function and provided starting values adapted to my data. The model is an adaptation of the Von Bertalanffy growth model that is supposed to return values for five different parameters (Linf, K0, t0, K1, and t1).

The code I used, for both datasets, was the following:

fit <- as.formula(TL ~ Linf * (1 - exp(-K0 * (Age - t0))) * (Age < t1) +
                   Linf * (1 - exp(-K0 * (t1 - t0) - K1 * (Age - t1))) * (Age > t1))

model <- nls(fit, data = dataset,
             start = list(Linf = 17, K0 = 0.3, t0 = -2, K1 = 0.1, t1 = 3),
             control = nls.control(maxiter = 500, tol = 1e-03, minFactor = 1/1024,
                                   printEval = FALSE, warnOnly = FALSE))
summary(model)

For the first dataset the values returned were the following:

Formula: TL ~ Linf * (1 - exp(-K0 * (Age - t0))) * (Age < t1) + Linf * 
    (1 - exp(-K0 * (t1 - t0) - K1 * (Age - t1))) * (Age > t1)

Parameters:
       Estimate Std. Error t value Pr(>|t|)    
Linf  4.089e+02  1.565e+04   0.026   0.9792    
K0    5.477e-03  2.141e-01   0.026   0.9796    
t0   -2.934e+00  1.500e+00  -1.956   0.0511 .  
K1    7.596e-04  3.004e-02   0.025   0.9798    
t1    2.246e+00  2.143e-01  10.477   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.881 on 457 degrees of freedom

Number of iterations to convergence: 294 
Achieved convergence tolerance: 0.000979

While for the second dataset, the values returned were:

Formula: TL ~ Linf * (1 - exp(-K0 * (Age - t0))) * (Age < t1) + Linf * 
    (1 - exp(-K0 * (t1 - t0) - K1 * (Age - t1))) * (Age > t1)

Parameters:
     Estimate Std. Error t value Pr(>|t|)    
Linf 15.04002    0.60919  24.689  < 2e-16 ***
K0    0.16740    0.01895   8.833  < 2e-16 ***
t0   -3.67353    0.34427 -10.671  < 2e-16 ***
K1    0.11986    0.02007   5.971 2.63e-09 ***
t1    2.29970    0.31711   7.252 5.18e-13 ***
---

Only the values returned for the second dataset make sense for the species in question.

Why is the nls function returning such different parameter values, while using the same model, same starting values and very similar datasets?

Answer 1

Score: 2

I don't think there's anything wrong with the fits per se - they both look like reasonable fits to the given data. The problem appears to be that in the first set there is an apparent change in gradient that occurs around an age where there are relatively few data points.
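
A rough check on why the first set of estimates looks so odd while the curve still fits: when K0 and K1 are tiny, 1 - exp(-u) is close to u, so the model collapses to two straight lines and only the products Linf * K0 and Linf * K1 are pinned down by the data. The sketch below plugs in the estimates reported in the question; the helper names vb2 and vb2_linear and the 0 to 15 age grid are assumptions made for illustration, not something taken from the real data.

```r
# Estimates reported for the first data set in the question
est <- c(Linf = 408.9, K0 = 5.477e-03, t0 = -2.934, K1 = 7.596e-04, t1 = 2.246)

# Two-phase von Bertalanffy curve, written the same way as in the question
vb2 <- function(age, p) {
  p <- as.list(p)
  before <- p$Linf * (1 - exp(-p$K0 * (age - p$t0)))
  after  <- p$Linf * (1 - exp(-p$K0 * (p$t1 - p$t0) - p$K1 * (age - p$t1)))
  before * (age < p$t1) + after * (age > p$t1)
}

# Straight-line approximation, using 1 - exp(-u) ~ u for small u
vb2_linear <- function(age, p) {
  p <- as.list(p)
  before <- p$Linf * p$K0 * (age - p$t0)
  after  <- p$Linf * (p$K0 * (p$t1 - p$t0) + p$K1 * (age - p$t1))
  before * (age < p$t1) + after * (age > p$t1)
}

# Hypothetical age grid (chosen so that no value equals t1 exactly)
ages <- seq(0, 15, by = 0.5)

# Implied slopes of the two straight segments
slopes <- c(before = est[["Linf"]] * est[["K0"]],
            after  = est[["Linf"]] * est[["K1"]])
print(round(slopes, 2))

# Largest relative gap between the straight lines and the fitted curve
max_rel_diff <- max(abs(vb2_linear(ages, est) / vb2(ages, est) - 1))
print(max_rel_diff)
```

The two slopes come out at roughly 2.24 and 0.31 length units per unit of age, and the straight-line version stays within a few percent of the fitted curve over this grid. That is consistent with the huge standard errors on Linf, K0 and K1 individually: many combinations of a large Linf and small K values give almost the same two lines.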

Here's the plot for the first data set:

library(ggplot2)

fit <- as.formula(y ~ Linf * (1 - exp(-K0 * (x - t0))) * (x < t1) +
                   Linf * (1 - exp(-K0 * (t1 - t0) - K1 * (x - t1))) * (x > t1))

ggplot(dataset, aes(Age, TL)) +
  geom_point() +
  geom_smooth(method = nls, formula = fit, method.args = list(
    start = list(Linf=17, K0=0.3, t0=-2, K1=0.1, t1=3), 
    control = list(maxiter = 10000, minFactor = 1e-9, tol = 1e-3)),
    se = FALSE, linetype = 2
  )

But the data, and the shape of the plot, are quite different for the second data set:

ggplot(dataset2, aes(Age, TL)) +
  geom_point() +
  geom_smooth(method = nls, formula = fit, method.args = list(
    start = list(Linf=17, K0=0.3, t0=-2, K1=0.1, t1=3), 
    control = list(maxiter = 10000, minFactor = 1e-9, tol = 1e-3)),
    se = FALSE, linetype = 2
  )

So the problem simply lies in your assumption that both data sets are similar. They are not very similar at all, at least in terms of fitting this model. For example, the first data set only has 52 individuals (11%) under the age of 4, but the second data set has 1279 (42%). There is clearly a big difference in the age distribution of the two samples. Note that combining the two data frames using rbind gives one big model that is similar to the values obtained for dataset2 alone.
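
The age-distribution comparison and the rbind step can be sketched as follows. The two small data frames here are toy stand-ins (the real data are only linked in the question), and share_under is a hypothetical helper name; only the Age and TL column names come from the original code.

```r
# Toy stand-ins for the two data frames; the real data are linked in the question
dataset  <- data.frame(Age = c(2, 3, 5, 6, 7, 8, 9, 10, 11, 12),
                       TL  = c(9, 11, 12, 12.5, 13, 13, 13.5, 14, 14, 14.5))
dataset2 <- data.frame(Age = c(1, 2, 2, 3, 4, 5, 6, 8, 10, 12),
                       TL  = c(7, 9, 9.5, 10.5, 11, 12, 12.5, 13, 13.5, 14))

# Count and percentage of individuals younger than a cutoff age
share_under <- function(df, cutoff = 4) {
  n <- sum(df$Age < cutoff)
  c(n = n, percent = round(100 * n / nrow(df), 1))
}

# Stack the two samples into one data frame (columns must match)
combined <- rbind(dataset, dataset2)

# One row per sample: how many young fish each contains
print(rbind(dataset  = share_under(dataset),
            dataset2 = share_under(dataset2),
            combined = share_under(combined)))
```

With the real data frames in place of the toy ones, the nls call from the question can be rerun with data = combined to compare its estimates with those from each sample alone.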
