Why is the nls function returning such different values, for the same model, with similar datasets?
Question
I have two sets of age and length data for the same fish species, both provided in the following link.
I would like to fit a growth model in R that allows for a change in growth at a specific point in the lifespan.
I tried using the nls function and provided starting values adapted to my data. The model is an adaptation of the Von Bertalanffy growth model that is supposed to return values for five parameters (Linf, K0, t0, K1, and t1).
The code I used for both datasets was the following:
fit <- as.formula(TL ~ Linf * (1 - exp(-K0 * (Age - t0))) * (Age < t1) +
                    Linf * (1 - exp(-K0 * (t1 - t0) - K1 * (Age - t1))) * (Age > t1))
model <- nls(fit, data = dataset,
             start = list(Linf = 17, K0 = 0.3, t0 = -2, K1 = 0.1, t1 = 3),
             control = nls.control(maxiter = 500, tol = 1e-03, minFactor = 1/1024,
                                   printEval = FALSE, warnOnly = FALSE))
summary(model)
For the first dataset the values returned were the following:
Formula: TL ~ Linf * (1 - exp(-K0 * (Age - t0))) * (Age < t1) + Linf *
    (1 - exp(-K0 * (t1 - t0) - K1 * (Age - t1))) * (Age > t1)

Parameters:
       Estimate Std. Error t value Pr(>|t|)
Linf  4.089e+02  1.565e+04   0.026   0.9792
K0    5.477e-03  2.141e-01   0.026   0.9796
t0   -2.934e+00  1.500e+00  -1.956   0.0511 .
K1    7.596e-04  3.004e-02   0.025   0.9798
t1    2.246e+00  2.143e-01  10.477   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.881 on 457 degrees of freedom

Number of iterations to convergence: 294
Achieved convergence tolerance: 0.000979
While for the second dataset, the values returned were:
Formula: TL ~ Linf * (1 - exp(-K0 * (Age - t0))) * (Age < t1) + Linf *
    (1 - exp(-K0 * (t1 - t0) - K1 * (Age - t1))) * (Age > t1)

Parameters:
     Estimate Std. Error t value Pr(>|t|)
Linf 15.04002    0.60919  24.689  < 2e-16 ***
K0    0.16740    0.01895   8.833  < 2e-16 ***
t0   -3.67353    0.34427 -10.671  < 2e-16 ***
K1    0.11986    0.02007   5.971 2.63e-09 ***
t1    2.29970    0.31711   7.252 5.18e-13 ***
---
Only the values returned for the second dataset make sense for the species in question.
Why is the nls function returning such different parameter values, while using the same model, same starting values and very similar datasets?
Answer 1
Score: 2
I don't think there's anything wrong with the fits per se - they both look like reasonable fits to the given data. The problem appears to be that in the first set there is an apparent change in gradient that occurs around an age where there are relatively few data points.
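To see why the first fit can look reasonable despite its odd estimates, it may help to evaluate the curve at the values nls reported. The sketch below is only an illustration: `vb_two_phase` is a hypothetical helper, not part of the question's code, and it assumes `>=` for the second branch so that the curve is defined at exactly Age == t1 (the original formula uses `>`). With a very large Linf and very small K0 and K1, the curve is close to two straight lines, and its slopes depend mainly on the products Linf * K0 and Linf * K1. That is probably why each parameter on its own has such a large standard error.

```r
# Hypothetical helper: the two-phase Von Bertalanffy curve from the question.
# Assumption: the second branch uses >= so the curve is defined at Age == t1.
vb_two_phase <- function(age, Linf, K0, t0, K1, t1) {
  early <- Linf * (1 - exp(-K0 * (age - t0)))
  late  <- Linf * (1 - exp(-K0 * (t1 - t0) - K1 * (age - t1)))
  ifelse(age < t1, early, late)
}

# Estimates copied from the first summary in the question.
p1 <- list(Linf = 408.9, K0 = 5.477e-03, t0 = -2.934, K1 = 7.596e-04, t1 = 2.246)

# Derivative of the first phase at age = t0 and of the second phase at age = t1.
slope_early <- p1$Linf * p1$K0
slope_late  <- p1$Linf * p1$K1 * exp(-p1$K0 * (p1$t1 - p1$t0))

# Predicted lengths at integer ages, using the reported estimates.
ages <- 0:10
pred <- do.call(vb_two_phase, c(list(age = ages), p1))

print(round(c(slope_early = slope_early, slope_late = slope_late), 2))
print(round(pred, 1))
```

With these estimates the early slope works out to roughly 2.24 length units per year and the later slope to roughly 0.30, so the fitted curve is effectively a broken line with a kink near age 2.2.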
Here's the plot for the first data set:
library(ggplot2)
fit <- as.formula(y ~ Linf * (1 - exp(-K0 * (x - t0))) * (x < t1) +
                    Linf * (1 - exp(-K0 * (t1 - t0) - K1 * (x - t1))) * (x > t1))
ggplot(dataset, aes(Age, TL)) +
  geom_point() +
  geom_smooth(method = nls, formula = fit, method.args = list(
    start = list(Linf = 17, K0 = 0.3, t0 = -2, K1 = 0.1, t1 = 3),
    control = list(maxiter = 10000, minFactor = 1e-9, tol = 1e-3)),
    se = FALSE, linetype = 2
  )
But the data and the shape of the plot are quite different for the second data set:
ggplot(dataset2, aes(Age, TL)) +
  geom_point() +
  geom_smooth(method = nls, formula = fit, method.args = list(
    start = list(Linf = 17, K0 = 0.3, t0 = -2, K1 = 0.1, t1 = 3),
    control = list(maxiter = 10000, minFactor = 1e-9, tol = 1e-3)),
    se = FALSE, linetype = 2
  )
So the problem simply lies in your assumption that the two data sets are similar. They are not very similar at all, at least in terms of fitting this model. For example, the first data set has only 52 individuals (11%) under the age of 4, but the second data set has 1279 (42%). There is clearly a big difference in the age distribution of the two samples. Note that combining the two data frames using rbind gives one big model whose parameter estimates are similar to those obtained for dataset2 alone.
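The age-distribution comparison can be checked with a few lines of R. The sketch below uses made-up stand-in data frames, since the real data are only available through the link in the question. `share_under` is a hypothetical helper, and the sample sizes and sampling weights are assumptions chosen only to mimic the imbalance described above.

```r
# Hypothetical helper: count and percentage of individuals younger than a cutoff age.
share_under <- function(df, cutoff = 4) {
  n_young <- sum(df$Age < cutoff)
  c(n = n_young, pct = round(100 * n_young / nrow(df)))
}

# Made-up stand-ins for the two data frames (NOT the real data).
set.seed(1)
dataset  <- data.frame(Age = sample(1:12, 462, replace = TRUE,
                                    prob = c(rep(1, 3), rep(8, 9))))
dataset2 <- data.frame(Age = sample(1:12, 3000, replace = TRUE,
                                    prob = c(rep(6, 3), rep(3, 9))))

# Compare how many young fish each sample contains.
print(share_under(dataset))
print(share_under(dataset2))

# Stack the two samples, as one would before refitting on the combined data.
combined <- rbind(dataset, dataset2)
print(table(combined$Age < 4))
```

On the real data, the same `rbind` result could be passed to nls with the original formula and starting values to reproduce the combined fit.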