英文:
Error in anova(): "Error in getResponseFormula(el) : 'form' must be a two-sided formula"
问题
我有一组植物生长的纵向数据,记录在不同的季节。我使用nlme
包的nlme
函数将数据拟合到具有/不具有季节效应的生长模型中。
**为了查看估计参数(k)的季节效应,我使用anova
函数比较了这两个模型,然而,我得到了"Error in getResponseFormula(el) : 'form' must be a two-sided formula"的错误,陷入了这个问题中。
有人可以帮我解决这个问题吗?
以下是可重现的示例。
数据
library(tidyverse)
# 生成虚拟数据
n = 100
id = as.factor(1:n)
season = c("s1","s2","s3","s4")
L0 = rnorm(n,40,5)
td = rnorm(n, 180, 10)
growth <- function(td, Lt_1){
return(function(k,p){
150*((1 + ((150/Lt_1)^(1/p)-1)*exp(-k*td/365))^(-p))
})
}
L1 = growth(Lt_1 = L0, td = td)(p = 1.2, k = 0.55)
L2 = growth(Lt_1 = L1, td = td)(p = 1.2, k = 1.09)
L3 = growth(Lt_1 = L2, td = td)(p = 1.2, k = 0.62)
L4 = growth(Lt_1 = L3, td = td)(p = 1.2, k = 0.97)
df <- data.frame(id = id, L0 = L0, L1 = L1, L2 = L2, L3 = L3, L4 = L4) %>%
pivot_longer(cols = L0:L4, names_to = "Ltype", values_to = "Lobs") %>%
mutate(Lobs = round(Lobs)) %>%
group_by(id) %>%
mutate(Lt_1 = lag(Lobs)) %>%
filter(!is.na(Lt_1)) %>%
mutate(td = round(rnorm(4, 180, 10))) %>%
mutate(season = factor(season)) %>%
ungroup() %>%
mutate(season2 = case_when(
season == "s3" ~ "s1",
season == "s4" ~ "s2",
TRUE ~ season
)) %>%
mutate(season2 = factor(season2)) %>%
mutate(log_Lobs = log(Lobs))
> head(df)
# A tibble: 6 x 8
id Ltype Lobs Lt_1 td season season2 log_Lobs
<fct> <chr> <dbl> <dbl> <dbl> <fct> <fct> <dbl>
1 1 L1 57 48 178 s1 s1 4.04
2 1 L2 76 57 184 s2 s2 4.33
3 1 L3 87 76 188 s3 s1 4.47
4 1 L4 103 87 194 s4 s2 4.63
5 2 L1 43 35 175 s1 s1 3.76
6 2 L2 63 43 176 s2 s2 4.14
拟合模型
library(nlme)
## 没有季节效应的模型
formula = Lobs ~ 150*((1 + ((150/Lt_1)^(1/p)-1)*exp(-k*td/365))^(-p))
p = 1.2
k = 0.4
fit_null <- nlme(formula,
fixed = c(p ~ 1, k ~ 1),
random = p ~ 1 | id,
data = df,
start = list(fixed = c(p, k)),
na.action = na.exclude,
control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE))
## 带季节效应的模型
p = 1.2
k = c(0.4, 1.09)
fit_withSeason <- nlme(formula,
fixed = c(p ~ 1, k ~ 1 + season2),
random = k ~ 1 | id,
data = df,
start = list(fixed = c(p, k)),
na.action = na.exclude,
control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE))
两个模型的摘要
> summary(fit_null)
Nonlinear mixed-effects model fit by maximum likelihood
Model: formula
Data: df
AIC BIC logLik
2312.975 2328.941 -1152.488
Random effects:
Formula: p ~ 1 | id
p Residual
StdDev: 8.300513e-05 4.315789
Fixed effects: c(p ~ 1, k ~ 1)
Value Std.Error DF t-value p-value
p 0.7335954 0.09075133 299 8.083577 0
k 0.9823510 0.05594228 299 17.560080 0
Correlation:
p
k -0.969
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.4898414 -0.8966244 -0.2154542 0.8401292 2.2148509
Number of Observations: 400
Number of Groups: 100
> summary(fit_withSeason)
Nonlinear mixed-effects model fit by maximum likelihood
Model: formula
Data: df
AIC BIC logLik
1466.922 1486.879 -728.4608
Random effects:
Formula: k ~ 1 | id
k.(Intercept) Residual
StdDev: 0.03294767 1.37547
Fixed effects: c(p ~ 1, k ~ 1 + season2)
Value Std.Error DF t-value p-value
p 1.9509211 0.16984927 298 11.48619 0
k.(Intercept) 0.5212180 0.
<details>
<summary>英文:</summary>
I have a longitudinal dataset for plant growth recorded in different seasons. I fitted the data to growth models with/without seasonal effect using `nlme` function from `nlme` package.
**To see the seasonal effect on the estimated parameter (k), I compared the two models using `anova` function, however, I got "Error in getResponseFormula(el) : 'form' must be a two-sided formula"** and stuck in this problem.
Can anyone help me solve this problem?
Here is the reproducible example.
### data
```r
library(tidyverse)
# generate dummy data
n = 100
id = as.factor(1:n)
season = c("s1","s2","s3","s4")
L0 = rnorm(n,40,5)
td = rnorm(n, 180, 10)
growth <- function(td, Lt_1){
return(function(k,p){
150*((1 + ((150/Lt_1)^(1/p)-1)*exp(-k*td/365))^(-p))
})
}
L1 = growth(Lt_1 = L0, td = td)(p = 1.2, k = 0.55)
L2 = growth(Lt_1 = L1, td = td)(p = 1.2, k = 1.09)
L3 = growth(Lt_1 = L2, td = td)(p = 1.2, k = 0.62)
L4 = growth(Lt_1 = L3, td = td)(p = 1.2, k = 0.97)
df <- data.frame(id = id, L0 = L0, L1 = L1, L2 = L2, L3 = L3, L4 = L4) %>%
pivot_longer(cols = L0:L4, names_to = "Ltype", values_to = "Lobs") %>%
mutate(Lobs = round(Lobs)) %>%
group_by(id) %>%
mutate(Lt_1 = lag(Lobs)) %>%
filter(!is.na(Lt_1)) %>%
mutate(td = round(rnorm(4, 180, 10))) %>%
mutate(season = factor(season)) %>%
ungroup() %>%
mutate(season2 = case_when(
season == "s3" ~ "s1",
season == "s4" ~ "s2",
TRUE ~ season
)) %>%
mutate(season2 = factor(season2)) %>%
mutate(log_Lobs = log(Lobs))
> head(df)
# A tibble: 6 x 8
id Ltype Lobs Lt_1 td season season2 log_Lobs
<fct> <chr> <dbl> <dbl> <dbl> <fct> <fct> <dbl>
1 1 L1 57 48 178 s1 s1 4.04
2 1 L2 76 57 184 s2 s2 4.33
3 1 L3 87 76 188 s3 s1 4.47
4 1 L4 103 87 194 s4 s2 4.63
5 2 L1 43 35 175 s1 s1 3.76
6 2 L2 63 43 176 s2 s2 4.14
fitting
library(nlme)
## model without seasonal effect
formula = Lobs ~ 150*((1 + ((150/Lt_1)^(1/p)-1)*exp(-k*td/365))^(-p))
p = 1.2
k = 0.4
fit_null <- nlme(formula,
fixed = c(p ~ 1, k ~ 1),
random = p ~ 1 | id,
data = df,
start = list(fixed = c(p, k)),
na.action = na.exclude,
control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE))
## model with seasonal effect
p = 1.2
k = c(0.4, 1.09)
fit_withSeason <- nlme(formula,
fixed = c(p ~ 1, k ~ 1 + season2),
random = k ~ 1 | id,
data = df,
start = list(fixed = c(p, k)),
na.action = na.exclude,
control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE))
summary of the two models
> summary(fit_null)
Nonlinear mixed-effects model fit by maximum likelihood
Model: formula
Data: df
AIC BIC logLik
2312.975 2328.941 -1152.488
Random effects:
Formula: p ~ 1 | id
p Residual
StdDev: 8.300513e-05 4.315789
Fixed effects: c(p ~ 1, k ~ 1)
Value Std.Error DF t-value p-value
p 0.7335954 0.09075133 299 8.083577 0
k 0.9823510 0.05594228 299 17.560080 0
Correlation:
p
k -0.969
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.4898414 -0.8966244 -0.2154542 0.8401292 2.2148509
Number of Observations: 400
Number of Groups: 100
> summary(fit_withSeason)
Nonlinear mixed-effects model fit by maximum likelihood
Model: formula
Data: df
AIC BIC logLik
1466.922 1486.879 -728.4608
Random effects:
Formula: k ~ 1 | id
k.(Intercept) Residual
StdDev: 0.03294767 1.37547
Fixed effects: c(p ~ 1, k ~ 1 + season2)
Value Std.Error DF t-value p-value
p 1.9509211 0.16984927 298 11.48619 0
k.(Intercept) 0.5212180 0.01156849 298 45.05498 0
k.season2s2 0.4151107 0.00828721 298 50.09055 0
Correlation:
p k.(In)
k.(Intercept) -0.868
k.season2s2 -0.572 0.265
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.341255091 -0.727512901 0.003841993 0.622830078 3.190926108
Number of Observations: 400
Number of Groups: 100
Error
# model comparison to test the seasonal effect
> anova(fit_null, fit_withSeason)
Error in getResponseFormula(el) : 'form' must be a two-sided formula
答案1
得分: 1
这可能是nlme
中的一个错误,但是一个棘手的问题。当你将公式传递为一个符号 称为"formula"(即"formula",而不是"Lobs ~
如果你将公式的变量名从formula
更改为(例如)form1
,我认为一切都应该正常工作(另一个不使用内置对象的名称作为变量名的原因!)
更一般地说(对于涉及存储为符号的对象的不正确评估的其他问题),你可以通过使用do.call()
来解决这个问题,这会欺骗R在存储调用之前评估公式:
fit_null <- do.call(nlme,
list(formula,
fixed = c(p ~ 1, k ~ 1),
random = p ~ 1 | id,
data = df,
start = list(fixed = c(p, k)),
na.action = na.exclude,
control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
))
## 具有季节效应的模型
p = 1.2
k = c(0.4, 1.09)
fit_withSeason <- do.call(nlme,
list(formula,
fixed = c(p ~ 1, k ~ 1 + season2),
random = k ~ 1 | id,
data = df,
start = list(fixed = c(p, k)),
na.action = na.exclude,
control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)))
anova(fit_null, fit_withSeason)
Model df AIC BIC logLik Test L.Ratio p-value
fit_null 1 4 2317.913 2333.879 -1154.9565
fit_withSeason 2 5 1540.088 1560.045 -765.0438 1 vs 2 779.8253 <.0001
我已将此提交为R bug
英文:
This is arguably a bug in nlme
, but a tricky one. When you pass the formula as a symbol called "formula" (i.e. "formula", rather than "Lobs ~ <stuff>"), it fails to extract the formula from the model call properly.
If you change the variable name of your formula from formula
to (e.g.) form1
, I think everything should work fine (another reason not to use names of built-in objects as variable names!)
More generally (for other issues involving improper evaluation of objects stored as symbols) you can work around this by using do.call()
, which tricks R into evaluating the formula before storing the call:
fit_null <- do.call(nlme,
list(formula,
fixed = c(p ~ 1, k ~ 1),
random = p ~ 1 | id,
data = df,
start = list(fixed = c(p, k)),
na.action = na.exclude,
control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
))
## model with seasonal effect
p = 1.2
k = c(0.4, 1.09)
fit_withSeason <- do.call(nlme,
list(formula,
fixed = c(p ~ 1, k ~ 1 + season2),
random = k ~ 1 | id,
data = df,
start = list(fixed = c(p, k)),
na.action = na.exclude,
control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)))
anova(fit_null, fit_withSeason)
Model df AIC BIC logLik Test L.Ratio p-value
fit_null 1 4 2317.913 2333.879 -1154.9565
fit_withSeason 2 5 1540.088 1560.045 -765.0438 1 vs 2 779.8253 <.0001
I've submitted this as an R bug
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论