Error in anova(): "Error in getResponseFormula(el) : 'form' must be a two-sided formula"

huangapple go评论75阅读模式
英文:

Error in anova(): "Error in getResponseFormula(el) : 'form' must be a two-sided formula"

问题

我有一组植物生长的纵向数据,记录在不同的季节。我使用nlme包的nlme函数将数据拟合到具有/不具有季节效应的生长模型中。
**为了查看估计参数(k)的季节效应,我使用anova函数比较了这两个模型,然而,我得到了"Error in getResponseFormula(el) : 'form' must be a two-sided formula"的错误,陷入了这个问题中。
有人可以帮我解决这个问题吗?

以下是可重现的示例。

数据

library(tidyverse)
# 生成虚拟数据
n = 100
id = as.factor(1:n)
season = c("s1","s2","s3","s4")
L0 = rnorm(n,40,5)
td = rnorm(n, 180, 10)

growth <- function(td, Lt_1){
  return(function(k,p){
    150*((1 + ((150/Lt_1)^(1/p)-1)*exp(-k*td/365))^(-p))
  })
}
L1 = growth(Lt_1 = L0, td = td)(p = 1.2, k = 0.55)
L2 = growth(Lt_1 = L1, td = td)(p = 1.2, k = 1.09)
L3 = growth(Lt_1 = L2, td = td)(p = 1.2, k = 0.62)
L4 = growth(Lt_1 = L3, td = td)(p = 1.2, k = 0.97)

df <- data.frame(id = id, L0 = L0, L1 = L1, L2 = L2, L3 = L3, L4 = L4) %>%
  pivot_longer(cols = L0:L4, names_to = "Ltype", values_to = "Lobs") %>%
  mutate(Lobs = round(Lobs)) %>%
  group_by(id) %>%
  mutate(Lt_1 = lag(Lobs)) %>%
  filter(!is.na(Lt_1)) %>%
  mutate(td = round(rnorm(4, 180, 10))) %>%
  mutate(season = factor(season)) %>%
  ungroup() %>%
  mutate(season2 = case_when(
    season == "s3" ~ "s1",
    season == "s4" ~ "s2",
    TRUE ~ season
  )) %>%
  mutate(season2 = factor(season2)) %>%
  mutate(log_Lobs = log(Lobs))

> head(df)
# A tibble: 6 x 8
  id    Ltype  Lobs  Lt_1    td season season2 log_Lobs
  <fct> <chr> <dbl> <dbl> <dbl> <fct>  <fct>      <dbl>
1 1     L1       57    48   178 s1     s1          4.04
2 1     L2       76    57   184 s2     s2          4.33
3 1     L3       87    76   188 s3     s1          4.47
4 1     L4      103    87   194 s4     s2          4.63
5 2     L1       43    35   175 s1     s1          3.76
6 2     L2       63    43   176 s2     s2          4.14

拟合模型

library(nlme)
## 没有季节效应的模型
formula = Lobs ~ 150*((1 + ((150/Lt_1)^(1/p)-1)*exp(-k*td/365))^(-p))
p = 1.2
k = 0.4

fit_null <- nlme(formula,
              fixed = c(p ~ 1, k ~ 1),
              random = p ~ 1 | id,
              data = df,
              start = list(fixed = c(p, k)),
              na.action = na.exclude,
              control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE))

## 带季节效应的模型
p = 1.2
k = c(0.4, 1.09)

fit_withSeason <- nlme(formula,
                fixed = c(p ~ 1, k ~ 1 + season2),
                random = k ~ 1 | id,
                data = df,
                start = list(fixed = c(p, k)),
                na.action = na.exclude,
                control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE))

两个模型的摘要

> summary(fit_null)
Nonlinear mixed-effects model fit by maximum likelihood
  Model: formula 
  Data: df 
       AIC      BIC    logLik
  2312.975 2328.941 -1152.488

Random effects:
 Formula: p ~ 1 | id
                   p Residual
StdDev: 8.300513e-05 4.315789

Fixed effects:  c(p ~ 1, k ~ 1) 
      Value  Std.Error  DF   t-value p-value
p 0.7335954 0.09075133 299  8.083577       0
k 0.9823510 0.05594228 299 17.560080       0
 Correlation: 
  p     
k -0.969

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-1.4898414 -0.8966244 -0.2154542  0.8401292  2.2148509 

Number of Observations: 400
Number of Groups: 100 

> summary(fit_withSeason)
Nonlinear mixed-effects model fit by maximum likelihood
  Model: formula 
  Data: df 
       AIC      BIC    logLik
  1466.922 1486.879 -728.4608

Random effects:
 Formula: k ~ 1 | id
        k.(Intercept) Residual
StdDev:    0.03294767  1.37547

Fixed effects:  c(p ~ 1, k ~ 1 + season2) 
                  Value  Std.Error  DF  t-value p-value
p             1.9509211 0.16984927 298 11.48619       0
k.(Intercept) 0.5212180 0.

<details>
<summary>英文:</summary>

I have a longitudinal dataset for plant growth recorded in different seasons. I fitted the data to growth models with/without seasonal effect using `nlme` function from `nlme` package.
**To see the seasonal effect on the estimated parameter (k), I compared the two models using `anova` function, however, I got &quot;Error in getResponseFormula(el) : &#39;form&#39; must be a two-sided formula&quot;** and stuck in this problem.
Can anyone help me solve this problem?

Here is the reproducible example.

### data
```r
library(tidyverse)
# generate dummy data
n = 100
id = as.factor(1:n)
season = c(&quot;s1&quot;,&quot;s2&quot;,&quot;s3&quot;,&quot;s4&quot;)
L0 = rnorm(n,40,5)
td = rnorm(n, 180, 10)

growth &lt;- function(td, Lt_1){
  return(function(k,p){
    150*((1 + ((150/Lt_1)^(1/p)-1)*exp(-k*td/365))^(-p))
  })
}
L1 = growth(Lt_1 = L0, td = td)(p = 1.2, k = 0.55)
L2 = growth(Lt_1 = L1, td = td)(p = 1.2, k = 1.09)
L3 = growth(Lt_1 = L2, td = td)(p = 1.2, k = 0.62)
L4 = growth(Lt_1 = L3, td = td)(p = 1.2, k = 0.97)

df &lt;- data.frame(id = id, L0 = L0, L1 = L1, L2 = L2, L3 = L3, L4 = L4) %&gt;% 
  pivot_longer(cols = L0:L4, names_to = &quot;Ltype&quot;, values_to = &quot;Lobs&quot;) %&gt;% 
  mutate(Lobs = round(Lobs)) %&gt;% 
  group_by(id) %&gt;% 
  mutate(Lt_1 = lag(Lobs)) %&gt;% 
  filter(!is.na(Lt_1)) %&gt;% 
  mutate(td = round(rnorm(4, 180, 10))) %&gt;% 
  mutate(season = factor(season)) %&gt;% 
  ungroup() %&gt;% 
  mutate(season2 = case_when(
    season == &quot;s3&quot; ~ &quot;s1&quot;,
    season == &quot;s4&quot; ~ &quot;s2&quot;,
    TRUE ~ season
  )) %&gt;% 
  mutate(season2 = factor(season2)) %&gt;% 
  mutate(log_Lobs = log(Lobs))

&gt; head(df)
# A tibble: 6 x 8
  id    Ltype  Lobs  Lt_1    td season season2 log_Lobs
  &lt;fct&gt; &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;fct&gt;  &lt;fct&gt;      &lt;dbl&gt;
1 1     L1       57    48   178 s1     s1          4.04
2 1     L2       76    57   184 s2     s2          4.33
3 1     L3       87    76   188 s3     s1          4.47
4 1     L4      103    87   194 s4     s2          4.63
5 2     L1       43    35   175 s1     s1          3.76
6 2     L2       63    43   176 s2     s2          4.14

fitting

library(nlme)
## model without seasonal effect
formula = Lobs ~ 150*((1 + ((150/Lt_1)^(1/p)-1)*exp(-k*td/365))^(-p))
p = 1.2
k = 0.4

fit_null &lt;- nlme(formula,
              fixed = c(p ~ 1, k ~ 1),
              random = p ~ 1 | id,
              data = df,
              start = list(fixed = c(p, k)),
              na.action = na.exclude,
              control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE))

## model with seasonal effect
p = 1.2
k = c(0.4, 1.09)

fit_withSeason &lt;- nlme(formula,
                fixed = c(p ~ 1, k ~ 1 + season2),
                random = k ~ 1 | id,
                data = df,
                start = list(fixed = c(p, k)),
                na.action = na.exclude,
                control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE))

summary of the two models

&gt; summary(fit_null)
Nonlinear mixed-effects model fit by maximum likelihood
Model: formula 
Data: df 
AIC      BIC    logLik
2312.975 2328.941 -1152.488
Random effects:
Formula: p ~ 1 | id
p Residual
StdDev: 8.300513e-05 4.315789
Fixed effects:  c(p ~ 1, k ~ 1) 
Value  Std.Error  DF   t-value p-value
p 0.7335954 0.09075133 299  8.083577       0
k 0.9823510 0.05594228 299 17.560080       0
Correlation: 
p     
k -0.969
Standardized Within-Group Residuals:
Min         Q1        Med         Q3        Max 
-1.4898414 -0.8966244 -0.2154542  0.8401292  2.2148509 
Number of Observations: 400
Number of Groups: 100 
&gt; summary(fit_withSeason)
Nonlinear mixed-effects model fit by maximum likelihood
Model: formula 
Data: df 
AIC      BIC    logLik
1466.922 1486.879 -728.4608
Random effects:
Formula: k ~ 1 | id
k.(Intercept) Residual
StdDev:    0.03294767  1.37547
Fixed effects:  c(p ~ 1, k ~ 1 + season2) 
Value  Std.Error  DF  t-value p-value
p             1.9509211 0.16984927 298 11.48619       0
k.(Intercept) 0.5212180 0.01156849 298 45.05498       0
k.season2s2   0.4151107 0.00828721 298 50.09055       0
Correlation: 
p      k.(In)
k.(Intercept) -0.868       
k.season2s2   -0.572  0.265
Standardized Within-Group Residuals:
Min           Q1          Med           Q3          Max 
-2.341255091 -0.727512901  0.003841993  0.622830078  3.190926108 
Number of Observations: 400
Number of Groups: 100 

Error

# model comparison to test the seasonal effect
&gt; anova(fit_null, fit_withSeason)
Error in getResponseFormula(el) : &#39;form&#39; must be a two-sided formula

答案1

得分: 1

这可能是nlme中的一个错误,但是一个棘手的问题。当你将公式传递为一个符号 称为"formula"(即"formula",而不是"Lobs ~ ")时,它无法正确提取模型调用中的公式。

如果你将公式的变量名从formula更改为(例如)form1,我认为一切都应该正常工作(另一个使用内置对象的名称作为变量名的原因!)

更一般地说(对于涉及存储为符号的对象的不正确评估的其他问题),你可以通过使用do.call()来解决这个问题,这会欺骗R在存储调用之前评估公式:

fit_null <- do.call(nlme,
          list(formula,
               fixed = c(p ~ 1, k ~ 1),
               random = p ~ 1 | id,
               data = df,
               start = list(fixed = c(p, k)),
               na.action = na.exclude,
               control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
         ))

## 具有季节效应的模型
p = 1.2
k = c(0.4, 1.09)

fit_withSeason <- do.call(nlme,
                list(formula,
                fixed = c(p ~ 1, k ~ 1 + season2),
                random = k ~ 1 | id,
                data = df,
                start = list(fixed = c(p, k)),
                na.action = na.exclude,
                control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)))

anova(fit_null, fit_withSeason)

Model df      AIC      BIC     logLik   Test  L.Ratio p-value
fit_null           1  4 2317.913 2333.879 -1154.9565                        
fit_withSeason     2  5 1540.088 1560.045  -765.0438 1 vs 2 779.8253  &lt;.0001

我已将此提交为R bug

英文:

This is arguably a bug in nlme, but a tricky one. When you pass the formula as a symbol called "formula" (i.e. "formula", rather than "Lobs ~ <stuff>"), it fails to extract the formula from the model call properly.

If you change the variable name of your formula from formula to (e.g.) form1, I think everything should work fine (another reason not to use names of built-in objects as variable names!)

More generally (for other issues involving improper evaluation of objects stored as symbols) you can work around this by using do.call(), which tricks R into evaluating the formula before storing the call:

fit_null &lt;- do.call(nlme,
          list(formula,
               fixed = c(p ~ 1, k ~ 1),
               random = p ~ 1 | id,
               data = df,
               start = list(fixed = c(p, k)),
               na.action = na.exclude,
               control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
         ))

## model with seasonal effect
p = 1.2
k = c(0.4, 1.09)

fit_withSeason &lt;- do.call(nlme,
                list(formula,
                fixed = c(p ~ 1, k ~ 1 + season2),
                random = k ~ 1 | id,
                data = df,
                start = list(fixed = c(p, k)),
                na.action = na.exclude,
                control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)))

anova(fit_null, fit_withSeason)

Model df      AIC      BIC     logLik   Test  L.Ratio p-value
fit_null           1  4 2317.913 2333.879 -1154.9565                        
fit_withSeason     2  5 1540.088 1560.045  -765.0438 1 vs 2 779.8253  &lt;.0001

I've submitted this as an R bug

huangapple
  • 本文由 发表于 2023年7月7日 07:02:49
  • 转载请务必保留本文链接:https://go.coder-hub.com/76632999.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定