英文:
R packages lme4 and glmmTMB produce different AIC for the same model and data
问题
我使用 lmer(){lme4} 和 glmmTMB(){glmmTMB} 分别对相同的数据拟合了相同的模型。模型的AIC因所使用的函数而异,我想知道为什么。
我生成了以下具有随机截距数据:
rm(list=ls())
library(lme4)
library(glmmTMB)
library(ggplot2)
# 具有8级随机效应和变量斜率的“回归”:
set.seed(1)
ii=8 # 参数设置大小; 组数
reps <- sample(x=8:12, size=ii, replace=T)
slope <- 3
intercept <- runif(ii, min = 70, max = 140)
group.sd <- runif(ii, min = 5, max = 15)
group.ID <- LETTERS[1:ii]
xx <- 1:15
out.list <- list() # 空列表用于存储模拟数据
for (i in 1:ii){
    df00 <- data.frame(Group=rep(group.ID[i], reps[i]*length(xx)),
                                x=rep(xx, reps[i]))
    df00$y = intercept[i] + df00$x * slope + rnorm(length(df00[,1]), 0, group.sd[i])
    out.list[[i]] = df00
    }
# 将out.list转换为数据框:
df <- do.call(rbind.data.frame, out.list)
# 数据可视化
gg20 <- ggplot(df, aes(x = x, y = y, colour = Group)) 
gg20 + geom_point() + geom_smooth(method="lm") + theme_classic()
如果我使用 lme4 和 glmmTMB 拟合相同的随机截距模型,相关的AIC不同:
glm4_ri <- lmer(y~x + (1|Group), data=df)
glmmTMB_ri <- glmmTMB(y~x + (1|Group), data=df)
AIC(glm4_ri, glmmTMB_ri)
           df      AIC
glm4_ri     4 8588.399
glmmTMB_ri  4 8590.300
我意识到差异很小,但为什么AIC不完全相同呢?
英文:
I fitted the same model to the same data using lmer(){lme4} and glmmTMB(){glmmTMB}. The model's AIC differed depending on the function used, and I would like to know why.
I generated the following random-intercept data:
rm(list=ls())
library(lme4)
library(glmmTMB)
library(ggplot2)
# "regression" with an 8-level RE with variable slopes:
set.seed(1)
ii=8 # parameter set size; no. groups
reps <- sample(x=8:12, size=ii, replace=T)
slope <- 3
intercept <- runif(ii, min = 70, max = 140)
group.sd <- runif(ii, min = 5, max = 15)
group.ID <- LETTERS[1:ii]
xx <- 1:15
out.list <- list() # empty list to store simulated data
for (i in 1:ii){
		df00 <- data.frame(Group=rep(group.ID[i], reps[i]*length(xx)),
									x=rep(xx, reps[i]))
		df00$y = intercept[i] + df00$x * slope + rnorm(length(df00[,1]), 0, group.sd[i])
		out.list[[i]] = df00
		}
# to turn out.list into a data.frame:
df <- do.call(rbind.data.frame, out.list)
# data visualisation
gg20 <- ggplot(df, aes(x = x, y = y, colour = Group)) 
gg20 + geom_point() + geom_smooth(method="lm") + theme_classic()
If I fit the same random-intercept model with lme4 and glmmTMB the associated AIC differs:
glm4_ri <- lmer(y~x + (1|Group), data=df)
glmmTMB_ri <- glmmTMB(y~x + (1|Group), data=df)
AIC(glm4_ri, glmmTMB_ri)
           df      AIC
glm4_ri     4 8588.399
glmmTMB_ri  4 8590.300
I realise that the discrepancy is small, but why aren't the AIC exactly the same?
答案1
得分: 1
tl;dr lmer默认使用REML,glmmTMB默认使用ML。这是一个令人惊讶和不幸的微妙差别 - lmer在某些情况下会尝试禁用REML(例如在anova()方法中),但有时候它会“不注意”。
如果你要比较具有不同固定效应成分的模型,你必须使用ML而不是REML。
英文:
tl;dr lmer is using REML by default, glmmTMB is using ML by default. This is surprisingly, and unfortunately, subtle - lmer does make some efforts to disable REML when it makes sense (e.g. in the anova() method), but sometimes it doesn't 'notice'.
> logLik(glm4_ri)
'log Lik.' -4290.2 (df=4)
> logLik(update(glm4_ri, REML = FALSE))
'log Lik.' -4291.15 (df=4)
> logLik(glmmTMB_ri)
'log Lik.' -4291.15 (df=4)
> logLik(update(glmmTMB_ri, REML = TRUE))
'log Lik.' -4290.2 (df=4)
If you're going to be comparing models with different fixed-effect components, you must use ML rather than REML.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。



评论