英文:
Error in computing GAIC of linear model and other GLMs
问题
我的数据集可以在这里找到:https://raw.githubusercontent.com/yuliaUU/test/main/test.csv
library(gamlss)
library(tidyverse)
data_final <- read_csv("https://raw.githubusercontent.com/yuliaUU/test/main/test.csv")
# 使用对数转换的正态模型
model_1 <- gamlss(log(Abundance) ~ salinity * avrg_dep, data = data_final, family = NO())
# 对数正态模型
model_2 <- gamlss(Abundance ~ salinity * avrg_dep, data = data_final, family = LOGNO())
# 带有逆高斯分布的模型
model_3 <- gamlss(Abundance ~ salinity * avrg_dep, data = data_final, family = IG())
# Gamma模型
model_4 <- gamlss(Abundance ~ salinity * avrg_dep, data = data_final, family = GA())
我想要使用GAIC来比较这些模型,但第一个模型的GAIC值与其余模型相差较大。
我读到:
> 为了确保具有转换响应的线性模型的GAIC可比性,使用了转换后的对数似然乘以Jacobian,并手动重新计算了GAIC。
我尝试按以下方式进行:
Jacobian <- 1/abs(data_final$Abundance)
# 手动计算拟合值(对数尺度)
fitted_values_log <- predict(model_1)
# 手动计算残差(对数尺度)
residuals_transformed <- log(data_final$Abundance) - fitted_values_log
# 计算残差的标准差
sd_residuals_transformed <- sd(residuals_transformed)
# 计算转换后的对数似然
log_likelihood_transformed <- sum(dnorm(log(data_final$Abundance), mean = fitted_values_log, sd = sd_residuals_transformed, log = TRUE) * Jacobian)
# 计算自由度:模型中的参数数量
df <- length(coef(model_1))
# 手动计算GAIC
GAIC_transformed <- -2 * log_likelihood_transformed + 2 * df
GAIC_transformed
但所产生的值差距很大,所以我认为我在某个地方犯了错误。
英文:
my dataset can be found here: https://raw.githubusercontent.com/yuliaUU/test/main/test.csv
library(gamlss)
library(tidyverse)
data_final<- read_csv("https://raw.githubusercontent.com/yuliaUU/test/main/test.csv")
# Normal model with log transformation
model_1 <- gamlss(log(Abundance) ~ salinity*avrg_dep, data = data_final, family = NO())
# log normal model
model_2 <- gamlss(Abundance ~ salinity*avrg_dep, data = data_final, family = LOGNO())
# Model with inverse gaussian distribution
model_3 <- gamlss(Abundance ~ salinity*avrg_dep, data = data_final, family = IG())
# Gamma model
model_4 <- gamlss(Abundance ~ salinity*avrg_dep, data = data_final, family = GA())
I want to use GAIC to compare between the models, but GAIC value for 1st model is far off from the rest
I read that:
>To ensure that the GAIC of the linear model with the transformed response was comparable, the transformed log-likelihood multiplied by the Jacobian was used, and the GAIC was re-calculated manually.
I tried to do it the following way:
Jacobian <- 1/abs(data_final$Abundance)
# Calculate fitted values (on the log scale)
fitted_values_log <- predict(model_1)
# Calculate residuals manually (on the log scale)
residuals_transformed <- log(data_final$Abundance) - fitted_values_log
# Calculate standard deviation of the residuals
sd_residuals_transformed <- sd(residuals_transformed)
# Transformed log-likelihood calculation
log_likelihood_transformed <- sum(dnorm(log(data_final$Abundance), mean=fitted_values_log, sd=sd_residuals_transformed, log=TRUE) * Jacobian)
# Calculate degrees of freedom: number of parameters in the model
df <- length(coef(model_1))
# Manually calculate GAIC
GAIC_transformed <- -2 * log_likelihood_transformed + 2 * df
GAIC_transformed
but the value produced is sooo off, so I think I made a mistake somewhere
答案1
得分: 0
最简单的答案是在gamlss中明确拟合对数正态分布,即family=LOGNO。
一个更一般的答案,适用于实数线上的正态分布以外的分布,例如TF,是创建相应的logTF分布:
gen.Family ("TF", type="log")
然后在gamlss拟合中使用
family=logTF
英文:
The easiest answer is to just fit the lognormal explicitly in gamlss,
i.e. family=LOGNO
A more general answer, that applies to distributions other than the normal distribution on the real line, e.g. TF, is to create the corresponding logTF distribution:
gen.Family ("TF", type="log")
and then in the gamlss fit use
family=logTF
答案2
得分: 0
Model 1: 使用lm()对log-transformed响应进行建模
model_1 <- lm(log(Abundance) ~ salinity * avrg_dep, data = data_final)
计算模型的log-likelihood
logL <- logLik(model_1)
使用log转换的Jacobian来调整log-likelihood
adjusted_logL <- logL + sum(log(1/data_final$Abundance))
统计模型中的参数数量(包括截距)
k <- length(coef(model_1))
获取样本大小
n <- length(model_1$residuals)
计算使用调整后log-likelihood的GAIC
GAIC_adjusted <- -2adjusted_logL + 2k + 2k(k+1)/(n-k-1)
print(GAIC_adjusted)
英文:
# Model 1: Log-transformed response with lm()
model_1 <- lm(log(Abundance) ~ salinity * avrg_dep, data = data_final)
# Calculate log-likelihood of the model
logL <- logLik(model_1)
# Adjust the log-likelihood using the Jacobian for a log transformation
adjusted_logL <- logL + sum(log(1/data_final$Abundance))
# Count the number of parameters in the model (including intercept)
k <- length(coef(model_1))
# Get sample size
n <- length(model_1$residuals)
# Compute GAIC with adjusted log-likelihood
GAIC_adjusted <- -2*adjusted_logL + 2*k + 2*k*(k+1)/(n-k-1)
print(GAIC_adjusted)
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论