How to manually calculate Variance Inflation Factor (VIF)?
Question
I would like to manually calculate the variance inflation factor (VIF) for each predictor in the mtcars dataset.
When I use the vif function from the car package, I get the following result:
install.packages('car')
library(car)
fit <- lm(mtcars[,1] ~ ., mtcars[,-1])
vif(fit)
      cyl      disp        hp      drat        wt      qsec
15.373833 21.620241  9.832037  3.374620 15.164887  7.527958
       vs        am      gear      carb
 4.965873  4.648487  5.357452  7.908747
Then I compute it one column at a time. For the first predictor, cyl, the result matches:
fit <- lm(mtcars[,2] ~ ., mtcars[,- c(1:2)])
summary(fit)$r.squared
1/(1-summary(fit)$r.squared)
[1] 15.37383
For disp, however, the result already differs:
fit2 <- lm(mtcars[,3] ~ ., mtcars[,- c(1:3)])
summary(fit2)$r.squared
1/(1-summary(fit2)$r.squared)
[1] 20.08864 # but it must be 21.620241
What's wrong?
Answer 1
Score: 1
The formula 1/(1 - R^2) is fine; the column selection is the problem. The correct call for disp is:
fit2 <- lm(mtcars[,3] ~ ., mtcars[,- c(1,3)])
1/(1-summary(fit2)$r.squared)
[1] 21.62024
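Note that 1:3 means c(1,2,3), so mtcars[,- c(1:3)] also drops cyl (column 2) from the design matrix. The auxiliary regression for disp must keep every other predictor, so only the response mpg (column 1) and disp itself (column 3) should be excluded. Therefore, specify c(1,3) instead.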
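To avoid index slips like this one, the same calculation can be written with column names instead of positions. The following is a minimal sketch in base R, not the code used by car::vif; the names predictors, manual_vif, others and aux_fit are chosen here for illustration only. It regresses each predictor on all the remaining predictors (leaving mpg out) and applies 1/(1 - R^2) to each auxiliary fit.

```r
# Manual VIF for every predictor of mpg in mtcars, using base R only.
predictors <- setdiff(names(mtcars), "mpg")

manual_vif <- sapply(predictors, function(p) {
  # Regress predictor p on all other predictors; the response mpg is excluded.
  others <- setdiff(predictors, p)
  aux_fit <- lm(reformulate(others, response = p), data = mtcars)
  # VIF is 1 / (1 - R^2) of this auxiliary regression.
  1 / (1 - summary(aux_fit)$r.squared)
})

print(manual_vif)
```

If the sketch is right, manual_vif should agree with the vif(fit) output shown in the question, for example about 21.62 for disp. Selecting columns by name means no predictor can be dropped by accident the way 1:3 dropped cyl.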