Why do my p-values change after rescaling variables? Mixed Models
Question
I have this code:

m2 <- lmer(SE ~ b95rel*MF*TASK*ROI*side*Run + (1|IDcheck), f)

where SE, b95rel, and MF are continuous variables, and TASK, ROI, side, and Run are categorical variables.
I get the warning message:
> Warning messages:
1: Some predictor variables are on very different scales: consider rescaling
Here are the first rows of the result:
When I rescale b95rel, the p-values change (e.g. the main effect of TASK becomes significant):

When I also rescale MF, the p-values change again:

I use the scale() function in R.
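Roughly, the rescaling is applied like this (a minimal sketch, not my exact code; object names are just for illustration, and the column names are the ones from the model above):

library(lmerTest)

f$b95rel <- scale(f$b95rel)   # default scale(): centers and divides by the SD
m2_rescaled <- lmer(SE ~ b95rel*MF*TASK*ROI*side*Run + (1|IDcheck), f)
summary(m2_rescaled)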
Online I always read that p-values shouldn't change when scaling, so I'm confused about what it means that mine do...
This is the scaled MF variable:
This is the unscaled MF variable: the dots do not change their distance to each other. I see the same for the other variables before and after scaling; only the y-axis values change. So the change in p-values cannot come from here...
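(To illustrate why the plots look unchanged apart from the y-axis: scale() is a linear transformation, so the differences between points are only divided by a constant. A minimal sketch with a made-up vector:)

x <- c(2, 5, 9, 14, 20)              # made-up example values
z <- scale(x)[, 1]                   # default: centered and divided by sd(x)
all.equal(diff(z), diff(x) / sd(x))  # TRUE: spacing is preserved up to a constant factor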
I currently can't copy the dput() output here, as my data file is too big with its numerous repeated variables... I will see if I find a solution and put it in a comment...
Here is the sessionInfo():
> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_Canada.utf8 LC_CTYPE=English_Canada.utf8 LC_MONETARY=English_Canada.utf8
[4] LC_NUMERIC=C LC_TIME=English_Canada.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rstatix_0.7.2 ggpubr_0.6.0 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.1
[7] purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.2 tidyverse_2.0.0
[13] sjmisc_2.8.9 sjPlot_2.8.14 MuMIn_1.47.5 lmerTest_3.1-3 lme4_1.1-32 Matrix_1.5-4
[19] sjstats_0.18.2 emmeans_1.8.5
loaded via a namespace (and not attached):
[1] splines_4.2.2 carData_3.0-5 modelr_0.1.11 datawizard_0.7.1 stats4_4.2.2
[6] bayestestR_0.13.1 numDeriv_2016.8-1.1 pillar_1.9.0 backports_1.4.1 lattice_0.20-45
[11] glue_1.6.2 RColorBrewer_1.1-3 ggsignif_0.6.4 minqa_1.2.5 colorspace_2.1-0
[16] sandwich_3.0-2 pkgconfig_2.0.3 broom_1.0.4 haven_2.5.2 xtable_1.8-4
[21] mvtnorm_1.1-3 scales_1.2.1 tzdb_0.3.0 timechange_0.2.0 farver_2.1.1
[26] generics_0.1.3 car_3.1-2 sjlabelled_1.2.0 TH.data_1.1-1 withr_2.5.0
[31] cli_3.4.1 effectsize_0.8.3 survival_3.4-0 magrittr_2.0.3 estimability_1.4.1
[36] fansi_1.0.4 nlme_3.1-160 MASS_7.3-60 tools_4.2.2 hms_1.1.3
[41] lifecycle_1.0.3 multcomp_1.4-23 munsell_0.5.0 ggeffects_1.2.2 compiler_4.2.2
[46] rlang_1.1.0 grid_4.2.2 nloptr_2.0.3 parameters_0.20.3 rstudioapi_0.14
[51] labeling_0.4.2 boot_1.3-28 gtable_0.3.3 codetools_0.2-18 abind_1.4-5
[56] R6_2.5.1 zoo_1.8-11 knitr_1.42 performance_0.10.3 utf8_1.2.3
[61] insight_0.19.1 stringi_1.7.12 Rcpp_1.0.10 vctrs_0.6.1 tidyselect_1.2.0
[66] xfun_0.38 coda_0.19-4
Answer 1
Score: 3
tl;dr: if you scale(..., center = FALSE), the p-values should stay constant. However, you should always be cautious about interpreting the significance of main effects in the presence of interactions, precisely because they depend on the centering of the other effects in the interaction (Venables 1998). Venables says, effectively, "never interpret main effects in the presence of interactions"; my revision of that is "unless you know and can explain exactly why it could be problematic, and thus understand what you're actually testing".
Your problem is two-fold: (1) by default, scale() centers as well as scales variables; (2) centering changes the meaning, and the p-values, of main effects when they are involved in higher-order effects that are also contained in the model.
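To see what point (1) means in practice, here is a minimal sketch of what scale() actually computes: by default it subtracts the column mean and divides by the standard deviation, whereas with center = FALSE it only divides, by the column's root mean square rather than its standard deviation.

x <- mtcars$hp

## default: (x - mean(x)) / sd(x)
all.equal(scale(x)[, 1], (x - mean(x)) / sd(x))            # TRUE

## center = FALSE: x is divided by its root mean square, not its sd
rms <- sqrt(sum(x^2) / (length(x) - 1))
all.equal(scale(x, center = FALSE)[, 1], x / rms)          # TRUE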
The examples below show effects based on the coefficient tables from summary() rather than using anova(), but the point is the same.
I'll give an example with lm() (since this isn't specific to mixed models):
m1 <- lm(mpg ~ hp*disp, mtcars)
## utility function
p <- function(m) print(coef(summary(m))[,"Pr(>|t|)"], digits = 3)
p(m1)
## (Intercept) hp disp hp:disp
## 7.18e-14 4.73e-04 2.11e-05 2.41e-03
Scale and center: the interaction p-value stays the same, but the other effects change (because we are now evaluating effects at the mean of the other variable, rather than when the other variable is 0).
msc <- transform(mtcars, hp = scale(hp), disp = scale(disp))
p(update(m1, data = msc))
## (Intercept) hp disp hp:disp
## 1.64e-20 1.30e-02 4.36e-05 2.41e-03
Scale but don't center: p-values the same as the original model.
msc <- transform(mtcars, hp = scale(hp, center = FALSE), disp = scale(disp, center = FALSE))
p(update(m1, data = msc))
## (Intercept) hp disp hp:disp
## 7.18e-14 4.73e-04 2.11e-05 2.41e-03
PS: if you want to interpret a 'type 3' ANOVA, you should probably also be using sum-to-zero contrasts (options(contrasts = c("contr.sum", "contr.poly"))).
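A minimal sketch of what that could look like (using car::Anova(), which is already loaded via a namespace in the sessionInfo above, and a hypothetical factor-by-continuous model on mtcars for illustration):

library(car)   # for Anova()

## sum-to-zero contrasts for factors, so the type-3 main-effect tests are meaningful
options(contrasts = c("contr.sum", "contr.poly"))

m3 <- lm(mpg ~ factor(cyl) * disp, mtcars)
Anova(m3, type = 3)   # type-3 ('marginal') tests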
Venables, W. N. 1998. “Exegeses on Linear Models.” 1998 International S-PLUS User Conference. Washington, DC. http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf.