我的 p 值在重新缩放变量后为什么会改变?混合模型

huangapple go评论60阅读模式
英文:

Why do my p-values change after rescaling variables? Mixed Models

问题

我有这段代码 m2 <- lmer(SE ~ b95rel*MF*TASK*ROI*side*Run + (1|IDcheck),f),其中 SEb95relMF 是连续变量,TASKROIsideRun 是分类变量。

我收到了警告信息:

> 警告信息:
1: 一些预测变量在非常不同的尺度上:考虑重新缩放

这是结果的前几行:
我的 p 值在重新缩放变量后为什么会改变?混合模型

当我重新缩放 b95rel 时, p 值会变化(例如,主效应 TASK 变得显著):

我的 p 值在重新缩放变量后为什么会改变?混合模型

当我也重新缩放 MF 时, p 值再次变化:

我的 p 值在重新缩放变量后为什么会改变?混合模型

我使用 R 中的 scale() 函数。

在线上,我总是读到 p 值在缩放时不应该改变,所以我很困惑如果我的改变了是什么意思...

这是缩放后的 MF 变量:

我的 p 值在重新缩放变量后为什么会改变?混合模型

这是未缩放的 MF 变量:点与彼此的距离不变。我在缩放前后的其他变量也看到了同样的情况。只有 y 轴的值发生了变化。所以 p 值的变化不能来自这里...

我的 p 值在重新缩放变量后为什么会改变?混合模型

我目前无法复制 dput() 的结果到这里,因为我的数据文件太大,其中有许多重复的变量... 我会看看是否找到解决办法并将其放入评论中...

这是 sessionInfo:

> sessionInfo()
R 版本 4.2.22022-10-31 ucrt)
平台:x86_64-w64-mingw32/x64(64 位)
运行在:Windows 10 x64(版本 19044
矩阵产品:默认

语言环境:
[1] LC_COLLATE=English_Canada.utf8  LC_CTYPE=English_Canada.utf8    LC_MONETARY=English_Canada.utf8
[4] LC_NUMERIC=C                    LC_TIME=English_Canada.utf8    

已附加的基本包:
[1] stats     graphics  grDevices utils     datasets  methods   base     

其他已附加的包:
 [1] rstatix_0.7.2   ggpubr_0.6.0    lubridate_1.9.2 forcats_1.0.0   stringr_1.5.0   dplyr_1.1.1    
 [7] purrr_1.0.1     readr_2.1.4     tidyr_1.3.0     tibble_3.2.1    ggplot2_3.4.2   tidyverse_2.0.0
[13] sjmisc_2.8.9    sjPlot_2.8.14   MuMIn_1.47.5    lmerTest_3.1-3  lme4_1.1-32     Matrix_1.5-4   
[19] sjstats_0.18.2  emmeans_1.8.5  

通过命名空间加载的包(而非附加):
 [1] splines_4.2.2       carData_3.0-5       modelr_0.1.11       datawizard_0.7.1    stats4_4.2.2       
 [6] bayestestR_0.13.1   numDeriv_2016.8-1.1 pillar_1.9.0        backports_1.4.1     lattice_0.20-45    
[11] glue_1.6.2          RColorBrewer_1.1-3  ggsignif_0.6.4      minqa_1.2.5         colorspace_2.1-0   
[16] sandwich_3.0-2      pkgconfig_2.0.3     broom_1.0.4         haven_2.5.2         xtable_1.8-4       
[21] mvtnorm_1.1-3       scales_1.2.1        tzdb_0.3.0          timechange_0.2.0    farver_2.1.1       
[26] generics_0.1.3      car_3.1-2           sjlabelled_1.2.0    TH.data_1.1-1       withr_2.5.0        
[31] cli_3.4.1           effectsize_0.8.3    survival_3.4-0      magrittr_2.0.3      estimability_1.4.1 
[36] fansi_1.0.4         nlme_3.1-160        MASS_7.3-60         tools_4.2.2         hms_1.1.3          
[41] lifecycle_1.0.3     multcomp_1.4-23     munsell_0.5.0       ggeffects_1.2.2     compiler_4.2.2     
[46] rlang_1.1.0         grid_4.2.2          nloptr_2.0.3        parameters_0.20.3   rstudioapi_0.14    
[51] labeling_0.4.2      boot_1.3-28         gtable_0.3.3        codetools_0.2-18    abind_1.4-5        
[56] R6_2.5.1            zoo_1.8-11          knitr_1.42          performance_0.10.3  utf8_1.2.3         
[61] insight_0.19.1      stringi_1.7.12      Rcpp_1.0.10         vctrs_0.6.1         tidyselect_1.2.0   
[66] xfun_0.38           coda_0.19-4    
英文:

I have this code m2 <- lmer(SE ~ b95rel*MF*TASK*ROI*side*Run + (1|IDcheck),f) where SE, b95rel and MF are continuous variables, TASK, ROI, side and Run are categorical variables.

I get the warning message:

> Warning messages:
1: Some predictor variables are on very different scales: consider rescaling

Here the first rows of the result:
我的 p 值在重新缩放变量后为什么会改变?混合模型

When I rescale b95rel, p-values change (e.g. the main effect TASK becomes significant):

我的 p 值在重新缩放变量后为什么会改变?混合模型

When I also rescale MF, p-values change again:

我的 p 值在重新缩放变量后为什么会改变?混合模型

I use the scale() function in R.

Online I always read that p-values shouldn't change when scaling, so I'm confused what it means if mine change...

This is the scaled MF variable:

我的 p 值在重新缩放变量后为什么会改变?混合模型

This is the unscaled MF variable: dots do not change distance to each other. The same I see for the other variables after and before scaling. Only the y-axis values change. So the change in p values cannot come from here...

我的 p 值在重新缩放变量后为什么会改变?混合模型

I currently don't manage to copy the dput() outome here as my data file is too big with its numerous repeated variables...I will see if I find a solution and put it into a comment...

Here the sessionInfo:

> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.utf8  LC_CTYPE=English_Canada.utf8    LC_MONETARY=English_Canada.utf8
[4] LC_NUMERIC=C                    LC_TIME=English_Canada.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rstatix_0.7.2   ggpubr_0.6.0    lubridate_1.9.2 forcats_1.0.0   stringr_1.5.0   dplyr_1.1.1    
 [7] purrr_1.0.1     readr_2.1.4     tidyr_1.3.0     tibble_3.2.1    ggplot2_3.4.2   tidyverse_2.0.0
[13] sjmisc_2.8.9    sjPlot_2.8.14   MuMIn_1.47.5    lmerTest_3.1-3  lme4_1.1-32     Matrix_1.5-4   
[19] sjstats_0.18.2  emmeans_1.8.5  

loaded via a namespace (and not attached):
 [1] splines_4.2.2       carData_3.0-5       modelr_0.1.11       datawizard_0.7.1    stats4_4.2.2       
 [6] bayestestR_0.13.1   numDeriv_2016.8-1.1 pillar_1.9.0        backports_1.4.1     lattice_0.20-45    
[11] glue_1.6.2          RColorBrewer_1.1-3  ggsignif_0.6.4      minqa_1.2.5         colorspace_2.1-0   
[16] sandwich_3.0-2      pkgconfig_2.0.3     broom_1.0.4         haven_2.5.2         xtable_1.8-4       
[21] mvtnorm_1.1-3       scales_1.2.1        tzdb_0.3.0          timechange_0.2.0    farver_2.1.1       
[26] generics_0.1.3      car_3.1-2           sjlabelled_1.2.0    TH.data_1.1-1       withr_2.5.0        
[31] cli_3.4.1           effectsize_0.8.3    survival_3.4-0      magrittr_2.0.3      estimability_1.4.1 
[36] fansi_1.0.4         nlme_3.1-160        MASS_7.3-60         tools_4.2.2         hms_1.1.3          
[41] lifecycle_1.0.3     multcomp_1.4-23     munsell_0.5.0       ggeffects_1.2.2     compiler_4.2.2     
[46] rlang_1.1.0         grid_4.2.2          nloptr_2.0.3        parameters_0.20.3   rstudioapi_0.14    
[51] labeling_0.4.2      boot_1.3-28         gtable_0.3.3        codetools_0.2-18    abind_1.4-5        
[56] R6_2.5.1            zoo_1.8-11          knitr_1.42          performance_0.10.3  utf8_1.2.3         
[61] insight_0.19.1      stringi_1.7.12      Rcpp_1.0.10         vctrs_0.6.1         tidyselect_1.2.0   
[66] xfun_0.38           coda_0.19-4    

答案1

得分: 3

简而言之,如果你使用scale(..., center = FALSE),p值应该保持不变。但是,在存在交互作用的情况下,你应该始终谨慎解释主效应的显著性,因为它们确实取决于交互作用中其他效应的居中处理(Venables 1988)。Venables指出,实际上是“在存在交互作用的情况下,不要解释主效应”;我对此的修正是“除非你知道并能够解释为什么这可能是有问题的,并且因此了解你实际上在测试什么”。

你的问题有两个方面:(1)默认情况下,scale()不仅缩放变量,还会进行居中处理;(2)当主效应涉及模型中包含的高阶效应时,居中处理会改变主效应的含义和p值。

以下示例基于summary()中的系数表,而不是使用anova(),但要点相同。

我将使用lm()给出一个示例(因为这与混合模型无关):

m1 <- lm(mpg ~ hp*disp, mtcars)
## 实用函数
p <- function(m) print(coef(summary(m))[,"Pr(>|t|)"], digits = 3)
p(m1)
## (Intercept)          hp        disp     hp:disp 
##   7.18e-14    4.73e-04    2.11e-05    2.41e-03 

缩放并居中:交互作用的p值保持不变,但其他效应发生变化(因为我们现在在其他变量的均值处评估效应,而不是在其他变量为0时)。

msc <- transform(mtcars, hp = scale(hp), disp = scale(disp))
p(update(m1, data = msc))
## (Intercept)          hp        disp     hp:disp 
##   1.64e-20    1.30e-02    4.36e-05    2.41e-03 

缩放但不居中:p值与原模型相同。

msc <- transform(mtcars, hp = scale(hp, center = FALSE), disp = scale(disp, center= FALSE))
p(update(m1, data = msc))
## (Intercept)          hp        disp     hp:disp 
##   7.18e-14    4.73e-04    2.11e-05    2.41e-03 

附注:如果你想解释“type 3”方差分析,你可能也应该使用总和为零对比(options(contrasts = c("contr.sum", "contr.poly")))。


Venables, W. N. 1998. “Exegeses on Linear Models.” 1998 International S-PLUS User Conference. Washington, DC. http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf

英文:

tl;dr if you scale(..., center = FALSE) the p-values should stay constant. However, you should always be cautious about interpreting the significance of main effects in the presence of interactions, precisely because they depend on the centering of the other effects in the interaction (Venables 1988). Venables says, effectively, "never interpret main effects in the presence of interactions"; my revision of that is "unless you know and can explain exactly why it could be problematic, and thus understand what you're actually testing".

Your problem is two-fold: (1) by default, scale() centers as well as scaling variables; (2) centering changes the meaning, and p-values, of main effects when they are involved in higher-order effects that are also contained in the model.

The examples below show effects based on the coefficient tables from summary() rather than using anova(), but the point is the same.

I'll give an example with lm() (since this isn't specific to mixed models):

m1 &lt;- lm(mpg ~ hp*disp, mtcars)
## utility function
p &lt;- function(m) print(coef(summary(m))[,&quot;Pr(&gt;|t|)&quot;], digits = 3)
p(m1)
## (Intercept)          hp        disp     hp:disp 
##   7.18e-14    4.73e-04    2.11e-05    2.41e-03 

Scale and center: the interaction p-value stays the same, but the other effects change (because we are now evaluating effects at the mean of the other variable, rather than when the other variable is 0)

msc &lt;- transform(mtcars, hp = scale(hp), disp = scale(disp))
p(update(m1, data = msc))
## (Intercept)          hp        disp     hp:disp 
##   1.64e-20    1.30e-02    4.36e-05    2.41e-03 

Scale but don't center: p-values the same as the original model.

msc &lt;- transform(mtcars, hp = scale(hp, center = FALSE), disp = scale(disp, center= FALSE))
p(update(m1, data = msc))
## (Intercept)          hp        disp     hp:disp 
##   7.18e-14    4.73e-04    2.11e-05    2.41e-03 

PS if you want to interpret 'type 3' anova, you should probably also be using sum-to-zero contrasts (options(contrasts = c(&quot;contr.sum&quot;, &quot;contr.poly&quot;)))


Venables, W. N. 1998. “Exegeses on Linear Models.” 1998 International S-PLUS User Conference. Washington, DC. http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf.

huangapple
  • 本文由 发表于 2023年5月23日 00:35:36
  • 转载请务必保留本文链接:https://go.coder-hub.com/76308262.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定