Why do my p-values change after rescaling variables? Mixed Models
Question
I have this code:

m2 <- lmer(SE ~ b95rel*MF*TASK*ROI*side*Run + (1|IDcheck), f)

where SE, b95rel, and MF are continuous variables, and TASK, ROI, side, and Run are categorical variables.
I get the warning message:
> Warning messages:
1: Some predictor variables are on very different scales: consider rescaling
Here are the first rows of the result:
When I rescale b95rel, the p-values change (e.g. the main effect of TASK becomes significant):

When I also rescale MF, the p-values change again:

I use the scale() function in R.
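Roughly, the rescaling is applied like this (a minimal sketch, not my exact code; object names are just for illustration, and the column names are the ones from the model above):

library(lmerTest)

f$b95rel <- scale(f$b95rel)   # default scale(): centers and divides by the SD
m2_rescaled <- lmer(SE ~ b95rel*MF*TASK*ROI*side*Run + (1|IDcheck), f)
summary(m2_rescaled)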
Online I always read that p-values shouldn't change when scaling, so I'm confused about what it means that mine do...
This is the scaled MF variable:
This is the unscaled MF variable: the dots do not change their distance to each other. I see the same for the other variables before and after scaling; only the y-axis values change. So the change in p-values cannot come from here...
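(To illustrate why the plots look unchanged apart from the y-axis: scale() is a linear transformation, so the differences between points are only divided by a constant. A minimal sketch with a made-up vector:)

x <- c(2, 5, 9, 14, 20)              # made-up example values
z <- scale(x)[, 1]                   # default: centered and divided by sd(x)
all.equal(diff(z), diff(x) / sd(x))  # TRUE: spacing is preserved up to a constant factor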
I currently can't copy the dput() output here, as my data file is too big with its numerous repeated variables... I will see if I find a solution and put it in a comment...
Here is the sessionInfo():
> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_Canada.utf8 LC_CTYPE=English_Canada.utf8 LC_MONETARY=English_Canada.utf8
[4] LC_NUMERIC=C LC_TIME=English_Canada.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rstatix_0.7.2 ggpubr_0.6.0 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.1
[7] purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.2 tidyverse_2.0.0
[13] sjmisc_2.8.9 sjPlot_2.8.14 MuMIn_1.47.5 lmerTest_3.1-3 lme4_1.1-32 Matrix_1.5-4
[19] sjstats_0.18.2 emmeans_1.8.5
loaded via a namespace (and not attached):
[1] splines_4.2.2 carData_3.0-5 modelr_0.1.11 datawizard_0.7.1 stats4_4.2.2
[6] bayestestR_0.13.1 numDeriv_2016.8-1.1 pillar_1.9.0 backports_1.4.1 lattice_0.20-45
[11] glue_1.6.2 RColorBrewer_1.1-3 ggsignif_0.6.4 minqa_1.2.5 colorspace_2.1-0
[16] sandwich_3.0-2 pkgconfig_2.0.3 broom_1.0.4 haven_2.5.2 xtable_1.8-4
[21] mvtnorm_1.1-3 scales_1.2.1 tzdb_0.3.0 timechange_0.2.0 farver_2.1.1
[26] generics_0.1.3 car_3.1-2 sjlabelled_1.2.0 TH.data_1.1-1 withr_2.5.0
[31] cli_3.4.1 effectsize_0.8.3 survival_3.4-0 magrittr_2.0.3 estimability_1.4.1
[36] fansi_1.0.4 nlme_3.1-160 MASS_7.3-60 tools_4.2.2 hms_1.1.3
[41] lifecycle_1.0.3 multcomp_1.4-23 munsell_0.5.0 ggeffects_1.2.2 compiler_4.2.2
[46] rlang_1.1.0 grid_4.2.2 nloptr_2.0.3 parameters_0.20.3 rstudioapi_0.14
[51] labeling_0.4.2 boot_1.3-28 gtable_0.3.3 codetools_0.2-18 abind_1.4-5
[56] R6_2.5.1 zoo_1.8-11 knitr_1.42 performance_0.10.3 utf8_1.2.3
[61] insight_0.19.1 stringi_1.7.12 Rcpp_1.0.10 vctrs_0.6.1 tidyselect_1.2.0
[66] xfun_0.38 coda_0.19-4
Answer 1
Score: 3
tl;dr: if you scale(..., center = FALSE), the p-values should stay constant. However, you should always be cautious about interpreting the significance of main effects in the presence of interactions, precisely because they depend on the centering of the other effects in the interaction (Venables 1998). Venables says, effectively, "never interpret main effects in the presence of interactions"; my revision of that is "unless you know and can explain exactly why it could be problematic, and thus understand what you're actually testing".
Your problem is two-fold: (1) by default, scale() centers as well as scales variables; (2) centering changes the meaning, and the p-values, of main effects when they are involved in higher-order effects that are also contained in the model.
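To see what point (1) means in practice, here is a minimal sketch of what scale() actually computes: by default it subtracts the column mean and divides by the standard deviation, whereas with center = FALSE it only divides, by the column's root mean square rather than its standard deviation.

x <- mtcars$hp

## default: (x - mean(x)) / sd(x)
all.equal(scale(x)[, 1], (x - mean(x)) / sd(x))            # TRUE

## center = FALSE: x is divided by its root mean square, not its sd
rms <- sqrt(sum(x^2) / (length(x) - 1))
all.equal(scale(x, center = FALSE)[, 1], x / rms)          # TRUE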
The examples below show effects based on the coefficient tables from summary() rather than using anova(), but the point is the same.
I'll give an example with lm() (since this isn't specific to mixed models):
m1 <- lm(mpg ~ hp*disp, mtcars)
## utility function
p <- function(m) print(coef(summary(m))[,"Pr(>|t|)"], digits = 3)
p(m1)
## (Intercept) hp disp hp:disp
## 7.18e-14 4.73e-04 2.11e-05 2.41e-03
Scale and center: the interaction p-value stays the same, but the other effects change (because we are now evaluating effects at the mean of the other variable, rather than when the other variable is 0).
msc <- transform(mtcars, hp = scale(hp), disp = scale(disp))
p(update(m1, data = msc))
## (Intercept) hp disp hp:disp
## 1.64e-20 1.30e-02 4.36e-05 2.41e-03
Scale but don't center: p-values the same as the original model.
msc <- transform(mtcars, hp = scale(hp, center = FALSE), disp = scale(disp, center = FALSE))
p(update(m1, data = msc))
## (Intercept) hp disp hp:disp
## 7.18e-14 4.73e-04 2.11e-05 2.41e-03
PS: if you want to interpret a 'type 3' ANOVA, you should probably also be using sum-to-zero contrasts (options(contrasts = c("contr.sum", "contr.poly"))).
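A minimal sketch of what that could look like (using car::Anova(), which is already loaded via a namespace in the sessionInfo above, and a hypothetical factor-by-continuous model on mtcars for illustration):

library(car)   # for Anova()

## sum-to-zero contrasts for factors, so the type-3 main-effect tests are meaningful
options(contrasts = c("contr.sum", "contr.poly"))

m3 <- lm(mpg ~ factor(cyl) * disp, mtcars)
Anova(m3, type = 3)   # type-3 ('marginal') tests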
Venables, W. N. 1998. “Exegeses on Linear Models.” 1998 International S-PLUS User Conference. Washington, DC. http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf.