June 26, 2023 08:18:46


Boxplot t-test significance different than ANOVA significance

Question

I have run a one-way ANOVA and a Tukey post-hoc analysis using the aov() and TukeyHSD() functions. I want to display the p-values (i.e., the p adj column) from the TukeyHSD output on a boxplot. The only way I know how to do this is with the stat_compare_means() function from the ggpubr package.

The issue is that the only way I can get p-values from stat_compare_means() is by running t-tests, and these p-values differ from the ones in the TukeyHSD output. How can I make the boxplot show the p adj values from the TukeyHSD output?

Note: For the example below, I am only interested in comparing the mean price of the Fair cut with each of the other cuts (i.e., Good, Very Good, Premium, and Ideal).

library(ggplot2)
library(ggpubr)
library(tidyverse)
# Run the one-way ANOVA and Tukey post-hoc test
# Note the cut comparisons between Fair and all others (first 4 rows)
dat_aov <- aov(price ~ cut, data = diamonds)
TukeyHSD(dat_aov)
#>   Tukey multiple comparisons of means
#>     95% family-wise confidence level
#> 
#> Fit: aov(formula = price ~ cut, data = diamonds)
#> 
#> $cut
#>                          diff         lwr        upr     p adj
#> Good-Fair          -429.89331  -740.44880  -119.3378 0.0014980
#> Very Good-Fair     -376.99787  -663.86215   -90.1336 0.0031094
#> Premium-Fair        225.49994   -59.26664   510.2665 0.1950425
#> Ideal-Fair         -901.21579 -1180.57139  -621.8602 0.0000000
#> Very Good-Good       52.89544  -130.15186   235.9427 0.9341158
#> Premium-Good        655.39325   475.65120   835.1353 0.0000000
#> Ideal-Good         -471.32248  -642.36268  -300.2823 0.0000000
#> Premium-Very Good   602.49781   467.76249   737.2331 0.0000000
#> Ideal-Very Good    -524.21792  -647.10467  -401.3312 0.0000000
#> Ideal-Premium     -1126.71573 -1244.62267 -1008.8088 0.0000000
# Make a boxplot that displays the p-values of diamond cut
# comparisons with cut Fair
# Note that from TukeyHSD, the Premium-Fair p-val is 0.195 but
# using t.test below it is 0.019
diamonds %>%
  ggplot(aes(x = fct_rev(cut), y = price)) +
  geom_boxplot() +
  coord_flip() +
  xlab('Diamond Cut') +
  stat_compare_means(method = "t.test",
                     comparisons = list(c('Fair','Good'),
                                        c('Fair','Very Good'),
                                        c('Fair','Premium'),
                                        c('Fair','Ideal')),
                     label = "p.format",
                     tip.length = 0) +
  theme_bw()

[Figure: Boxplot t-test significance different than ANOVA significance]

Created on 2023-06-25 with reprex v2.0.2
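The two sets of p-values answer different questions, which can be checked directly. stat_compare_means(method = "t.test") runs t.test() on just the two groups being compared (Welch test by default, no multiplicity adjustment), whereas TukeyHSD() uses the residual variance pooled over all five cuts and refers the statistic to the studentized range distribution, which adjusts for the 10 pairwise comparisons. The sketch below recomputes the Premium-Fair p adj by hand with ptukey(), following the formula TukeyHSD() applies to a one-way aov fit; the object names (`means`, `mse`, `q`, `p_tukey`, `p_welch`) are illustrative only.

```r
data("diamonds", package = "ggplot2")

fit <- aov(price ~ cut, data = diamonds)
tuk <- TukeyHSD(fit)

# Ingredients of the Tukey test: group means, group sizes, pooled residual variance
groups <- split(diamonds$price, diamonds$cut)
means  <- sapply(groups, mean)
n      <- lengths(groups)
mse    <- sum(residuals(fit)^2) / fit$df.residual

# Studentized range statistic for Premium vs Fair, using the pooled MSE
diff_pf <- means[["Premium"]] - means[["Fair"]]
q <- abs(diff_pf) / sqrt(mse / 2 * (1 / n[["Premium"]] + 1 / n[["Fair"]]))

# Adjusted p-value from the studentized range distribution with 5 group means
p_tukey <- ptukey(q, nmeans = length(means), df = fit$df.residual,
                  lower.tail = FALSE)

# For contrast: the unadjusted two-group Welch t-test that t.test() runs by default
fp <- droplevels(subset(diamonds, cut %in% c("Fair", "Premium")))
p_welch <- t.test(price ~ cut, data = fp)$p.value

c(tukey = p_tukey, welch = p_welch)
```

Because only the Tukey p-value accounts for the other comparisons and the pooled variance, no setting of a plain two-sample t-test should be expected to reproduce the p adj column exactly.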

Answer 1

Score: 4

You can use a base R boxplot instead.

data('diamonds', package='ggplot2')
tuk <- TukeyHSD(aov(price ~ cut, data=diamonds))
par(mar=c(4, 7, 4, 2)+.1)
b <- boxplot(price ~ cut, data=diamonds, horizontal=TRUE, col=0, pch=20, cex=.8,
             las=1, ylab='', ylim=c(0, max(diamonds$price)*1.5), border='grey15')
mtext('Diamonds cut', 2, 6)
mx <- max(diamonds$price)
# Fair is the first factor level, so it is drawn at y = 1 (bottom);
# draw one bracket from Fair up to each of the other cuts
for (i in sq <- seq_along(b$names)[-length(b$names)]) segments(mx + 2e3*i, 1, mx + 2e3*i, 1 + i)
# Label each bracket at its midpoint with the p adj of rows 1-4 (the X-Fair comparisons);
# colour by those p adj values (blue if < .05, red otherwise)
text(mx + 2e3*sq + 750, seq.int(1.5, by=.5, length.out=4),
     signif(tuk$cut[1:4, 4], 3), adj=.5, srt=270, cex=.9,
     col=c('red', 'blue')[(tuk$cut[1:4, 4] < .05) + 1])
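The indexing `tuk$cut[1:4, 4]` relies on the Fair comparisons being the first four rows. A slightly more robust variant, assuming TukeyHSD() names the rows "<other level>-<reference level>" as in the output above (e.g. "Good-Fair"), selects them by name and formats the labels with base R's format.pval(). `ref`, `rows`, `p_vs_ref`, `labels` and `cols` are hypothetical names for illustration; `labels` and `cols` could be passed to text() in place of the positional indexing.

```r
data("diamonds", package = "ggplot2")
tuk <- TukeyHSD(aov(price ~ cut, data = diamonds))

# Reference level and the levels compared against it
ref    <- "Fair"
others <- setdiff(levels(diamonds$cut), ref)

# Fair is the first level, so TukeyHSD() labels these rows "<other>-Fair"
rows     <- paste(others, ref, sep = "-")
p_vs_ref <- tuk$cut[rows, "p adj"]

# Readable labels (very small values shown as "< 0.001") and significance colours
labels <- format.pval(p_vs_ref, digits = 3, eps = 0.001)
cols   <- ifelse(p_vs_ref < 0.05, "blue", "red")

data.frame(comparison = rows, p_adj = labels, colour = unname(cols))
```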

Answer 2

Score: 0

Here is a workaround using ggplot2 that takes advantage of the annotate() function.

library(ggplot2)
library(ggpubr)
library(tidyverse)
# Run the one-way ANOVA and Tukey post-hoc test
# Note the cut comparisons between Fair and all others (first 4 rows)
dat_aov <- aov(price ~ cut, data = diamonds)
TukeyHSD(dat_aov)
#>   Tukey multiple comparisons of means
#>     95% family-wise confidence level
#> 
#> Fit: aov(formula = price ~ cut, data = diamonds)
#> 
#> $cut
#>                          diff         lwr        upr     p adj
#> Good-Fair          -429.89331  -740.44880  -119.3378 0.0014980
#> Very Good-Fair     -376.99787  -663.86215   -90.1336 0.0031094
#> Premium-Fair        225.49994   -59.26664   510.2665 0.1950425
#> Ideal-Fair         -901.21579 -1180.57139  -621.8602 0.0000000
#> Very Good-Good       52.89544  -130.15186   235.9427 0.9341158
#> Premium-Good        655.39325   475.65120   835.1353 0.0000000
#> Ideal-Good         -471.32248  -642.36268  -300.2823 0.0000000
#> Premium-Very Good   602.49781   467.76249   737.2331 0.0000000
#> Ideal-Very Good    -524.21792  -647.10467  -401.3312 0.0000000
#> Ideal-Premium     -1126.71573 -1244.62267 -1008.8088 0.0000000
# Make a boxplot that displays the p-values of diamond cut
# comparisons with cut Fair
# (fct_rev + coord_flip puts Fair at x = 5, Good at 4, ..., Ideal at 1;
#  the labels are the p adj values above, typed in by hand)
diamonds %>%
  ggplot(aes(x = fct_rev(cut), y = price)) +
  geom_boxplot() +
  coord_flip(ylim = c(0, 30000)) +
  xlab('Diamond Cut') +
  annotate('segment', x = c(5, 5, 5, 5), xend = c(4, 3, 2, 1),
           y = c(20000, 23000, 26000, 29000), yend = c(20000, 23000, 26000, 29000),
           color = 'black') +
  annotate('text', x = c(4.5, 4, 3.5, 3), y = c(20900, 23900, 26900, 29900),
           label = c('0.001', '0.003', '> 0.05', '< 0.001'),
           color = 'black', angle = -90) +
  theme_bw()

[Figure: Boxplot t-test significance different than ANOVA significance]

Created on 2023-06-26 with reprex v2.0.2
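Instead of typing the p-values into annotate() by hand, the TukeyHSD table can be reshaped into the data frame that ggpubr's stat_pvalue_manual() reads (columns `group1`, `group2`, a label column and `y.position`). This is a sketch, assuming a vertical boxplot (no coord_flip()) and the "<level>-<level>" row names seen in the output above; the bracket heights (`1500` steps above the maximum price) and the names `res`, `vs_fair` and `p.label` are arbitrary choices for illustration.

```r
library(ggplot2)
library(ggpubr)

data("diamonds", package = "ggplot2")
tuk <- TukeyHSD(aov(price ~ cut, data = diamonds))

# Reshape the Tukey table: row "Good-Fair" becomes group1 = "Fair", group2 = "Good"
res   <- as.data.frame(tuk$cut)
parts <- strsplit(rownames(res), "-", fixed = TRUE)
res$group1 <- vapply(parts, `[`, character(1), 2)
res$group2 <- vapply(parts, `[`, character(1), 1)
res$p.adj  <- res[["p adj"]]

# Keep only the comparisons against Fair and stack the brackets above the boxes
vs_fair <- res[res$group1 == "Fair", c("group1", "group2", "p.adj")]
vs_fair$p.label    <- format.pval(vs_fair$p.adj, digits = 3, eps = 0.001)
vs_fair$y.position <- max(diamonds$price) + 1500 * seq_len(nrow(vs_fair))

p <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot() +
  stat_pvalue_manual(vs_fair, label = "p.label", tip.length = 0) +
  xlab("Diamond Cut") +
  theme_bw()
p
```

Since the labels come straight from the TukeyHSD object, they stay in sync with the model if the data or formula changes.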
