Boxplot t-test significance different than ANOVA significance
Question
I have run a one-way ANOVA and Tukey post-hoc analysis using the aov() and TukeyHSD() functions. I want to display the p-values (i.e., p adj) associated with the TukeyHSD output on a boxplot. The only way I know how is with the stat_compare_means() function from the ggpubr package.
The issue is that I can only run t-tests to get p-values with the stat_compare_means() function, and this returns different p-values than the TukeyHSD output. How can I make the boxplot show the p adj values from the TukeyHSD output?
Note: For the example below, I am only interested in comparing the mean price of Fair-cut diamonds with each of the other cuts (Good, Very Good, Premium, and Ideal).
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.2.3
library(ggpubr)
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.2.3
#> Warning: package 'tibble' was built under R version 4.2.3
#> Warning: package 'tidyr' was built under R version 4.2.3
#> Warning: package 'readr' was built under R version 4.2.3
#> Warning: package 'purrr' was built under R version 4.2.3
#> Warning: package 'dplyr' was built under R version 4.2.3
#> Warning: package 'stringr' was built under R version 4.2.3
#> Warning: package 'forcats' was built under R version 4.2.3
#> Warning: package 'lubridate' was built under R version 4.2.3
# Run the one-way ANOVA and Tukey post-hoc test
# Note the cut comparisons between Fair and all others (first 4 rows)
dat_aov <- aov(price~cut, data = diamonds)
TukeyHSD(dat_aov)
#>   Tukey multiple comparisons of means
#>     95% family-wise confidence level
#>
#> Fit: aov(formula = price ~ cut, data = diamonds)
#>
#> $cut
#>                          diff         lwr        upr     p adj
#> Good-Fair          -429.89331  -740.44880  -119.3378 0.0014980
#> Very Good-Fair     -376.99787  -663.86215   -90.1336 0.0031094
#> Premium-Fair        225.49994   -59.26664   510.2665 0.1950425
#> Ideal-Fair         -901.21579 -1180.57139  -621.8602 0.0000000
#> Very Good-Good       52.89544  -130.15186   235.9427 0.9341158
#> Premium-Good        655.39325   475.65120   835.1353 0.0000000
#> Ideal-Good         -471.32248  -642.36268  -300.2823 0.0000000
#> Premium-Very Good   602.49781   467.76249   737.2331 0.0000000
#> Ideal-Very Good    -524.21792  -647.10467  -401.3312 0.0000000
#> Ideal-Premium     -1126.71573 -1244.62267 -1008.8088 0.0000000
# Make a boxplot that displays the p-values of diamond cut
# comparisons with cut Fair
# Note that from TukeyHSD, the Premium-Fair p-val is 0.195 but
# using t.test below it is 0.019
diamonds %>%
  ggplot(aes(x = fct_rev(cut), y = price)) +
  geom_boxplot() +
  coord_flip() +
  xlab('Diamond Cut') +
  stat_compare_means(method = "t.test",
                     comparisons = list(c('Fair','Good'),
                                        c('Fair','Very Good'),
                                        c('Fair','Premium'),
                                        c('Fair','Ideal')),
                     label = "p.format",
                     tip.length = 0) +
  theme_bw()
<sup>Created on 2023-06-25 with reprex v2.0.2</sup>
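The gap between the two p-values for Premium vs. Fair is expected, since the two procedures compute different things. TukeyHSD() uses the residual variance pooled over all five cuts and refers the statistic to the studentized range distribution, which adjusts for the whole family of comparisons. A default t.test() is a Welch test on just the two groups, with no adjustment. The sketch below recomputes both numbers by hand. The variable names (means, n, mse, q_stat, p_tukey, p_welch) are illustrative and not part of the original post.

```r
library(ggplot2)  # loaded only for the diamonds data set

fit   <- aov(price ~ cut, data = diamonds)
means <- tapply(diamonds$price, diamonds$cut, mean)
n     <- tapply(diamonds$price, diamonds$cut, length)
mse   <- deviance(fit) / df.residual(fit)  # residual variance pooled over all cuts

# Tukey-Kramer statistic for Premium vs. Fair, referred to the studentized range
q_stat <- as.numeric(abs(means["Premium"] - means["Fair"]) /
                       sqrt(mse / 2 * (1 / n["Premium"] + 1 / n["Fair"])))
p_tukey <- ptukey(q_stat, nmeans = length(means), df = df.residual(fit),
                  lower.tail = FALSE)

# Unadjusted Welch t-test on the two groups only
two_cuts <- droplevels(subset(diamonds, cut %in% c("Fair", "Premium")))
p_welch  <- t.test(price ~ cut, data = two_cuts)$p.value

c(tukey = p_tukey, welch = p_welch)
```

Here p_tukey should reproduce the p adj that TukeyHSD() reports for Premium-Fair, while p_welch corresponds to the unadjusted value drawn by stat_compare_means(method = "t.test").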
Answer 1
Score: 4
You can use base boxplots.
data('diamonds', package='ggplot2')
tuk <- TukeyHSD(aov(price ~ cut, data=diamonds))
p_fair <- tuk$cut[1:4, 'p adj']  # adjusted p-values for the four comparisons against Fair
par(mar=c(4, 7, 4, 2)+.1)
b <- boxplot(price ~ cut, data=diamonds, horizontal=TRUE, col=0, pch=20, cex=.8,
             las=1, ylab='', ylim=c(0, max(diamonds$price)*1.5), border='grey15')
mtext('Diamonds cut', 2, 6)
mx <- max(diamonds$price)
# Fair is drawn at position 1, so each bracket runs from 1 up to the compared cut
for (i in sq <- seq_along(b$names)[-length(b$names)]) segments(mx + 2e3*i, 1, mx + 2e3*i, 1 + i)
# label each bracket at its midpoint: blue if p adj < .05, red otherwise
text(mx + 2e3*sq + 750, seq.int(1.5, by=.5, length.out=4),
     signif(p_fair, 3), adj=.5, srt=270, cex=.9, col=c('red', 'blue')[(p_fair < .05) + 1])
Answer 2
Score: 0
Here is a workaround using ggplot2 that takes advantage of the annotate() function.
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.2.3
library(ggpubr)
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.2.3
#> Warning: package 'tibble' was built under R version 4.2.3
#> Warning: package 'tidyr' was built under R version 4.2.3
#> Warning: package 'readr' was built under R version 4.2.3
#> Warning: package 'purrr' was built under R version 4.2.3
#> Warning: package 'dplyr' was built under R version 4.2.3
#> Warning: package 'stringr' was built under R version 4.2.3
#> Warning: package 'forcats' was built under R version 4.2.3
#> Warning: package 'lubridate' was built under R version 4.2.3
# Run the one-way ANOVA and Tukey post-hoc test
# Note the cut comparisons between Fair and all others (first 4 rows)
dat_aov <- aov(price~cut, data = diamonds)
TukeyHSD(dat_aov)
#>   Tukey multiple comparisons of means
#>     95% family-wise confidence level
#>
#> Fit: aov(formula = price ~ cut, data = diamonds)
#>
#> $cut
#>                          diff         lwr        upr     p adj
#> Good-Fair          -429.89331  -740.44880  -119.3378 0.0014980
#> Very Good-Fair     -376.99787  -663.86215   -90.1336 0.0031094
#> Premium-Fair        225.49994   -59.26664   510.2665 0.1950425
#> Ideal-Fair         -901.21579 -1180.57139  -621.8602 0.0000000
#> Very Good-Good       52.89544  -130.15186   235.9427 0.9341158
#> Premium-Good        655.39325   475.65120   835.1353 0.0000000
#> Ideal-Good         -471.32248  -642.36268  -300.2823 0.0000000
#> Premium-Very Good   602.49781   467.76249   737.2331 0.0000000
#> Ideal-Very Good    -524.21792  -647.10467  -401.3312 0.0000000
#> Ideal-Premium     -1126.71573 -1244.62267 -1008.8088 0.0000000
# Make a boxplot that displays the p-values of diamond cut
# comparisons with cut Fair
diamonds %>%
  ggplot(aes(x = fct_rev(cut), y = price)) +
  geom_boxplot() +
  coord_flip(ylim = c(0,30000)) +
  xlab('Diamond Cut') +
  annotate('segment', x = c(5,5,5,5), xend = c(4,3,2,1),
           y = c(20000,23000,26000,29000), yend = c(20000,23000,26000,29000),
           color = 'black') +
  annotate('text', x = c(4.5,4,3.5,3), y = c(20900,23900,26900,29900),
           label = c('0.001','0.003','> 0.05', '< 0.001'),
           color = 'black', angle = -90) +
  theme_bw()
<sup>Created on 2023-06-26 with reprex v2.0.2</sup>
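The labels in the annotate() call above are typed in by hand, so they would need updating whenever the data change. As a possible alternative, the sketch below builds the labels from the TukeyHSD() output and passes them to stat_pvalue_manual() from ggpubr. The data frame tukey_labels, its y.position values, and the plot object p are illustrative choices, not part of the original answer. The row-name split assumes no cut name contains a hyphen, and Fair is drawn at the bottom here because the factor is not reversed.

```r
library(ggplot2)
library(ggpubr)

tuk <- TukeyHSD(aov(price ~ cut, data = diamonds))$cut

# Keep only the comparisons against Fair; row names look like "Good-Fair"
fair_rows <- grepl("-Fair$", rownames(tuk))
pairs <- do.call(rbind, strsplit(rownames(tuk)[fair_rows], "-", fixed = TRUE))

tukey_labels <- data.frame(
  group1     = pairs[, 2],   # "Fair"
  group2     = pairs[, 1],   # the cut compared with Fair
  p.adj      = format.pval(tuk[fair_rows, "p adj"], digits = 2, eps = 0.001),
  y.position = seq(20000, by = 3000, length.out = sum(fair_rows))  # assumed bracket heights
)

p <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot() +
  stat_pvalue_manual(tukey_labels, label = "p.adj",
                     tip.length = 0, coord.flip = TRUE) +
  coord_flip() +
  xlab("Diamond Cut") +
  theme_bw()
p
```

Because the labels come straight from the fitted model, the same code keeps working if the comparisons or the data are changed. Only the y.position spacing may need tuning.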