Boxplot t-test significance different than ANOVA significance

Question

I have run a one-way ANOVA and Tukey post-hoc analysis using the aov() and TukeyHSD() functions. I want to display the p-values (i.e., p adj) from the TukeyHSD output on a boxplot. The only way I know how to do this is with the stat_compare_means() function from the ggpubr package.

The issue is that stat_compare_means() only lets me run t-tests to get p-values, and these differ from the p-values in the TukeyHSD output. How can I make the boxplot show the p adj values from the TukeyHSD output?

Note: For the example below, I am only interested in comparing the mean price of diamond cut Fair with all other diamond cuts (i.e., Good, Very Good, Premium, and Ideal).

    library(ggplot2)
    #> Warning: package 'ggplot2' was built under R version 4.2.3
    library(ggpubr)
    library(tidyverse)
    #> Warning: package 'tidyverse' was built under R version 4.2.3
    #> Warning: package 'tibble' was built under R version 4.2.3
    #> Warning: package 'tidyr' was built under R version 4.2.3
    #> Warning: package 'readr' was built under R version 4.2.3
    #> Warning: package 'purrr' was built under R version 4.2.3
    #> Warning: package 'dplyr' was built under R version 4.2.3
    #> Warning: package 'stringr' was built under R version 4.2.3
    #> Warning: package 'forcats' was built under R version 4.2.3
    #> Warning: package 'lubridate' was built under R version 4.2.3

    # Run the one-way ANOVA and Tukey post-hoc test
    # Note the cut comparisons between Fair and all others (first 4 rows)
    dat_aov <- aov(price ~ cut, data = diamonds)
    TukeyHSD(dat_aov)
    #>   Tukey multiple comparisons of means
    #>     95% family-wise confidence level
    #>
    #> Fit: aov(formula = price ~ cut, data = diamonds)
    #>
    #> $cut
    #>                          diff         lwr        upr     p adj
    #> Good-Fair          -429.89331  -740.44880  -119.3378 0.0014980
    #> Very Good-Fair     -376.99787  -663.86215   -90.1336 0.0031094
    #> Premium-Fair        225.49994   -59.26664   510.2665 0.1950425
    #> Ideal-Fair         -901.21579 -1180.57139  -621.8602 0.0000000
    #> Very Good-Good       52.89544  -130.15186   235.9427 0.9341158
    #> Premium-Good        655.39325   475.65120   835.1353 0.0000000
    #> Ideal-Good         -471.32248  -642.36268  -300.2823 0.0000000
    #> Premium-Very Good   602.49781   467.76249   737.2331 0.0000000
    #> Ideal-Very Good    -524.21792  -647.10467  -401.3312 0.0000000
    #> Ideal-Premium     -1126.71573 -1244.62267 -1008.8088 0.0000000

    # Make a boxplot that displays the p-values of diamond cut
    # comparisons with cut Fair
    # Note that from TukeyHSD, the Premium-Fair p-val is 0.195 but
    # using t.test below it is 0.019
    diamonds %>%
      ggplot(aes(x = fct_rev(cut), y = price)) +
      geom_boxplot() +
      coord_flip() +
      xlab('Diamond Cut') +
      stat_compare_means(method = "t.test",
                         comparisons = list(c('Fair', 'Good'),
                                            c('Fair', 'Very Good'),
                                            c('Fair', 'Premium'),
                                            c('Fair', 'Ideal')),
                         label = "p.format",
                         tip.length = 0) +
      theme_bw()

Created on 2023-06-25 with reprex v2.0.2
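For readers wondering why the two numbers differ at all, here is a small sketch. It assumes that stat_compare_means(method = "t.test") runs an unadjusted Welch t-test on only the two groups being compared, which is what t.test() does by default. TukeyHSD() instead uses the residual variance pooled over all five cuts and the studentized range distribution, so its p-value is adjusted for the whole family of comparisons. The variable names are illustrative.

```r
library(ggplot2)  # only needed for the diamonds data set

fit <- aov(price ~ cut, data = diamonds)
tuk <- TukeyHSD(fit)$cut

# Unadjusted Welch t-test that looks at Fair and Premium only
p_t <- t.test(diamonds$price[diamonds$cut == "Fair"],
              diamonds$price[diamonds$cut == "Premium"])$p.value

# Tukey-Kramer p-value computed by hand for the same pair
mse <- sum(residuals(fit)^2) / df.residual(fit)     # pooled within-group variance
n   <- table(diamonds$cut)                           # group sizes
m   <- tapply(diamonds$price, diamonds$cut, mean)    # group means
q   <- as.numeric(abs(m["Premium"] - m["Fair"]) /
                    sqrt(mse / 2 * (1 / n["Premium"] + 1 / n["Fair"])))
p_tukey <- ptukey(q, nmeans = nlevels(diamonds$cut),
                  df = df.residual(fit), lower.tail = FALSE)

c(welch_t = p_t, tukey_by_hand = p_tukey, TukeyHSD = tuk["Premium-Fair", "p adj"])
```

The hand-computed value reproduces the p adj entry of the Premium-Fair row, while the Welch p-value is the smaller, unadjusted one that ends up on the plot.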

Answer 1

Score: 4

You can use a base R boxplot() and draw the Tukey-adjusted p-values on it yourself.

    data('diamonds', package = 'ggplot2')
    tuk <- TukeyHSD(aov(price ~ cut, data = diamonds))
    p_fair <- tuk$cut[1:4, 'p adj']  # rows 1-4 are the comparisons against Fair
    # Reverse the levels so that Fair is drawn at position 5 (top), where the brackets start
    d <- diamonds
    d$cut <- factor(d$cut, levels = rev(levels(d$cut)))
    par(mar = c(4, 7, 4, 2) + .1)
    b <- boxplot(price ~ cut, data = d, horizontal = TRUE, col = 0, pch = 20, cex = .8,
                 las = 1, ylab = '', ylim = c(0, max(d$price) * 1.5), border = 'grey15')
    mtext('Diamonds cut', 2, 6)
    mx <- max(d$price)
    sq <- seq_along(b$names)[-length(b$names)]
    # One bracket per comparison, running from Fair (5) to the other cut
    segments(mx + 2e3 * sq, 5, mx + 2e3 * sq, 5 - sq)
    # Adjusted p-values beside the brackets: blue if < .05, red otherwise
    text(mx + 2e3 * sq + 750, seq.int(4.5, by = -.5, length.out = 4),
         signif(p_fair, 3), adj = .5, srt = 270, cex = .9,
         col = c('red', 'blue')[(p_fair < .05) + 1])
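The row index 1:4 works here because TukeyHSD() happens to list the comparisons against the first factor level first. As a hedged alternative, the rows can be picked by name, and the labels and colours derived from them. This is only a sketch; the names vs_fair, labels and cols are made up for illustration, and the formatting settings passed to format.pval() are a matter of taste.

```r
library(ggplot2)  # only needed for the diamonds data set

tuk <- TukeyHSD(aov(price ~ cut, data = diamonds))$cut

# Keep the comparisons whose row name ends in "-Fair"
vs_fair <- tuk[grepl("-Fair$", rownames(tuk)), "p adj"]

# Printable labels (very small values are shown as "<eps") and a colour per test
labels <- format.pval(vs_fair, digits = 2, eps = 0.001)
cols   <- ifelse(vs_fair < 0.05, "blue", "red")

data.frame(comparison = names(vs_fair), label = labels, colour = cols, row.names = NULL)
```

The labels and cols vectors can then be passed to text() in place of the signif() call and the colour index.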


Answer 2

Score: 0

Here is a workaround using ggplot2 that takes advantage of the annotate() function.

    library(ggplot2)
    #> Warning: package 'ggplot2' was built under R version 4.2.3
    library(ggpubr)
    library(tidyverse)
    #> Warning: package 'tidyverse' was built under R version 4.2.3
    #> Warning: package 'tibble' was built under R version 4.2.3
    #> Warning: package 'tidyr' was built under R version 4.2.3
    #> Warning: package 'readr' was built under R version 4.2.3
    #> Warning: package 'purrr' was built under R version 4.2.3
    #> Warning: package 'dplyr' was built under R version 4.2.3
    #> Warning: package 'stringr' was built under R version 4.2.3
    #> Warning: package 'forcats' was built under R version 4.2.3
    #> Warning: package 'lubridate' was built under R version 4.2.3

    # Run the one-way ANOVA and Tukey post-hoc test
    # Note the cut comparisons between Fair and all others (first 4 rows)
    dat_aov <- aov(price ~ cut, data = diamonds)
    TukeyHSD(dat_aov)
    #>   Tukey multiple comparisons of means
    #>     95% family-wise confidence level
    #>
    #> Fit: aov(formula = price ~ cut, data = diamonds)
    #>
    #> $cut
    #>                          diff         lwr        upr     p adj
    #> Good-Fair          -429.89331  -740.44880  -119.3378 0.0014980
    #> Very Good-Fair     -376.99787  -663.86215   -90.1336 0.0031094
    #> Premium-Fair        225.49994   -59.26664   510.2665 0.1950425
    #> Ideal-Fair         -901.21579 -1180.57139  -621.8602 0.0000000
    #> Very Good-Good       52.89544  -130.15186   235.9427 0.9341158
    #> Premium-Good        655.39325   475.65120   835.1353 0.0000000
    #> Ideal-Good         -471.32248  -642.36268  -300.2823 0.0000000
    #> Premium-Very Good   602.49781   467.76249   737.2331 0.0000000
    #> Ideal-Very Good    -524.21792  -647.10467  -401.3312 0.0000000
    #> Ideal-Premium     -1126.71573 -1244.62267 -1008.8088 0.0000000

    # Make a boxplot that displays the p-values of diamond cut
    # comparisons with cut Fair
    diamonds %>%
      ggplot(aes(x = fct_rev(cut), y = price)) +
      geom_boxplot() +
      coord_flip(ylim = c(0, 30000)) +
      xlab('Diamond Cut') +
      annotate('segment', x = c(5, 5, 5, 5), xend = c(4, 3, 2, 1),
               y = c(20000, 23000, 26000, 29000), yend = c(20000, 23000, 26000, 29000),
               color = 'black') +
      annotate('text', x = c(4.5, 4, 3.5, 3), y = c(20900, 23900, 26900, 29900),
               label = c('0.001', '0.003', '> 0.05', '< 0.001'),
               color = 'black', angle = -90) +
      theme_bw()

Created on 2023-06-26 with reprex v2.0.2
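If you would rather stay with ggpubr and avoid typing the labels by hand, one option is stat_pvalue_manual(), which draws brackets from a data frame of precomputed p-values instead of running its own test. The following is a minimal sketch, not a tested drop-in solution: it assumes a ggpubr version that supports the coord.flip argument, and the data frame ann, its label formatting and the bracket spacing in y.position are illustrative choices.

```r
library(ggplot2)
library(ggpubr)

tuk <- TukeyHSD(aov(price ~ cut, data = diamonds))$cut

# TukeyHSD names its rows "<group2>-<group1>"; keep the comparisons against Fair
tuk_fair <- tuk[grepl("-Fair$", rownames(tuk)), , drop = FALSE]
pairs <- do.call(rbind, strsplit(rownames(tuk_fair), "-", fixed = TRUE))

ann <- data.frame(
  group1 = pairs[, 2],
  group2 = pairs[, 1],
  p.adj  = format.pval(tuk_fair[, "p adj"], digits = 2, eps = 0.001),
  stringsAsFactors = FALSE
)
# Stack the brackets beyond the largest price, 2000 apart
ann$y.position <- max(diamonds$price) + 2000 * seq_len(nrow(ann))

p <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot() +
  stat_pvalue_manual(ann, label = "p.adj", tip.length = 0, coord.flip = TRUE) +
  coord_flip() +
  xlab("Diamond Cut") +
  theme_bw()
p
```

Because the brackets are matched to the cuts by name through group1 and group2, the order of the factor levels on the axis does not matter here, so fct_rev() is optional.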

huangapple
  • Posted on 2023-06-26 08:18:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76552898.html