A ggplot solution for plotting a Tukey test result from the agricolae package in correct (numerical) order on the x axis?

huangapple go评论81阅读模式
英文:

A ggplot solution for plotting a Tukey test result from the agricolae package in correct (numerical) order on the x axis?

问题

I am plotting some plant and soil data along a transect covering the sampling years 2000-2020, testing simple one-way ANOVA with the excellent agricolae package, and doing simple explorative base plot graphs.

However, I find it difficult to manipulate with the output from the agricolae package and the base plot function, and I would now like to make a nice (preferably ggplot) graph with data in proper order in presentation quality.

My data (named "biochemA") looks like this:

Year Transect Depth NH4_(ug/L)
2000 1 1 391,777
2009 1 1 471,616
2011 1 1 362,964
2013 1 1 348,208
2015 1 1 234,341
2017 1 1 768,844
2019 1 1 599,063
2020 1 1 912,435
2000 2 1 452,272
2009 2 1 391,134
2011 2 1 285,286
2013 2 1 755,01
2015 2 1 376,022
2017 2 1 1205,095
2019 2 1 2940,163
2020 2 1 298,487
2000 3 1 1409,322
2009 3 1 847,658
2011 3 1 332,635
2013 3 1 487,695
2015 3 1 337,721
2017 3 1 1702,21
2019 3 1 2684,409
2020 3 1 448,644

  • and I have done a really simple one-way ANOVA + Tukey test, using the agricolae package:

modelNH4<-aov(biochemA$NH4_(ug/L) ~ Year, data = biochemA)

out_NH4 <- HSD.test(modelNH4,"Year", group=TRUE,console=TRUE)

plot(out_NH4,main="NH4", xlab="Year", ylab="μg/g soil", las=2)

This gives me a graph like this:

A ggplot solution for plotting a Tukey test result from the agricolae package in correct (numerical) order on the x axis?

which is fine for exploratory purposes. However, I would like to plot it with the years on the x-axis in chronological order (2020-2000), whereas the base plot function automatically displays the graphs by decreasing means. Also I would like to show standard error and not standard deviations.

The output from the HSD test looks like this:

out_bioANH4
$statistics
MSerror Df Mean CV MSD
173.8115 38 11.83804 111.3678 26.72761

$parameters
test name.t ntr StudentizedRange alpha
Tukey Year 8 4.53321 0.05

$means
biochemA1$NH4_.ug.g_soil._ std r Min Max
2000 7.662795 6.182201 5 3.000779 18.067119
2009 6.314628 3.589682 5 2.905835 12.042757
2011 4.866242 1.180330 5 3.698008 6.192181
2013 6.401148 3.330437 5 3.628023 10.752502
2015 4.286512 1.424005 5 2.777662 6.101243
2017 14.385232 8.981692 5 6.405698 29.624701
2019 42.317093 18.145402 5 12.244628 57.310143
2020 8.470637 4.223832 5 4.573570 15.409000
Q25 Q50 Q75
2000 3.547272 5.327341 8.371463
2009 3.728181 6.008561 6.887809
2011 4.097738 4.257121 6.086162
2013 3.684275 4.766714 9.174225
2015 3.089635 4.135425 5.328595
2017 9.569202 12.620927 13.705629
2019 38.758582 50.761796 52.510317
2020 5.761486 7.738720 8.870408

$comparison
NULL

$groups
biochemA1$NH4_.ug.g_soil._ groups
2019 42.317093 a
2017 14.385232 b
2020 8.470637 b
2000 7.662795 b
2013 6.401148 b
2009 6.314628 b
2011 4.866242 b
2015 4.286512 b

attr(,"class")
1 "group"

英文:

I am plotting some plant and soil data along a transect covering the sampling years 2000-2020, testing simple one-way ANOVA with the excellent agricolae package, and doing simple explorative base plot graphs.

However, I find it difficult to manipulate with the output from the agricolae package and the base plot function, and I would now like to make a nice (preferably ggplot) graph with data in proper order in presentation quality.

My data (named "biochemA") looks like this:

Year Transect Depth NH4_(ug/L)  
2000 1 1 391,777  
2009 1 1 471,616 
2011 1 1 362,964
2013 1 1 348,208 
2015 1 1 234,341 
2017 1 1 768,844
2019 1 1 599,063 
2020 1 1 912,435
2000 2 1 452,272 
2009 2 1 391,134
2011 2 1 285,286 
2013 2 1 755,01 
2015 2 1 376,022 
2017 2 1 1205,095
2019 2 1 2940,163 
2020 2 1 298,487 
2000 3 1 1409,322 
2009 3 1 847,658 
2011 3 1 332,635 
2013 3 1 487,695 
2015 3 1 337,721 
2017 3 1 1702,21 
2019 3 1 2684,409 
2020 3 1 448,644
  • and I have done a really simple one-way ANOVA + Tukey test, using the agricolae package:

> modelNH4<-aov(biochemA$NH4_(ug/L) ~ Year, data = biochemA)
>
> out_NH4 <- HSD.test(modelNH4,"Year", group=TRUE,console=TRUE)
>
> plot(out_NH4,main="NH4", xlab="Year", ylab="μg/g soil", las=2)

This gives me a graph like this:

A ggplot solution for plotting a Tukey test result from the agricolae package in correct (numerical) order on the x axis?

which is fine for exploratory purposes. However, I would like to plot it with the years on the x-axis in chronological order (2020-2000), whereas the base plot function automatically displays the graphs by decreasing means. Also I would like to show standard error and not standard deviations.

The output from the HSD tst looks like this:

&gt; out_bioANH4
$statistics
   MSerror Df     Mean       CV      MSD
  173.8115 38 11.83804 111.3678 26.72761

$parameters
   test name.t ntr StudentizedRange alpha
  Tukey   Year   8          4.53321  0.05

$means
     biochemA1$NH4_.ug.g_soil._       std r       Min       Max
2000                   7.662795  6.182201 5  3.000779 18.067119
2009                   6.314628  3.589682 5  2.905835 12.042757
2011                   4.866242  1.180330 5  3.698008  6.192181
2013                   6.401148  3.330437 5  3.628023 10.752502
2015                   4.286512  1.424005 5  2.777662  6.101243
2017                  14.385232  8.981692 5  6.405698 29.624701
2019                  42.317093 18.145402 5 12.244628 57.310143
2020                   8.470637  4.223832 5  4.573570 15.409000
           Q25       Q50       Q75
2000  3.547272  5.327341  8.371463
2009  3.728181  6.008561  6.887809
2011  4.097738  4.257121  6.086162
2013  3.684275  4.766714  9.174225
2015  3.089635  4.135425  5.328595
2017  9.569202 12.620927 13.705629
2019 38.758582 50.761796 52.510317
2020  5.761486  7.738720  8.870408

$comparison
NULL

$groups
     biochemA1$NH4_.ug.g_soil._ groups
2019                  42.317093      a
2017                  14.385232      b
2020                   8.470637      b
2000                   7.662795      b
2013                   6.401148      b
2009                   6.314628      b
2011                   4.866242      b
2015                   4.286512      b

attr(,&quot;class&quot;)
[1] &quot;group&quot;

答案1

得分: 1

你可能需要做的是从out_bioANH4中提取元素并在自定义绘图中使用,比如ggplot2

如果在提示符下键入out_bioANH4,你会看到out_bioANH4$means包含数据的描述统计信息矩阵。如果你使用ggplot2,你可以使用geom_errorbar来生成类似于agricolae包的图形。

看看以下代码会得到什么结果:

library(ggplot2)

# 从摘要统计信息计算SE并将其添加到摘要矩阵中
out_bioANH4$means$se <- out_bioANH4$means$std/sqrt(out_bioANH4$means$r)

ggplot(data=out_bioANH4$means) +
# 计算ymin和ymax,即均值+/- SE的位置
  geom_errorbar(aes(ymin=out_bioANH4$means$biochemA1$NH4_.ug.g_soil._ - out_bioANH4$means$se, 
                    ymax=out_bioANH4$means$biochemA1$NH4_.ug.g_soil._ + out_bioANH4$means$se, 
                    x=row.names(out_bioANH4$means), color=out_bioANH4$groups$groups), width=0) +
# 在图上放置均值点,并为每个点使用不同的颜色
  geom_point(aes(y=out_bioANH4$means$biochemA1$NH4_.ug.g_soil._,
                   x=row.names(out_bioANH4$means), 
  color=out_bioANH4$groups$groups[order(row.names(out_NH4$groups))])) +
  xlab("Year") +
  ylab("NH4_(ug/L)") +
  scale_colour_discrete(name="Group")

我无法使用你的原始数据在你的"biochemA"示例中复制你的结果。如果你的out_bioANH4示例是正确的,这些ggplot2的内容应该可以帮助你入门。

由于geom_errorbar中的x是按数字顺序排列的,所以x轴应该按递增顺序排列。如果你想反转它,你可以很容易地使用ggplot来做到这一点。

英文:

What you'll probably have to do is take the elements from out_bioANH4 and use them in your custom plot, such as ggplot2.

If you type out_bioANH4 at the prompt, you'll see that out_bioANH4$means contains a matrix of the descriptive statistics of your data. If you use ggplot2, you can get similar plots as the agricolae package using geom_errorbar.

See what this gives you:

library(ggplot2)

#this calculates SE from the summary statistics and adds it to the summary matrix
out_bioANH4$means$se &lt;- out_bioANH4$means$std/sqrt(out_bioANH4$means$r)

ggplot(data=out_bioANH4$means) +
#this calculates ymin and ymax, which are the location of mean +/- SE
  geom_errorbar(aes(ymin=out_bioANH4$means$biochemA1$NH4_.ug.g_soil._ - out_bioANH4$means$se, 
                    ymax=out_bioANH4$means$biochemA1$NH4_.ug.g_soil._ + out_bioANH4$means$se, 
                    x=row.names(out_bioANH4$means), color=out_bioANH4$groups$groups), width=0) +
#this puts points at the means of the graph and does a different color for each
  geom_point(aes(y=out_bioANH4$means$biochemA1$NH4_.ug.g_soil._,
                   x=row.names(out_bioANH4$means), 
  color=out_bioANH4$groups$groups[order(row.names(out_NH4$groups))])) +
  xlab(&quot;Year&quot;) +
  ylab(&quot;NH4_(ug/L)&quot;) +
  scale_colour_discrete(name=&quot;Group&quot;)

I couldn't reproduce your results with HSD using your raw data in your "biochemA" example. If your out_bioANH4 example is correct, this ggplot2 stuff should get you started.

Because the x in the geom_errorbar is in numerical order, the x axis should be ordered in increasing order. If you want to reverse it, you can do that easily with ggplot.

huangapple
  • 本文由 发表于 2023年6月1日 03:19:15
  • 转载请务必保留本文链接:https://go.coder-hub.com/76376639.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定