How to make a violin plot?

Question

I am attempting to make a violin plot of the percentage cover of different animals over a rocky shore. I've never used a violin plot before, so understanding what the plots are supposed to look like or how they work is proving difficult.

Obviously, something is wrong. I think that the simplest issue is my Excel formatting, but I have no idea how to fix that or where to even start.

My current code:

kitep <- ggplot(data = kite, aes(x = meters, y = percentage)) + geom_violin()

I am aiming to get 3 violin plots, one on top of each other, for each of the four species showcasing their percentage cover over the meters.

Answer 1

Score: 1

You need to ensure that your data is imported in the correct format first. R data frames cannot have nested headings the way your Excel sheet does. The following data frame reproduces your Excel data in an R-friendly format:

df <- data.frame(
  meters = c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30),
  `castle barnacles` = c(0, 0, 0, 0, 0, 3, 0, 39, 25, 39, 50, 19, 36, 25, 31, 0),
  `diamond barnacles` = c(6, 14, 28, 53, 39, 44, 56, 0, 0, 0, 0, 19, 22, 11, 42, 0),
  `toothed wrack` = c(94, 25, 53, 14, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  `bladder wrack` = c(0, 28, 6, 22, 19, 42, 19, 36, 56, 39, 28, 17, 31, 14, 3, 0),
  check.names = FALSE
)

Now the object df looks like this:

df
#>    meters castle barnacles diamond barnacles toothed wrack bladder wrack
#> 1       0                0                 6            94             0
#> 2       2                0                14            25            28
#> 3       4                0                28            53             6
#> 4       6                0                53            14            22
#> 5       8                0                39             6            19
#> 6      10                3                44             0            42
#> 7      12                0                56             0            19
#> 8      14               39                 0             0            36
#> 9      16               25                 0             0            56
#> 10     18               39                 0             0            39
#> 11     20               50                 0             0            28
#> 12     22               19                19             0            17
#> 13     24               36                22             0            31
#> 14     26               25                11             0            14
#> 15     28               31                42             0             3
#> 16     30                0                 0             0             0
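
If you would rather import the sheet than retype it, one option is to skip the extra header row on import. The sketch below is only an illustration: the actual layout of your sheet is not shown here, so the two-row header, the temporary CSV file, and the object name kite (borrowed from your code) are all assumptions.

```r
# Hypothetical export of the sheet: a "percentage" heading row sits
# above the real column names (this layout is an assumption).
raw_lines <- c(
  "meters,percentage,,,",
  "meters,castle barnacles,diamond barnacles,toothed wrack,bladder wrack",
  "0,0,6,94,0",
  "2,0,14,25,28",
  "4,0,28,53,6"
)
path <- tempfile(fileext = ".csv")
writeLines(raw_lines, path)

# skip = 1 drops the nested heading so the second row becomes the header;
# check.names = FALSE keeps the spaces in the species names.
kite <- read.csv(path, skip = 1, check.names = FALSE)
str(kite)
```

For an .xlsx file, readxl::read_excel() also takes a skip argument, so the same idea should carry over, provided the species names really are on the second row of your sheet.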

To plot it using ggplot, it would be best to pivot into long format using pivot_longer from the tidyr package, and then "uncount" it using uncount, also from the tidyr package. Both the ggplot2 and tidyr packages are loaded when you run library(tidyverse).

library(tidyverse)

df %>%
  pivot_longer(-meters, names_to = 'Species', values_to = 'Count') %>%
  uncount(Count) %>%
  ggplot(aes(x = meters, y = Species, color = Species)) +
  geom_violin(aes(fill = after_scale(alpha(color, 0.6))),
              width = 1.8, position = 'identity', trim = FALSE) +
  scale_color_brewer(palette = 'Set1', guide = 'none') +
  theme_minimal(base_size = 16) +
  labs(x = 'Meters from shore', title = 'Species distribution', y = NULL) +
  coord_cartesian(xlim = c(0, 30), expand = FALSE) +
  theme(plot.title.position = 'plot')
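
To see what the pivot_longer and uncount steps do before the data reaches ggplot, here is a minimal sketch on a toy data frame. The names toy, a and b are invented for illustration and are not part of your data.

```r
library(tidyr)

# Toy data, invented for illustration: two distances, two species
toy <- data.frame(meters = c(0, 2), a = c(2, 0), b = c(1, 3))

# One row per distance/species pair
long <- pivot_longer(toy, -meters, names_to = "Species", values_to = "Count")

# Repeat each row Count times and drop the Count column, so a species
# with 0 cover at a given distance contributes no rows there
expanded <- uncount(long, Count)
print(expanded)
```

geom_violin estimates a density from the meters values of each species, so repeating rows in this way weights each distance by its percentage cover.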

Since your numbers are percentages and you wish to show relative abundance at each distance from the shore, an alternative approach might be a smoothed area plot:

library(tidyverse)

df %>%
  summarize(across(-meters, ~ spline(meters, .x, n = 1000)$y)) %>%
  mutate(meters = seq(0, 30, length.out = 1000)) %>%
  pivot_longer(-meters, names_to = 'Species', values_to = 'Count') %>%
  mutate(Count = ifelse(Count < 0, 0, Count)) %>%
  ggplot(aes(x = meters, y = Count, colour = Species)) +
  geom_area(aes(fill = after_scale(alpha(colour, 0.5))), position = 'fill') +
  scale_colour_brewer(palette = 'Set1') +
  scale_y_continuous(labels = scales::percent) +
  theme_minimal(base_size = 16) +
  labs(x = 'Meters from shore', title = 'Species distribution', y = NULL) +
  coord_cartesian(xlim = c(0, 30), expand = FALSE) +
  theme(plot.title.position = 'plot',
        legend.position = 'bottom')
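
The smoothing step can be looked at on its own. The following base R sketch uses the castle barnacles values and shows why the pipeline resets negative values to zero; the names cover, smoothed and clamped are only illustrative.

```r
# Values from the castle barnacles column of the data frame
meters <- seq(0, 30, by = 2)
cover  <- c(0, 0, 0, 0, 0, 3, 0, 39, 25, 39, 50, 19, 36, 25, 31, 0)

# Cubic spline interpolation evaluated at 1000 evenly spaced points
smoothed <- spline(meters, cover, n = 1000)

# The curve can overshoot below zero between observations,
# so clamp it before treating it as a percentage
clamped <- pmax(smoothed$y, 0)

range(smoothed$x)  # spans the original 0 to 30 m range
min(smoothed$y)    # may be negative
min(clamped)       # never below zero
```

In the plot, position = 'fill' then rescales the stacked values at each distance so they sum to 1, which is why the y axis can be labelled as a percentage. If you are on dplyr 1.1.0 or later, summarize() warns when it returns more than one row per group; reframe() is the replacement intended for that case.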

Created on 2023-07-10 with reprex v2.0.2

Answer 2

Score: 0

I think the issue is that row 1 contains "meters" and "percentage", while row 2 starts with the numeric meters values but has character values under "percentage". My advice would be to remove the "percentage" cell so that everything is on the same level. Once you do that, you need to reshape the data into just three columns: "meters", "variable" (castle, diamond, ...), and "percentage". Then you should be able to plot it. Here is how I would do it.

library(ggplot2)

df <- data.frame(meters = seq(0, 30, 2),
                 castle = c(rep(0, 5), 3, 0, 39, 25, 39, 50, 19, 36, 25, 31, 0),
                 diamond = c(6, 14, 28, 53, 39, 44, 56, rep(0, 4), 19, 22, 11, 42, 0),
                 tooth = c(94, 25, 53, 14, 6, rep(0, 11)),
                 bladder = c(0, 28, 6, 22, 19, 42, 19, 36, 56, 39, 28, 17, 31, 14, 3, 0))

# Convert the data frame to long format
df_long <- tidyr::gather(df, key = "variable", value = "value", -meters)

# Create the violin plot
ggplot(df_long, aes(x = meters, y = value, fill = variable)) +
  geom_violin(scale = "width", trim = FALSE) +
  labs(x = "Meters", y = "Percentage", fill = "Variable") +
  scale_fill_manual(values = c("castle" = "red", "diamond" = "blue", "tooth" = "green", "bladder" = "purple")) +
  theme_minimal()
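
As a side note, tidyr now recommends pivot_longer() over gather() for new code. A minimal sketch of the equivalent call, on a small subset invented for illustration (df_small, old_way and new_way are not part of the answer above):

```r
library(tidyr)

# Small subset of the data, for illustration only
df_small <- data.frame(meters = c(0, 2, 4),
                       castle = c(0, 0, 0),
                       diamond = c(6, 14, 28))

old_way <- gather(df_small, key = "variable", value = "value", -meters)
new_way <- pivot_longer(df_small, -meters,
                        names_to = "variable", values_to = "value")

# Same rows, different order: gather() stacks column by column,
# pivot_longer() goes row by row, so sort both before comparing
old_sorted <- old_way[order(old_way$variable, old_way$meters), ]
new_sorted <- as.data.frame(new_way[order(new_way$variable, new_way$meters), ])
identical(old_sorted$value, new_sorted$value)
```

Either function gives the three-column long format described above, so the plotting code does not need to change.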

I hope this helps!

huangapple
  • Posted on July 10, 2023 at 22:08:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76654579.html