How to make a violin plot?
Question
I am attempting to make a violin plot using the percentage cover of different animals over a rocky shore. I've never used a violin plot before, so understanding what the plots are supposed to look like or how they work is proving difficult.
Obviously, something is wrong. I think that the simplest issue is my Excel formatting, but I have no idea how to fix that or where to even start.
My current code:
kitep <- ggplot(data = kite, aes(x = meters, y = percentage)) + geom_violin()
I am aiming to get 3 violin plots, stacked one on top of another, for each of the four species, showing their percentage cover over the meters.
Answer 1
Score: 1
You need to ensure that your data is imported in the correct format first. An R data frame cannot have nested column headings the way your Excel sheet does. The following data frame reproduces your Excel data in an R-friendly format:
df <- data.frame(
  meters = c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30),
  `castle barnacles` = c(0, 0, 0, 0, 0, 3, 0, 39, 25, 39, 50, 19, 36, 25, 31, 0),
  `diamond barnacles` = c(6, 14, 28, 53, 39, 44, 56, 0, 0, 0, 0, 19, 22, 11, 42, 0),
  `toothed wrack` = c(94, 25, 53, 14, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  `bladder wrack` = c(0, 28, 6, 22, 19, 42, 19, 36, 56, 39, 28, 17, 31, 14, 3, 0),
  check.names = FALSE
)
Now the object df looks like this:
df
#>    meters castle barnacles diamond barnacles toothed wrack bladder wrack
#> 1       0                0                 6            94             0
#> 2       2                0                14            25            28
#> 3       4                0                28            53             6
#> 4       6                0                53            14            22
#> 5       8                0                39             6            19
#> 6      10                3                44             0            42
#> 7      12                0                56             0            19
#> 8      14               39                 0             0            36
#> 9      16               25                 0             0            56
#> 10     18               39                 0             0            39
#> 11     20               50                 0             0            28
#> 12     22               19                19             0            17
#> 13     24               36                22             0            31
#> 14     26               25                11             0            14
#> 15     28               31                42             0             3
#> 16     30                0                 0             0             0
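If the sheet is read into R as-is, the merged "percentage" cell ends up in its own row above the species names. The following is a minimal sketch of one way to flatten such a two-row header in base R. It assumes the sheet was read without column names (for example with readxl::read_excel(path, col_names = FALSE)); the raw object is a small made-up stand-in for that import, not the real file, and only two species are included.

```r
# Hypothetical stand-in for a sheet read with no column names:
# row 1 holds the merged "percentage" header, row 2 the species names.
raw <- data.frame(
  X1 = c("meters", NA, "0", "2", "4"),
  X2 = c("percentage", "castle barnacles", "0", "0", "0"),
  X3 = c(NA, "toothed wrack", "94", "25", "53"),
  stringsAsFactors = FALSE
)

# Collapse the two header rows into one: use row 2 where it is filled,
# otherwise fall back to row 1.
row1 <- unlist(raw[1, ], use.names = FALSE)
row2 <- unlist(raw[2, ], use.names = FALSE)
header <- ifelse(is.na(row2), row1, row2)

# Drop the header rows, apply the names, and convert the text to numbers.
clean <- raw[-(1:2), ]
names(clean) <- header
clean[] <- lapply(clean, as.numeric)
rownames(clean) <- NULL

str(clean)
```

Deleting the "percentage" cell in Excel before importing would make this step unnecessary.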
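To plot it using ggplot, it would be best to pivot into long format using pivot_longer from the tidyr package, and then "uncount" it using uncount, also from the tidyr package. uncount repeats each row as many times as its Count value, so the density that geom_violin estimates along meters is weighted by percentage cover. Both the ggplot2 and tidyr packages are loaded when you do library(tidyverse).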
library(tidyverse)

df %>%
  pivot_longer(-meters, names_to = 'Species', values_to = 'Count') %>%
  uncount(Count) %>%
  ggplot(aes(x = meters, y = Species, color = Species)) +
  geom_violin(aes(fill = after_scale(alpha(color, 0.6))),
              width = 1.8, position = 'identity', trim = FALSE) +
  scale_color_brewer(palette = 'Set1', guide = 'none') +
  theme_minimal(base_size = 16) +
  labs(x = 'Meters from shore', title = 'Species distribution', y = NULL) +
  coord_cartesian(xlim = c(0, 30), expand = FALSE) +
  theme(plot.title.position = 'plot')
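To see what the reshaping steps do before any plotting, here is a small sketch on a made-up three-row data frame (the toy values are illustrative only, not taken from the survey). It shows that pivot_longer produces one row per distance and species, and that uncount then repeats each row Count times.

```r
library(tidyr)

# Tiny made-up example: cover of two species at three distances.
toy <- data.frame(meters = c(0, 2, 4),
                  `toothed wrack` = c(3, 1, 0),
                  `bladder wrack` = c(0, 2, 2),
                  check.names = FALSE)

# One row per (meters, Species) pair.
long <- pivot_longer(toy, -meters, names_to = "Species", values_to = "Count")

# uncount() repeats each row Count times and drops the Count column,
# so rows with a Count of 0 disappear entirely.
expanded <- uncount(long, Count)

nrow(long)      # 3 distances x 2 species
nrow(expanded)  # sum of all counts
table(expanded$Species, expanded$meters)
```

Because a distance with 0% cover contributes no rows, the violin for a species is built only from distances where it was actually recorded.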
Since your numbers are percentages and you wish to show relative abundance at each distance from the shore, an alternative approach might be a smoothed area plot:
# Note: with dplyr >= 1.1.0, summarize() warns when it returns more than one
# row per group; reframe() is the preferred verb for that case.
df %>%
  summarize(across(-meters, ~ spline(meters, .x, n = 1000)$y)) %>%
  mutate(meters = seq(0, 30, length.out = 1000)) %>%
  pivot_longer(-meters, names_to = 'Species', values_to = 'Count') %>%
  mutate(Count = ifelse(Count < 0, 0, Count)) %>%
  ggplot(aes(x = meters, y = Count, colour = Species)) +
  geom_area(aes(fill = after_scale(alpha(colour, 0.5))), position = 'fill') +
  scale_colour_brewer(palette = 'Set1') +
  scale_y_continuous(labels = scales::percent) +
  theme_minimal(base_size = 16) +
  labs(x = 'Meters from shore', title = 'Species distribution', y = NULL) +
  coord_cartesian(xlim = c(0, 30), expand = FALSE) +
  theme(plot.title.position = 'plot',
        legend.position = 'bottom')
<sup>Created on 2023-07-10 with reprex v2.0.2</sup>
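The smoothing step in the area plot can be looked at on its own. The sketch below uses base R only and two of the species columns from the data frame above. It interpolates each series onto 1000 points with spline, clamps any negative overshoot to zero, and then computes a share of the total by hand, which is roughly what position = 'fill' does for the stacked areas. The variable names are illustrative.

```r
meters  <- seq(0, 30, by = 2)
castle  <- c(0, 0, 0, 0, 0, 3, 0, 39, 25, 39, 50, 19, 36, 25, 31, 0)
toothed <- c(94, 25, 53, 14, 6, rep(0, 11))

# spline() returns a list with n evenly spaced x values and interpolated y values.
sm_castle  <- spline(meters, castle,  n = 1000)
sm_toothed <- spline(meters, toothed, n = 1000)

# A cubic spline can dip below zero between observations, which makes no
# sense for percentage cover, so negative values are clamped to 0.
castle_smooth  <- pmax(sm_castle$y, 0)
toothed_smooth <- pmax(sm_toothed$y, 0)

# Share of castle barnacles among these two species at each point;
# NA where neither species is present.
total <- castle_smooth + toothed_smooth
castle_share <- ifelse(total > 0, castle_smooth / total, NA)

range(sm_castle$x)
summary(castle_share)
```

The clamp matters because position = 'fill' rescales the stacked values at each x position, and a negative value would distort the proportions there.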
Answer 2
Score: 0
I think the issue is the header layout: row 1 contains "meters" and "percentage", and row 2 starts the numeric meters values but holds character values (the species names) under "percentage". My advice would be to remove the "percentage" cell so that all column names sit on the same level. Once that is done, you need to reshape the data into three columns: "meters", "variable" (castle, diamond, ...) and the percentage (called "value" in the code below). Then you should be able to plot it. Here is how I would do it.
library(ggplot2)

df <- data.frame(meters = seq(0, 30, 2),
                 castle = c(rep(0, 5), 3, 0, 39, 25, 39, 50, 19, 36, 25, 31, 0),
                 diamond = c(6, 14, 28, 53, 39, 44, 56, rep(0, 4), 19, 22, 11, 42, 0),
                 tooth = c(94, 25, 53, 14, 6, rep(0, 11)),
                 bladder = c(0, 28, 6, 22, 19, 42, 19, 36, 56, 39, 28, 17, 31, 14, 3, 0))

# Convert the data frame to long format
df_long <- tidyr::gather(df, key = "variable", value = "value", -meters)

# Create the violin plot
ggplot(df_long, aes(x = meters, y = value, fill = variable)) +
  geom_violin(scale = "width", trim = FALSE) +
  labs(x = "Meters", y = "Percentage", fill = "Variable") +
  scale_fill_manual(values = c("castle" = "red", "diamond" = "blue", "tooth" = "green", "bladder" = "purple")) +
  theme_minimal()
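The tidyr documentation marks gather as superseded by pivot_longer. As a hedged sketch on a made-up two-row data frame (the small object is illustrative only), the two calls below produce the same long-format rows, differing only in row order and in pivot_longer returning a tibble.

```r
library(tidyr)

# Made-up miniature of the wide data.
small <- data.frame(meters = c(0, 2), castle = c(0, 3), tooth = c(94, 25))

long_gather <- gather(small, key = "variable", value = "value", -meters)
long_pivot  <- pivot_longer(small, -meters,
                            names_to = "variable", values_to = "value")

# gather() stacks column by column, pivot_longer() goes row by row,
# so sort both the same way before comparing.
a <- long_gather[order(long_gather$variable, long_gather$meters), ]
b <- as.data.frame(long_pivot)
b <- b[order(b$variable, b$meters), ]

all.equal(a, b, check.attributes = FALSE)
```

Either function gives ggplot the three-column layout described above, so the choice mainly affects which tidyr version the code depends on.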
I hope this helps!