R ggplot2 aesthetics: Color, point shape, and point filled/unfilled based on 3 separate variables
Question
I'm trying to plot a graph of tree seed width over the past few decades from data that was collected from 6 different sites. I want the graph to have several features depending on different conditions in my dataset:
- Color represents the site at which the data was collected.
- Point shape represents the type of data (in my case, data collected either from a sitewide survey or from individual trees).
- Point filled/unfilled represents whether over or under 20 samples were weighed from a particular site within a given year.
I've managed to get the first two down fine, producing the following graph:
I'm now struggling to figure out the last part. I think it may be just to add a column to my dataset with the numbers that correspond to the shape I want to be plotted depending on two variables (data type and sample count) and then use that column in the geom_point shape argument, but I am unsure how to go about that.
I've previously done a graph where I didn't need to specify the data type so I used an ifelse statement to create a new column with the numbers corresponding to the shape I want depending on whether there are over/under 20 seeds, but I don't know how to do this with the 4 possible combinations I could have in this situation. It looked something like this:
EDIT: I've actually just realized the below code does not work...
# if length >20 make it a circle (1 in ggplot2 shape arg), if <20 make it a cross (4 in ggplot 2 shape arg)
phys_MLW$pt_shape <- ifelse(phys_MLW$no_seeds > 20, '1', '4')
I then used this new column in my geom_point shape aesthetic which worked. So I guess something along these lines would work for my current problem?
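The idea described above (one column holding the shape code for each of the four combinations) can be sketched roughly as follows. This is only a sketch under some assumptions: it uses a six-row subset of the example data from this question, the particular shape codes (16/1 for filled/open circles, 17/2 for filled/open triangles) and the legend labels are illustrative choices, and the codes are stored as numbers rather than as strings so that `scale_shape_identity()` can use them directly.

```r
library(ggplot2)

# Six rows taken from the example data in this question
phys_MLW <- data.frame(
  site = c("NETTLEBED", "NETTLEBED", "FISH HILL", "FISH HILL", "RIPON", "RIPON"),
  year = c(2007, 2009, 2007, 2009, 2007, 2008),
  avg_width = c(6.7925, 6.825555556, 7.241975309, 6.7, 6.635416667, 6.35037037),
  no_seeds = c(36, 30, 32, 8, 16, 10),
  data_collection = c("indiv_phys", "site_phys", "indiv_phys",
                      "site_phys", "site_phys", "indiv_phys")
)

# Numeric shape codes (assumed choice): circles for individual trees,
# triangles for sitewide data; filled when more than 20 seeds, open otherwise
phys_MLW$pt_shape <- with(phys_MLW, ifelse(
  data_collection == "indiv_phys",
  ifelse(no_seeds > 20, 16, 1),
  ifelse(no_seeds > 20, 17, 2)
))

p_shape <- ggplot(phys_MLW, aes(x = year, y = avg_width, color = site)) +
  geom_line() +
  geom_point(aes(shape = pt_shape), size = 2.5) +
  # Use the codes as-is; an identity scale only draws a legend when asked to
  scale_shape_identity(
    name = "Source of samples / sample count",
    breaks = c(16, 1, 17, 2),
    labels = c("Individual trees, > 20", "Individual trees, <= 20",
               "Sitewide, > 20", "Sitewide, <= 20"),
    guide = "legend"
  ) +
  labs(x = "Year", y = "Average Width", col = "Site")
```

Printing `p_shape` draws the plot. Because the fill state is baked into the shape code, this needs no separate fill aesthetic, at the cost of a single combined legend.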
Here's my code at the minute, which produces the graph in the image above:
# yearly variation in width
ggplot(phys_MLW, aes(x = year, y = avg_width, color = site)) +
  geom_point(aes(shape = data_collection), size = 2.5) +
  geom_line() +
  scale_x_continuous(breaks = round(seq(min(phys_MLW$year),
                                        max(phys_MLW$year), by = 2))) +
  labs(x = "Year", y = "Average Width", col = "Site") +
  scale_shape(name = "Source of samples", labels = c("Individual trees", "Sitewide"))
EDIT: Here is a (hopefully) copy/pastable example of the data I'm using:
site           year  avg_width    no_seeds  data_collection
NETTLEBED      2007  6.7925       36        indiv_phys
NETTLEBED      2009  6.825555556  30        site_phys
BENWELL        2007  8.14         30        site_phys
BENWELL        2019  8.039333333  50        indiv_phys
FISH HILL      2007  7.241975309  32        indiv_phys
FISH HILL      2009  6.7          8         site_phys
SPENNYMOOR     2007  7.260606061  11        site_phys
SPENNYMOOR     2019  7.057037037  38        indiv_phys
PATCHAM PLACE  2007  6.920952381  29        indiv_phys
PATCHAM PLACE  2009  6.99         30        site_phys
RIPON          2007  6.635416667  16        site_phys
RIPON          2008  6.35037037   10        indiv_phys
Any help would be greatly appreciated!
Answer 1
Score: 1
How about using point size to represent the number of seeds?
library(ggplot2)

# Simulated data: 3 sites x 4 years, random widths, seed counts, and data sources
dat <- data.frame(site = rep(c("A", "B", "C"), each = 4),
                  year = rep(2011:2014, times = 3),
                  avg_width = runif(n = 12, min = 6, max = 8),
                  no_seeds = sample(3:30, size = 12, replace = TRUE),
                  data_collection = sample(c("indiv", "site"), size = 12, replace = TRUE))

# Point size encodes the seed count; point shape encodes the data source
ggplot(dat, aes(x = year, y = avg_width, color = site, size = no_seeds)) +
  geom_point(aes(shape = data_collection)) +
  geom_line(linewidth = 1) +
  scale_size(range = c(1, 10), breaks = c(5, 10, 15, 20, 25))
If the precise threshold of 20 seeds matters, though, this may not be the best way to present it, because size differences are hard to judge on shapes other than circles.
More bubble plot options here: https://r-graph-gallery.com/320-the-basis-of-bubble-plot.html
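If the 20-seed threshold is what matters, one variation on the size idea is to map a two-level flag to two fixed sizes instead of mapping the raw count. The sketch below is only an illustration: the data are made up, the `seed_count` column and the sizes 2 and 5 are assumed choices, and it relies on `scale_size_manual()` accepting a named vector of values.

```r
library(ggplot2)

# Made-up illustrative data, not the asker's measurements
dat <- data.frame(
  site = rep(c("A", "B"), each = 3),
  year = rep(2011:2013, times = 2),
  avg_width = c(6.4, 6.9, 7.1, 7.6, 7.2, 6.8),
  no_seeds = c(8, 25, 30, 12, 21, 19),
  data_collection = rep(c("indiv", "site"), times = 3)
)

# Collapse the count to the two categories of interest (hypothetical column name)
dat$seed_count <- ifelse(dat$no_seeds > 20, "Over 20", "20 or fewer")

p_size <- ggplot(dat, aes(x = year, y = avg_width, color = site)) +
  geom_line() +
  geom_point(aes(shape = data_collection, size = seed_count)) +
  # Two fixed point sizes, one per category
  scale_size_manual(name = "Seeds weighed",
                    values = c("20 or fewer" = 2, "Over 20" = 5))
```

With only two sizes, the legend reads as a threshold rather than as a continuous scale, which may be easier to compare across different shapes.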
Answer 2
Score: 0
This is possible, but it goes against the usual ggplot style because not all shapes are fillable. If you still want to do it, you first have to specify the shapes manually to ensure that they are fillable. I hope the example below clarifies what I mean. Be aware, though, that the legend won't display correctly; I can't find a way to fix that.
Another option is to use some other aesthetic to signify that variable, perhaps an asterisk, as you suggested.
library(tidyverse)

mtcars |>
  ggplot(aes(x = mpg, y = hp,
             color = factor(cyl),
             # two point shapes: disp above or below 300
             shape = disp > 300,
             # asterisk label marks cars with wt > 2.6
             label = ifelse(wt > 2.6, "*", ""))) +
  geom_point(size = 2) +
  geom_line() +
  geom_text(nudge_y = +20) +
  labs(shape = "Disp > 300",
       caption = "* indicates wt > 2.6",
       color = "Cyl") +
  # 21 (circle) and 24 (triangle) are fillable shapes
  scale_shape_manual(values = c(21, 24)) +
  # no fill aesthetic is mapped above, so this scale has no visible effect
  scale_fill_manual(values = c("black", "pink")) +
  theme_minimal(base_size = 16)
Created on 2023-04-21 with reprex v2.0.2
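For the filled/unfilled part specifically, one possible approach is to map a logical flag to `fill` on the fillable shapes 21 and 24, and to tell the legends which shape to draw via `override.aes`. This is a rough sketch rather than a definitive fix for the legend problem mentioned above: the `cars`, `big_disp`, and `heavy` names are made up for illustration, and the white/grey fill colours are an assumed choice.

```r
library(ggplot2)

# Flag each car once so the mappings stay readable (illustrative column names)
cars <- mtcars
cars$big_disp <- cars$disp > 300
cars$heavy <- cars$wt > 2.6

p_fill <- ggplot(cars, aes(x = mpg, y = hp, color = factor(cyl))) +
  geom_point(aes(shape = big_disp, fill = heavy), size = 3, stroke = 1) +
  # Shapes 21 and 24 have both an outline (colour) and an interior (fill)
  scale_shape_manual(values = c(`FALSE` = 21, `TRUE` = 24)) +
  scale_fill_manual(values = c(`FALSE` = "white", `TRUE` = "grey30")) +
  guides(
    # Draw the fill and colour keys with a fillable shape so they match the points
    fill = guide_legend(override.aes = list(shape = 21)),
    color = guide_legend(override.aes = list(shape = 21, fill = "white"))
  ) +
  labs(shape = "Disp > 300", fill = "wt > 2.6", color = "Cyl")
```

In this sketch the outline colour still encodes the group, while the interior is either white ("unfilled") or dark grey ("filled"), so the three variables each get their own legend.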