Changing boxplot width (measuring multiple categorical variables) for categorical conditions with missing data
Question
As a preliminary disclaimer, I am still very new to R (this is the first analysis I've performed independently), and am hoping this is a reproducible example.
I have a dataset measuring the d.13.C and d.18.O values of various enamel samples through time and space. I want to represent trends within Families across space and time. I have a boxplot I generated in ggplot2 that does this, but I'm running into a few problems:
d %>%
mutate(across(Member, factor, levels = c("UpperBurgi", "KBS", "Okote"))) %>%
mutate(across(Dep_context, factor, levels = c("Lacustrine", "Deltaic", "Fluvial "))) %>%
ggplot(aes(x = Member, y = d.13.C)) +
geom_boxplot(aes(x = Member, y = d.13.C, col = Dep_context, fill = Dep_context), alpha = 0.5, lwd = 1) +
facet_wrap(~Family) +
scale_fill_brewer(palette = "Dark2") +
scale_color_brewer(palette = "Dark2") +
theme_bw()
It produces something like this:
Since my data are not evenly distributed (not every depositional context is represented in each geologic member in each family), the boxplots for each depositional environment have different widths. I would like them all to be the same width, regardless of whether data are present (e.g., the same width as the ones for Bovidae in the KBS Member).
I've tried adjusting width = in the geom_boxplot call, using theme() to change aspects of the grid, and setting drop = FALSE, but none of these changed anything. I've also tried faceting by member and depositional environment, but that looked less appealing and seemed clunkier. Is there a way to accomplish this, or is faceting the way to go?
I've provided my data frame below. Note: it is only a subset, since the full output was too long.
dput(head(d))
structure(list(CA = c("6", "1", "104", "105", "6A", "6A"), Member = c("KBS",
"Okote", "KBS", "KBS", "KBS", "KBS"), Dep_context = c("Deltaic",
"Fluvial ", "Fluvial ", "Fluvial ", "Deltaic", "Deltaic"), Family = c("Equidae",
"Equidae", "Equidae", "Equidae", "Equidae", "Equidae"), Tribe = c("",
"", "", "", "", ""), Genus = c("Equus", "Equus", "Equus", "Equus",
"Equus", "Equus"), d.13.C = c(-0.3, -0.7, 0.7, -0.9, -0.1, -0.8
), d.18.O = c(0, 1.6, 4, 2.6, 1.8, 0.2), Age.range = c("1.87-1.56",
"1.56-1.38", "1.87-1.56", "1.87-1.56", "1.87-1.56", "1.87-1.56"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
Answer 1
Score: 0
You could use position_dodge2 with preserve = "single" to keep the boxplot width the same across different groups, like this:
library(ggplot2)
library(dplyr)
d %>%
  # Fix the factor level order ("Fluvial " keeps the trailing space present in the data)
  mutate(
    Member = factor(Member, levels = c("UpperBurgi", "KBS", "Okote")),
    Dep_context = factor(Dep_context, levels = c("Lacustrine", "Deltaic", "Fluvial "))
  ) %>%
  ggplot(aes(x = Member, y = d.13.C)) +
  # preserve = "single" sizes each box as if every group were present at that x position
  geom_boxplot(aes(col = Dep_context, fill = Dep_context), alpha = 0.5, lwd = 1,
               position = position_dodge2(preserve = "single")) +
  facet_wrap(~Family) +
  scale_fill_brewer(palette = "Dark2") +
  scale_color_brewer(palette = "Dark2") +
  theme_bw()
Created on 2023-02-08 with reprex v2.0.2
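If you want to confirm the effect without the full dataset, here is a minimal sketch on made-up data. The `toy` data frame and its values are hypothetical and not taken from the question. One member has two depositional contexts and the other has only one. The sketch uses `layer_data()` to read back the computed box positions and compare the widths.

```r
library(ggplot2)

# Hypothetical toy data: "KBS" has two depositional contexts,
# "Okote" has only one, mimicking the uneven design in the question.
set.seed(1)
toy <- data.frame(
  Member      = rep(c("KBS", "KBS", "Okote"), each = 10),
  Dep_context = rep(c("Deltaic", "Fluvial", "Fluvial"), each = 10),
  d.13.C      = rnorm(30)
)

p <- ggplot(toy, aes(x = Member, y = d.13.C, fill = Dep_context)) +
  geom_boxplot(position = position_dodge2(preserve = "single"))

# layer_data() returns the computed geometry for a layer;
# xmin and xmax give the horizontal extent of each box.
built <- layer_data(p, 1)
box_widths <- built$xmax - built$xmin
print(box_widths)
```

With `preserve = "single"` the printed widths should all match. If you swap in the default `preserve = "total"`, the lone box in the second member should come out wider than the two boxes in the first.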