Changing boxplot width (measuring multiple categorical variables) for categorical conditions with missing data

Question

As a preliminary disclaimer: I'm still very new to R (this is the first analysis I've performed independently), and I hope this is a reproducible example.

I have a dataset of d.13.C and d.18.O values measured on various enamel samples through time and space. I want to show trends within families across space and time. I generated a boxplot in ggplot2 that does this, but I'm running into a few problems:

```r
library(ggplot2)
library(dplyr)

d %>%
  # fix the order of the x-axis and legend categories
  mutate(
    Member = factor(Member, levels = c("UpperBurgi", "KBS", "Okote")),
    # the data contain "Fluvial " with a trailing space, so the level must match it
    Dep_context = factor(Dep_context, levels = c("Lacustrine", "Deltaic", "Fluvial "))
  ) %>%
  ggplot(aes(x = Member, y = d.13.C, col = Dep_context, fill = Dep_context)) +
  geom_boxplot(alpha = 0.5, lwd = 1) +
  facet_wrap(~Family) +
  scale_fill_brewer(palette = "Dark2") +
  scale_color_brewer(palette = "Dark2") +
  theme_bw()
```

It produces something like this:

[Figure: boxplots of d.13.C by Member, faceted by Family and coloured by Dep_context]

My data are not evenly distributed: not every depositional context is represented in each geologic member of each family. As a result, the boxes end up with different widths. I would like them all to be the same width regardless of whether data are present (e.g., the same width as the boxes for Bovidae in the KBS Member).

I've tried adjusting `width =` in the `geom_boxplot()` call, using `theme()` to change aspects of the grid, and setting `drop = FALSE`, but none of these changed anything. I've also tried faceting by member and depositional environment, but that did not look as appealing and seemed clunkier. Is there a way to accomplish this, or is faceting the way to go?
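For comparison, the faceting route mentioned in the question could look roughly like the sketch below. This is a minimal illustration, not the poster's code. It assumes ggplot2 and dplyr are installed and rebuilds a reduced `d` from the six-row `dput()` subset, keeping only the columns the plot uses. The plot puts `Dep_context` on the x-axis and facets with `facet_grid(Family ~ Member, scales = "free_x", space = "free_x")`. Each box then gets its own x position, so no dodging happens and every box keeps the default width.

```r
library(ggplot2)
library(dplyr)

# Reduced copy of the six-row subset from the question's dput() output
# (only the columns the plot needs; "Fluvial " keeps its trailing space)
d <- data.frame(
  Member = c("KBS", "Okote", "KBS", "KBS", "KBS", "KBS"),
  Dep_context = c("Deltaic", "Fluvial ", "Fluvial ", "Fluvial ", "Deltaic", "Deltaic"),
  Family = rep("Equidae", 6),
  d.13.C = c(-0.3, -0.7, 0.7, -0.9, -0.1, -0.8)
)

p_facet <- d %>%
  mutate(
    Member = factor(Member, levels = c("UpperBurgi", "KBS", "Okote")),
    Dep_context = factor(Dep_context, levels = c("Lacustrine", "Deltaic", "Fluvial "))
  ) %>%
  # one x position per depositional context, so boxes never share a slot
  ggplot(aes(x = Dep_context, y = d.13.C, col = Dep_context, fill = Dep_context)) +
  geom_boxplot(alpha = 0.5) +
  # rows = families, columns = members; each panel only shows the contexts it has
  facet_grid(Family ~ Member, scales = "free_x", space = "free_x") +
  scale_fill_brewer(palette = "Dark2") +
  scale_color_brewer(palette = "Dark2") +
  theme_bw()

# Drawn width of every box (one value per box, across all panels)
facet_box_widths <- with(layer_data(p_facet), xmax - xmin)
print(facet_box_widths)
```

The trade-off is the one the question describes. The layout grows one column per member, which can feel clunkier than dodging within a single panel.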

I've provided my data frame below. *Note: it's only a subset, since the full output was too long.

```r
# output of dput(head(d)), assigned to d so it can be pasted into a fresh session
d <- structure(list(CA = c("6", "1", "104", "105", "6A", "6A"), Member = c("KBS",
"Okote", "KBS", "KBS", "KBS", "KBS"), Dep_context = c("Deltaic",
"Fluvial ", "Fluvial ", "Fluvial ", "Deltaic", "Deltaic"), Family = c("Equidae",
"Equidae", "Equidae", "Equidae", "Equidae", "Equidae"), Tribe = c("",
"", "", "", "", ""), Genus = c("Equus", "Equus", "Equus", "Equus",
"Equus", "Equus"), d.13.C = c(-0.3, -0.7, 0.7, -0.9, -0.1, -0.8
), d.18.O = c(0, 1.6, 4, 2.6, 1.8, 0.2), Age.range = c("1.87-1.56",
"1.56-1.38", "1.87-1.56", "1.87-1.56", "1.87-1.56", "1.87-1.56"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
```
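To see why the widths differ, it can help to tabulate which depositional contexts actually occur at each member. The sketch below is illustrative only. It assumes dplyr is installed and uses a reduced copy of the six-row subset above. The names `combos` and `contexts_per_member` are hypothetical. With `position_dodge2()`'s default (`preserve = "total"`), each x slot is split among however many boxes sit there. Members with fewer contexts therefore get wider boxes.

```r
library(dplyr)

# Reduced copy of the six-row subset from the dput() output
# (only the columns needed here; "Fluvial " keeps its trailing space)
d <- data.frame(
  Member = c("KBS", "Okote", "KBS", "KBS", "KBS", "KBS"),
  Dep_context = c("Deltaic", "Fluvial ", "Fluvial ", "Fluvial ", "Deltaic", "Deltaic"),
  Family = rep("Equidae", 6),
  d.13.C = c(-0.3, -0.7, 0.7, -0.9, -0.1, -0.8)
)

# Number of samples in each Family x Member x Dep_context combination
combos <- d %>% count(Family, Member, Dep_context)
print(combos)

# Number of distinct depositional contexts at each member, per family;
# with the default dodge, this is how many boxes share each x slot
contexts_per_member <- d %>%
  distinct(Family, Member, Dep_context) %>%
  count(Family, Member, name = "n_contexts")
print(contexts_per_member)
```

In this subset, KBS has two contexts (Deltaic and "Fluvial ") while Okote has only one. That is exactly the kind of imbalance that produces unequal widths.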

Answer 1

Score: 0

You could pass `position = position_dodge2(preserve = "single")` to `geom_boxplot()` so that the boxes keep the same width whether or not every depositional context is present at a member, like this:

```r
library(ggplot2)
library(dplyr)

d %>%
  mutate(
    Member = factor(Member, levels = c("UpperBurgi", "KBS", "Okote")),
    Dep_context = factor(Dep_context, levels = c("Lacustrine", "Deltaic", "Fluvial "))
  ) %>%
  ggplot(aes(x = Member, y = d.13.C, col = Dep_context, fill = Dep_context)) +
  # preserve = "single": each box gets the width of one dodge slot,
  # even at members where some depositional contexts are missing
  geom_boxplot(alpha = 0.5, lwd = 1,
               position = position_dodge2(preserve = "single")) +
  facet_wrap(~Family) +
  scale_fill_brewer(palette = "Dark2") +
  scale_color_brewer(palette = "Dark2") +
  theme_bw()
```

[Figure: the same boxplots drawn with position_dodge2(preserve = "single")]

Created on 2023-02-08 with reprex v2.0.2
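One way to check that `preserve = "single"` does what the answer describes is to read back the drawn box extents with `ggplot2::layer_data()`. This is a minimal sketch under a few assumptions. It reuses a reduced copy of the question's six-row subset and drops the styling layers. The helpers `make_plot()` and `box_widths()` are hypothetical names introduced here.

```r
library(ggplot2)
library(dplyr)

# Reduced copy of the six-row subset from the question (plot columns only)
d <- data.frame(
  Member = c("KBS", "Okote", "KBS", "KBS", "KBS", "KBS"),
  Dep_context = c("Deltaic", "Fluvial ", "Fluvial ", "Fluvial ", "Deltaic", "Deltaic"),
  Family = rep("Equidae", 6),
  d.13.C = c(-0.3, -0.7, 0.7, -0.9, -0.1, -0.8)
) %>%
  mutate(
    Member = factor(Member, levels = c("UpperBurgi", "KBS", "Okote")),
    Dep_context = factor(Dep_context, levels = c("Lacustrine", "Deltaic", "Fluvial "))
  )

# Hypothetical helper: the question's boxplot (styling omitted) with a given position adjustment
make_plot <- function(pos) {
  ggplot(d, aes(x = Member, y = d.13.C, col = Dep_context, fill = Dep_context)) +
    geom_boxplot(alpha = 0.5, position = pos) +
    facet_wrap(~Family)
}

# Hypothetical helper: drawn width (xmax - xmin) of every box after dodging
box_widths <- function(p) {
  with(layer_data(p), xmax - xmin)
}

w_total  <- box_widths(make_plot(position_dodge2()))  # default, preserve = "total"
w_single <- box_widths(make_plot(position_dodge2(preserve = "single")))

print(w_total)   # unequal: the lone Okote box fills its whole slot
print(w_single)  # equal: each box is sized as one slot of the busiest position
```

With the default, the single Okote box takes the whole x slot, while the two KBS boxes share theirs. With `preserve = "single"`, the width no longer depends on how many contexts a member has. Running the same check on the full `d` should show whether any panel still differs.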

huangapple
  • Posted on 2023-02-09 00:07:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75388577.html