geom_boxplot whisker length changing when changing y-axis scale


Question

The whiskers of geom_boxplot() shrink when I change the y-axis scale and remove the outliers. How can I make sure the whiskers do not shrink?

In the example below, the first plot shows all the data, including outliers. For cyl = 4, the upper whisker extends to mpg = 45. Similarly, the lower whisker for cyl = 8 extends to mpg = 11.

In the second plot, the outliers are hidden with outlier.shape = NA and the y-axis range is reduced. However, the upper whisker for cyl = 4 now ends at mpg = 34 and the lower whisker for cyl = 8 now ends at mpg = 13. I would expect the upper whisker for cyl = 4 to stay at mpg = 45 and the lower whisker for cyl = 8 to stay at mpg = 11.

library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.2.3
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.2.3
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Load the mtcars dataset
data(mtcars)

data <- mtcars %>%
  select(mpg, cyl)

# Add outliers to specific groups
outliers <- data.frame(
  mpg = c(45, 50, 55, 45, 50, 55),
  cyl = c(4, 4, 4, 8, 8, 8))

# Create example dataset
data <- rbind(data, outliers)

# Upper whisker for cyl = 4 ends at 45 mpg
# Lower whisker for cyl = 8 ends at 11 mpg
data %>%
  ggplot(aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

[Plot 1: boxplots of mpg by factor(cyl), all data with outliers shown]


# Hide outliers and reduce the y-axis range
# Upper whisker for cyl = 4 now ends at 34 mpg
# Lower whisker for cyl = 8 now ends at 13 mpg
data %>%
  ggplot(aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(outlier.shape = NA) +
  scale_y_continuous(breaks = seq(0, 45, 5), limits = c(0, 45))
#> Warning: Removed 4 rows containing non-finite values (`stat_boxplot()`).

[Plot 2: the same boxplots with outliers hidden and the y axis limited to 0 to 45]

Created on 2023-06-21 with reprex v2.0.2

Answer 1

Score: 2

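By setting your scale limits to c(0, 45) you are actually removing any data outside this range (here the values 50 and 55) before stat_boxplot() computes the box statistics. The quartiles and the IQR are then calculated from fewer points, and because the whiskers only reach the most extreme points within 1.5 * IQR of the box, they come out shorter.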
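To see why the whiskers move, here is a minimal base-R sketch of the whisker rule, modelled on (and assumed to match) the default behaviour of stat_boxplot(): hinges from type-7 quantile() and coef = 1.5. The helper whisker_ends() and the variable names are hypothetical, introduced only for illustration. It recomputes the whiskers for the cyl = 4 and cyl = 8 groups once with all data and once with the values above 45 dropped, which is what limits = c(0, 45) does.

```r
# Hypothetical helper modelled on the whisker rule of ggplot2's stat_boxplot()
# (assumed defaults: coef = 1.5, type-7 quantiles). Not part of ggplot2.
whisker_ends <- function(y, coef = 1.5) {
  q <- as.numeric(quantile(y, c(0.25, 0.5, 0.75)))  # lower hinge, median, upper hinge
  iqr <- q[3] - q[1]
  outlier <- y < q[1] - coef * iqr | y > q[3] + coef * iqr
  # Whiskers end at the most extreme non-outlying points (never inside the box)
  range(c(q, y[!outlier]))
}

# The cyl = 4 and cyl = 8 groups from the question, including the added outliers
cyl4 <- c(mtcars$mpg[mtcars$cyl == 4], 45, 50, 55)
cyl8 <- c(mtcars$mpg[mtcars$cyl == 8], 45, 50, 55)

# limits = c(0, 45) keeps only values <= 45, so 50 and 55 are dropped first
full4 <- whisker_ends(cyl4)
cut4  <- whisker_ends(cyl4[cyl4 <= 45])
full8 <- whisker_ends(cyl8)
cut8  <- whisker_ends(cyl8[cyl8 <= 45])

full4[2]  # upper whisker for cyl = 4, all data
cut4[2]   # upper whisker for cyl = 4, after the limits
full8[1]  # lower whisker for cyl = 8, all data
cut8[1]   # lower whisker for cyl = 8, after the limits
```

With all data the upper whisker for cyl = 4 is at 45 and the lower whisker for cyl = 8 at 10.4 (the value read as 11 in the question). After 50 and 55 are dropped, the IQR shrinks enough that 45 and 10.4 become outliers, which outlier.shape = NA then hides, leaving whiskers at 33.9 and 13.3, matching the roughly 34 and 13 seen in the second plot.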

You can confirm this by setting breaks = seq(0, 55, 5), limits = c(0, 55): no data are removed, so the upper whisker for cyl = 4 ends at 45 again.
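A sketch of that check, assuming ggplot2's layer_data() is used to read back the statistics that stat_boxplot() computed (its ymin and ymax columns are the whisker ends). The data frame is rebuilt with base R rather than dplyr so the block stands on its own; the variable names are illustrative.

```r
library(ggplot2)

# Same data as in the question, rebuilt with base R instead of dplyr
data <- rbind(
  mtcars[, c("mpg", "cyl")],
  data.frame(mpg = c(45, 50, 55, 45, 50, 55), cyl = c(4, 4, 4, 8, 8, 8))
)

# Wider scale limits: no observation falls outside c(0, 55), so none is dropped
p55 <- ggplot(data, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(outlier.shape = NA) +
  scale_y_continuous(breaks = seq(0, 55, 5), limits = c(0, 55))

# layer_data() returns the statistics computed for each box;
# group 1 is cyl = 4, group 2 is cyl = 6, group 3 is cyl = 8
stats55 <- layer_data(p55)
stats55[, c("group", "ymin", "ymax")]
```

Here ymax for group 1 should be 45 and ymin for group 3 should be 10.4, the same whiskers as in the first, unlimited plot.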

If you'd like to show only the range from 0 to 45, use coord_cartesian(ylim = c(0, 45)) instead. It zooms into the plot without discarding any data, so the whiskers are computed from all observations:

data %>%
  ggplot(aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(outlier.shape = NA) +
  coord_cartesian(ylim = c(0, 45))
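To compare the two approaches directly, the sketch below (again assuming layer_data(), with illustrative variable names) computes the boxplot statistics once with scale limits and once with coord_cartesian(), and prints the upper whisker for cyl = 4 next to the lower whisker for cyl = 8.

```r
library(ggplot2)

# Same data as in the question, rebuilt with base R instead of dplyr
data <- rbind(
  mtcars[, c("mpg", "cyl")],
  data.frame(mpg = c(45, 50, 55, 45, 50, 55), cyl = c(4, 4, 4, 8, 8, 8))
)

p <- ggplot(data, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(outlier.shape = NA)

# Scale limits drop rows before the statistics are computed (ggplot2 warns
# about the removed rows); coord_cartesian() only zooms the finished plot
clipped <- layer_data(p + scale_y_continuous(limits = c(0, 45)))
zoomed  <- layer_data(p + coord_cartesian(ylim = c(0, 45)))

# Column 1: upper whisker for cyl = 4 (group 1)
# Column 2: lower whisker for cyl = 8 (group 3)
rbind(
  scale_limits = c(clipped$ymax[clipped$group == 1], clipped$ymin[clipped$group == 3]),
  coord_zoom   = c(zoomed$ymax[zoomed$group == 1], zoomed$ymin[zoomed$group == 3])
)
```

With the scale limits the whiskers should come out at 33.9 and 13.3; with coord_cartesian() they stay at 45 and 10.4, because all observations still feed into the statistics.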

Please see https://stackoverflow.com/questions/25685185/limit-ggplot2-axes-without-removing-data-outside-limits-zoom for a complete explanation.

huangapple
  • This article was published on 21 June 2023 at 23:03:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76524718.html