geom_boxplot whisker length changing when changing y-axis scale
Question
The whiskers drawn by geom_boxplot() shrink when I change the y-axis scale and remove outliers. How can I make sure the whiskers do not shrink?
In the example below, the first plot shows all the data, including outliers. For cyl = 4, the upper whisker extends to mpg = 45. Similarly, the lower whisker for cyl = 8 extends to mpg = 11.
In the second plot, the outliers are hidden using outlier.shape = NA and the y-axis range is reduced. However, for cyl = 4 the upper whisker now ends at mpg = 34, and for cyl = 8 the lower whisker now ends at mpg = 13. I would expect the upper whisker for cyl = 4 to still be at mpg = 45 and the lower whisker for cyl = 8 to still be at mpg = 11.
library(ggplot2)
library(dplyr)
# Load the mtcars dataset and keep only the two columns used in the plots
data(mtcars)
data <- mtcars %>%
  select(mpg, cyl)
# Add outliers to specific groups
outliers <- data.frame(
  mpg = c(45, 50, 55, 45, 50, 55),
  cyl = c(4, 4, 4, 8, 8, 8)
)
# Create example dataset
data <- rbind(data, outliers)
# Plot 1: all data
# Upper whisker for cyl = 4 ends at 45 mpg
# Lower whisker for cyl = 8 ends at 11 mpg
data %>%
  ggplot(aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()
# Plot 2: hide outliers and reduce the y-axis range
# Upper whisker for cyl = 4 now ends at 34 mpg
# Lower whisker for cyl = 8 now ends at 13 mpg
data %>%
  ggplot(aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(outlier.shape = NA) +
  scale_y_continuous(breaks = seq(0, 45, 5), limits = c(0, 45))
#> Warning: Removed 4 rows containing non-finite values (`stat_boxplot()`).
Created on 2023-06-21 with reprex v2.0.2
Answer 1
Score: 2
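Setting limits = c(0, 45) on the scale removes any data outside that range before the boxplot statistics are computed (hence the warning about 4 removed rows). The quartiles and whiskers are then recalculated from the remaining values, which is why the whiskers come out shorter.
You can confirm this by setting breaks = seq(0,55,5), limits = c(0,55) instead: no rows are removed, and the upper whisker for cyl = 4 still ends at 45.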
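To make the effect concrete, here is a rough base-R sketch of the usual boxplot rule: quartiles from quantile(), and whiskers reaching the most extreme points within 1.5 * IQR of the box. As far as I understand, this mirrors the default behaviour of stat_boxplot(), but treat it as an illustration rather than ggplot2's actual code. The helper name whisker_ends() is made up for this example.

```r
# Hypothetical helper: whisker end points under the 1.5 * IQR rule
whisker_ends <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  # Keep only the points that are not flagged as outliers
  inside <- x[x >= q[1] - coef * iqr & x <= q[2] + coef * iqr]
  range(inside)
}

# Rebuild the two groups from the question (mtcars plus the added outliers)
cyl4 <- c(mtcars$mpg[mtcars$cyl == 4], 45, 50, 55)
cyl8 <- c(mtcars$mpg[mtcars$cyl == 8], 45, 50, 55)

# Mimic limits = c(0, 45): values outside the range are dropped first
cyl4_clipped <- cyl4[cyl4 >= 0 & cyl4 <= 45]
cyl8_clipped <- cyl8[cyl8 >= 0 & cyl8 <= 45]

whisker_ends(cyl4)          # upper end is 45
whisker_ends(cyl4_clipped)  # upper end drops to 33.9
whisker_ends(cyl8)          # lower end is 10.4
whisker_ends(cyl8_clipped)  # lower end rises to 13.3
```

Dropping 50 and 55 shifts the quartiles and narrows the IQR, so 45 (for cyl = 4) and 10.4 (for cyl = 8) fall outside the new fences and become outliers, which outlier.shape = NA then hides.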
If you want to keep the visible range between 0 and 45, use coord_cartesian(ylim = c(0, 45)), which zooms the plot without discarding any data:
data %>%
  ggplot(aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(outlier.shape = NA) +
  coord_cartesian(ylim = c(0, 45))
See https://stackoverflow.com/questions/25685185/limit-ggplot2-axes-without-removing-data-outside-limits-zoom for a complete explanation.
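If you want to check the difference numerically rather than by eye, one option is to pull the computed boxplot statistics out of each plot with ggplot2's layer_data(). This is only a sketch: the object names (df, base, zoomed, clipped) are made up here, and it assumes the same example data as in the question.

```r
library(ggplot2)

# Same example data as in the question
df <- rbind(
  mtcars[, c("mpg", "cyl")],
  data.frame(mpg = c(45, 50, 55, 45, 50, 55), cyl = c(4, 4, 4, 8, 8, 8))
)

base <- ggplot(df, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(outlier.shape = NA)

# Zoom only: statistics are computed on all rows
zoomed <- layer_data(base + coord_cartesian(ylim = c(0, 45)))

# Scale limits: out-of-range rows are dropped before the statistics
# (suppressWarnings() hides the "Removed 4 rows" warning)
clipped <- suppressWarnings(
  layer_data(base + scale_y_continuous(limits = c(0, 45)))
)

# ymin / ymax are the whisker ends; x = 1, 2, 3 correspond to cyl = 4, 6, 8
zoomed[, c("x", "ymin", "ymax")]
clipped[, c("x", "ymin", "ymax")]
```

With coord_cartesian() the whisker ends should match the first plot in the question, while with scale limits they match the shortened ones.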