`R`/`ggplot2`: Strangeness when combining individual `geom_histogram` layers
Question
I'm striving to replicate aspects of this nice demonstration using `R`/`ggplot2`.
For the first point (plotting different bin counts) I have managed nicely, but when replicating that approach for the second point (different maxima), it strangely fails and I can't figure out why.
Here's what I do to demonstrate the lack of robustness pointed out in point 2 of the above post:
```R
library(dplyr)
library(ggplot2)
library(magrittr)
requireNamespace("kmed")

# Define the new maxima to be used
maxima <- c(197, 202, 213, 224)

# Replicate the UCI heart data set
m_df <- bind_rows(
  replicate(length(maxima), kmed::heart, simplify = FALSE)) %>%
  dplyr::mutate(
    sm = rep(maxima, each = nrow(kmed::heart)),
    fsm = paste("Maximum Edited to:", rep(maxima, each = nrow(kmed::heart)))
  )

# Modify the maxima
m_df[which(m_df$thalach == max(m_df$thalach)), "thalach"] <- maxima

# Generate individual geom_histogram layers, forcing an identical number of bins
m_lp_hist <- plyr::llply(maxima, function(b) {
  geom_histogram(
    data = m_df %>% filter(sm == b),
    mapping = aes(x = thalach),
    bins = 20
  )
})

# Combine the layers
m_p_hist <- Reduce("+", m_lp_hist, init = ggplot2::ggplot())
m_p_hist +
  ggplot2::facet_wrap(. ~ fsm, scales = "free_y") +
  ggplot2::theme_bw() +
  ggplot2::labs(x = "Maximum Heart Rate Achieved", y = "Count")
```
When evaluating the resulting plot, the histograms are very similar (!), while stepping through the individual layers using `ggplot() + m_lp_hist[[1]]` etc. nicely shows the difference between the layers and thus the effect I intend to demonstrate (lack of robustness in histograms). What is going on here!?
Thanks for any pointers.
Answer 1
Score: 1
You are very close to (what I think) you want. Try this instead:
```R
m_p_hist +
  ggplot2::facet_wrap(. ~ fsm, scales = "free") + # instead of "free_y"
  ggplot2::theme_bw() +
  ggplot2::labs(x = "Maximum Heart Rate Achieved", y = "Count")
```
[![result][1]][1]
## Explanation
Your original code adds all 4 histogram layers into a single ggplot object before faceting them out into 4 separate plot panels again, **without allowing x-axis scales to vary across facets.** As a result, the x-axis range (limits) is calculated from the combined data of all 4 layers, and the 20 bins for each histogram are calculated from that same range.
By changing `scales = "free_y"` to `scales = "free"`, we allow x-axis scales to vary, and the 20 bins for each histogram are calculated from a different range of values.
By the way, it's not very clear to me why you'd want to create this via `facet_wrap`. I'd have gone with making 4 separate ggplot objects and stitching them together into a single chart afterwards for presentation, which would have avoided the above issue altogether. E.g.:
```R
lapply(m_lp_hist,
       function(p) ggplot() +
         p +
         labs(title = p$data$fsm[1]) +
         theme_bw()) %>%
  cowplot::plot_grid(plotlist = ., nrow = 2, ncol = 2)
```
[1]: https://i.stack.imgur.com/iuWld.png
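The binning behaviour described in the explanation can be checked without looking at the plot. The following is a minimal sketch on synthetic data: the `toy` data frame and the `bin_lower_edges()` helper are illustrative assumptions, not part of the question's code, and the values only stand in for `kmed::heart`. It uses `ggplot2::layer_data()` to extract the bins computed for each panel and compares them between `scales = "free_y"` and `scales = "free"`.

```R
library(ggplot2)

# Synthetic stand-in for the heart-rate column (an assumption, not kmed::heart):
# two copies of the same values, with the maximum edited differently in each.
base <- rep(seq(100, 190, by = 5), times = 3)
toy <- rbind(
  data.frame(x = replace(base, which.max(base), 200), panel = "max = 200"),
  data.frame(x = replace(base, which.max(base), 260), panel = "max = 260")
)

p <- ggplot(toy, aes(x = x)) + geom_histogram(bins = 20)

# Hypothetical helper: lower bin edges per facet panel, taken from the built plot.
bin_lower_edges <- function(plot) {
  built <- layer_data(plot, 1)
  lapply(split(built$xmin, built$PANEL), sort)
}

edges_fixed <- bin_lower_edges(p + facet_wrap(~ panel, scales = "free_y"))
edges_free  <- bin_lower_edges(p + facet_wrap(~ panel, scales = "free"))

# With a shared x scale, both panels should be binned over the same range,
# so their bin edges should coincide; with free x scales they should not.
same_fixed <- isTRUE(all.equal(edges_fixed[[1]], edges_fixed[[2]]))
same_free  <- isTRUE(all.equal(edges_free[[1]], edges_free[[2]]))
print(c(fixed_x = same_fixed, free_x = same_free))
```

If the two flags differ as the explanation predicts, the near-identical histograms in the question come from the shared bin edges rather than from the data itself.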