`R`/`ggplot2`: Strangeness when combining individual `geom_histogram` layers

Question

I'm striving to replicate aspects of this nice demonstration using R/ggplot2.

For the first point (plotting different bin counts) I have managed nicely, but when replicating that approach for the second point (different maxima), it strangely fails and I can't figure out why.

Here is what I do to demonstrate the lack of robustness described in point 2 of the above post:

    library(dplyr)
    library(ggplot2)
    library(magrittr)
    requireNamespace("kmed")
    requireNamespace("plyr")

    # Define the new maxima to be used
    maxima <- c(197, 202, 213, 224)

    # Replicate the UCI heart data set, one copy per maximum
    m_df <- bind_rows(
      replicate(length(maxima), kmed::heart, simplify = FALSE)
    ) %>%
      dplyr::mutate(
        sm = rep(maxima, each = nrow(kmed::heart)),
        fsm = paste("Maximum Edited to:", rep(maxima, each = nrow(kmed::heart)))
      )

    # Modify the maxima (one maximum row per copy)
    m_df[which(m_df$thalach == max(m_df$thalach)), "thalach"] <- maxima

    # Generate individual geom_histogram layers, forcing an identical number of bins
    m_lp_hist <- plyr::llply(maxima, function(b) {
      geom_histogram(
        data = m_df %>% filter(sm == b),
        mapping = aes(x = thalach),
        bins = 20
      )
    })

    # Combine the layers
    m_p_hist <- Reduce("+", m_lp_hist, init = ggplot2::ggplot())

    m_p_hist +
      ggplot2::facet_wrap(. ~ fsm, scales = "free_y") +
      ggplot2::theme_bw() +
      ggplot2::labs(x = "Maximum Heart Rate Achieved", y = "Count")

When evaluating the resulting plot, the histograms are very similar (!), while stepping through the individual layers with `ggplot() + m_lp_hist[[1]]` etc. nicely shows the difference between the layers and thus the effect I intend to demonstrate (lack of robustness in histograms). What is going on here?

Thanks for any pointers.

Answer 1

Score: 1

You are very close to what (I think) you want. Try this instead:

    m_p_hist +
      ggplot2::facet_wrap(. ~ fsm, scales = "free") + # instead of "free_y"
      ggplot2::theme_bw() +
      ggplot2::labs(x = "Maximum Heart Rate Achieved", y = "Count")

![result](https://i.stack.imgur.com/iuWld.png)

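Explanation

Your original code adds all 4 histogram layers to a single ggplot object before faceting them back out into 4 separate panels, without allowing the x-axis scale to vary across facets. As a result, the x-axis range is calculated from the combined data of all 4 layers, and the 20 bins of each histogram are calculated from that same range.

By changing `scales = "free_y"` to `scales = "free"`, we allow the x-axis scale to vary, so the 20 bins of each histogram are calculated from a different range of values.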
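To see this effect outside the plot itself, the bin edges that ggplot2 computes for each panel can be inspected with `layer_data()`. The following is a minimal sketch under stated assumptions: it uses synthetic data in place of `kmed::heart`, two hypothetical edited maxima (230 and 300), a single histogram layer instead of four, and a made-up helper `bin_range()`.

```r
library(ggplot2)

# Synthetic stand-in for the heart-rate data (an assumption, not kmed::heart):
# two copies of the same values, each with its largest value edited differently.
set.seed(1)
base <- round(rnorm(200, mean = 150, sd = 20))
make_copy <- function(new_max) {
  x <- base
  x[which.max(x)] <- new_max
  data.frame(thalach = x, fsm = paste("Maximum Edited to:", new_max))
}
df <- rbind(make_copy(230), make_copy(300))

p <- ggplot(df, aes(x = thalach)) + geom_histogram(bins = 20)

# Hypothetical helper: lowest and highest bin edge in each panel
# (one column per panel, row 1 = lowest edge, row 2 = highest edge)
bin_range <- function(plot) {
  d <- layer_data(plot, 1)
  sapply(split(d, d$PANEL), function(panel) c(min(panel$xmin), max(panel$xmax)))
}

shared <- bin_range(p + facet_wrap(. ~ fsm, scales = "free_y"))
free   <- bin_range(p + facet_wrap(. ~ fsm, scales = "free"))
```

With `scales = "free_y"`, both columns of `shared` should hold the same pair of edges, because the bins come from the common x range. With `scales = "free"`, the columns of `free` should differ, because each panel gets its own range.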

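By the way, it is not very clear to me why you would want to create this via `facet_wrap`. I would have made 4 separate ggplot objects and stitched them together into a single chart afterwards for presentation, which would have avoided the above issue altogether. E.g.:

    lapply(m_lp_hist,
           function(p) ggplot() +
             p +
             labs(title = p$data$fsm[[1]]) +
             theme_bw()) %>%
      cowplot::plot_grid(plotlist = ., nrow = 2, ncol = 2)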
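The separate-plots approach can also be tried without the `kmed` data. This is a rough, self-contained sketch, again assuming synthetic data and two hypothetical maxima. Because each plot is built on its own, each one trains its own x scale and so gets its own bins. The `cowplot` step only runs if that package is installed.

```r
library(ggplot2)

# Synthetic stand-in data (an assumption; the question uses kmed::heart)
set.seed(1)
base <- round(rnorm(200, mean = 150, sd = 20))
maxima <- c(230, 300)  # hypothetical edited maxima

# One complete plot per edited maximum
plots <- lapply(maxima, function(m) {
  x <- base
  x[which.max(x)] <- m
  ggplot(data.frame(thalach = x), aes(x = thalach)) +
    geom_histogram(bins = 20) +
    labs(title = paste("Maximum Edited to:", m),
         x = "Maximum Heart Rate Achieved", y = "Count") +
    theme_bw()
})

# Upper edge of the last bin in each plot
upper_edges <- sapply(plots, function(p) max(layer_data(p, 1)$xmax))

# Stitch the plots together if cowplot is available
if (requireNamespace("cowplot", quietly = TRUE)) {
  combined <- cowplot::plot_grid(plotlist = plots, nrow = 1, ncol = 2)
}
```

Printing `combined` draws the two histograms side by side. The values in `upper_edges` should differ between the plots, which is the sensitivity to the maximum that the question wants to show.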
Posted by huangapple on May 14, 2023 at 19:09:50. Link to this post: https://go.coder-hub.com/76247153.html