`R`/`ggplot2`: Strangeness when combining individual `geom_histogram` layers


Question

I'm striving to replicate aspects of this nice demonstration using R/ggplot2.

For the first point (plotting different bin counts) this worked nicely, but when I apply the same approach to the second point (different maxima), it fails strangely and I can't figure out why.

Here's what I do to demonstrate the lack of robustness pointed out in point 2 of the post above:

```R
library(dplyr)
library(ggplot2)
library(magrittr)
requireNamespace("kmed")

# Define the new maxima to be used
maxima <- c(197, 202, 213, 224)

# Replicate the UCI heart data set
m_df <- bind_rows(
  replicate(length(maxima), kmed::heart, simplify = FALSE)) %>%
  dplyr::mutate(
    sm  = rep(maxima, each = nrow(kmed::heart)),
    fsm = paste("Maximum Edited to:", rep(maxima, each = nrow(kmed::heart)))
  )

# Modify the maxima
m_df[which(m_df$thalach == max(m_df$thalach)), "thalach"] <- maxima

# Generate individual geom_histogram layers, forcing an identical number of bins
m_lp_hist <- plyr::llply(maxima, function(b) {
  geom_histogram(
    data    = m_df %>% filter(sm == b),
    mapping = aes(x = thalach),
    bins    = 20
  )
})

# Combine the layers
m_p_hist <- Reduce("+", m_lp_hist, init = ggplot2::ggplot())
m_p_hist +
  ggplot2::facet_wrap(. ~ fsm, scales = "free_y") +
  ggplot2::theme_bw() +
  ggplot2::labs(x = "Maximum Heart Rate Achieved", y = "Count")
```

When I render the resulting plot, the four histograms look almost identical (!), whereas stepping through the individual layers with `ggplot() + m_lp_hist[[1]]` etc. clearly shows the differences between them, and thus the effect I intend to demonstrate (the lack of robustness of histograms). What is going on here?

Thanks for any pointers.


Answer 1

Score: 1


You are very close to what (I think) you want. Try this instead:

```R
m_p_hist +
  ggplot2::facet_wrap(. ~ fsm, scales = "free") + # instead of free_y
  ggplot2::theme_bw() +
  ggplot2::labs(x = "Maximum Heart Rate Achieved", y = "Count")
```

![result](https://i.stack.imgur.com/iuWld.png)

Explanation

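Your original code adds all 4 histogram layers to a single ggplot object before faceting them out into 4 separate plot panels again, **without allowing the x-axis scales to vary across facets.** As a result, the x-axis range/limits are calculated from the combined data of all 4 layers (so every panel's range extends up to 224, the largest edited maximum), and the 20 bins for each histogram are calculated from that same range. The bin edges are therefore identical in every panel, and only the single edited value moves between bins, which is why the histograms look so similar.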

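By changing `scales = "free_y"` to `scales = "free"`, we allow the x-axis scales to vary as well, so the 20 bins for each histogram are calculated from that panel's own range of values.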
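To see this effect without the `kmed` data, here is a minimal sketch using hypothetical toy data (`df`, the groups `"max 200"`/`"max 260"` and the helper `bin_width()` are illustrative assumptions, not from the question). It builds the same one-layer-per-group plot and uses `ggplot2::layer_data()` to read back the bins ggplot2 actually computed: with `scales = "free_y"` both layers should get the same bin width, while with `scales = "free"` each layer's bin width follows its own range.

```R
library(ggplot2)

# Hypothetical toy data: two groups that differ only in their maximum value
base <- seq(100, 190, by = 1)
df <- rbind(
  data.frame(x = c(base, 200), g = "max 200"),
  data.frame(x = c(base, 260), g = "max 260")
)

# One geom_histogram layer per group, each with 20 bins (as in the question)
layers <- lapply(split(df, df$g), function(d)
  geom_histogram(data = d, mapping = aes(x = x), bins = 20))
p <- Reduce(`+`, layers, ggplot())

# Hypothetical helper: average bin width of layer i after the stat is computed
bin_width <- function(plot, i) {
  ld <- layer_data(plot, i)
  mean(ld$xmax - ld$xmin)
}

shared_x <- p + facet_wrap(~ g, scales = "free_y")  # x range pooled over both layers
free_x   <- p + facet_wrap(~ g, scales = "free")    # x range computed per panel

w_shared <- sapply(seq_along(layers), function(i) bin_width(shared_x, i))
w_free   <- sapply(seq_along(layers), function(i) bin_width(free_x, i))

w_shared  # same width for both layers
w_free    # narrower bins for "max 200" than for "max 260"
```

This only inspects the computed stat; it assumes, as in the question, that each layer's data falls entirely within one facet panel.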

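By the way, it's not very clear to me why you'd want to create this via `facet_wrap`. I'd have gone with making 4 separate ggplot objects and stitching them together into a single chart afterwards for presentation purposes, which would have avoided the above issue altogether. E.g.: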

```R
lapply(m_lp_hist, 
       function(p) ggplot() + 
         p + 
         labs(title = p$data$fsm[[1]]) +
         theme_bw()) %>%
  cowplot::plot_grid(plotlist = ., nrow = 2, ncol = 2)
```
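A further option, not part of the answer above: if you would rather keep a shared x axis (so the panels stay directly comparable) but still give each layer its own 20 bins, you can pass explicit `breaks` to each `geom_histogram()` layer, computed from that layer's own data range; explicit `breaks` override `bins`. This is a sketch on the same hypothetical toy data; `n_bins`, `brks` and the equal-width break construction are illustrative assumptions.

```R
library(ggplot2)

# Hypothetical toy data: two groups that differ only in their maximum value
base <- seq(100, 190, by = 1)
df <- rbind(
  data.frame(x = c(base, 200), g = "max 200"),
  data.frame(x = c(base, 260), g = "max 260")
)

n_bins <- 20

# Each layer gets equal-width breaks spanning its own data range,
# so the shared x scale no longer determines the bin edges
layers <- lapply(split(df, df$g), function(d) {
  brks <- seq(min(d$x), max(d$x), length.out = n_bins + 1)
  geom_histogram(data = d, mapping = aes(x = x), breaks = brks)
})

p <- Reduce(`+`, layers, ggplot()) +
  facet_wrap(~ g, scales = "free_y") +  # x axis stays shared across panels
  theme_bw()

# Average bin width per layer, read back from the built plot
w_per_layer <- sapply(seq_along(layers), function(i) {
  ld <- layer_data(p, i)
  mean(ld$xmax - ld$xmin)
})
w_per_layer
```

The trade-off is that the bins are no longer aligned across panels, which is exactly the sensitivity the demonstration is meant to expose.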





huangapple
  • This article was published on 14 May 2023, 19:09:50
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76247153.html