May 14, 2023 19:09:50

`R`/`ggplot2`: Strangeness when combining individual `geom_histogram` layers

Question

I'm striving to replicate aspects of this nice demonstration using R/ggplot2.

For the first point (plotting different bin counts) I have managed nicely, but when replicating that approach for the second point (different maxima), it strangely fails and I can't figure out why.

Here's what I do to demonstrate the lack of robustness pointed out in point 2 of the post above:

```R
library(dplyr)
library(ggplot2)
library(magrittr)
requireNamespace("kmed")
# Define the new maxima to be used
maxima <- c(197, 202, 213, 224)
# Replicate the UCI heart data set, once per new maximum
m_df <- bind_rows(
  replicate(length(maxima), kmed::heart, simplify = FALSE)) %>%
  dplyr::mutate(
    sm  = rep(maxima, each = nrow(kmed::heart)),
    fsm = paste("Maximum Edited to:", rep(maxima, each = nrow(kmed::heart)))
  )
# Modify the maxima (assumes the original maximum occurs exactly once per
# copy, so one row per copy is replaced, in copy order)
m_df[which(m_df$thalach == max(m_df$thalach)), "thalach"] <- maxima
# Generate individual geom_histogram layers, forcing an identical number of bins
m_lp_hist <- plyr::llply(maxima, function(b) {
  geom_histogram(
    data    = m_df %>% filter(sm == b),
    mapping = aes(x = thalach),
    bins    = 20
  )
})
# Combine the layers
m_p_hist <- Reduce("+", m_lp_hist, init = ggplot2::ggplot())
m_p_hist +
  ggplot2::facet_wrap(. ~ fsm, scales = "free_y") +
  ggplot2::theme_bw() +
  ggplot2::labs(x = "Maximum Heart Rate Achieved", y = "Count")
```

When evaluating the resulting plot, the histograms are very similar (!), while stepping through the individual layers with `ggplot() + m_lp_hist[[1]]` etc. nicely shows the differences between the layers, and thus the effect I intend to demonstrate (the lack of robustness of histograms). What is going on here!?

Thanks for any pointers.

Answer 1

Score: 1

You are very close to (what I think) you want. Try this instead:

```R
m_p_hist +
  ggplot2::facet_wrap(. ~ fsm, scales = "free") + # instead of free_y
  ggplot2::theme_bw() +
  ggplot2::labs(x = "Maximum Heart Rate Achieved", y = "Count")
```

![result](https://i.stack.imgur.com/iuWld.png)

Explanation

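Your original code adds all 4 histogram layers to a single ggplot object before faceting them back out into 4 separate plot panels, **without allowing the x-axis scales to vary across facets.** As a result, the x-axis range (limits) is calculated from the combined data of all 4 layers, and the 20 bins of every histogram are calculated from that same range. With identical bins, the panels can only differ in the bins that contain the edited maxima, which is why the histograms look nearly identical.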
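To make the effect concrete without any plotting, here is a minimal base-R sketch under a simplified model: it assumes `bins` equal-width bins spanning the scale range. That is not exactly how ggplot2 places its breaks, but it shares the key property that the breaks are derived from the x range. The data (`base_values`) and the helper `equal_width_breaks()` are hypothetical stand-ins, not taken from the question.

```R
# Simplified binning model (an assumption, not ggplot2's exact algorithm):
# `bins` equal-width bins spanning the given range.
equal_width_breaks <- function(x_range, bins = 20) {
  seq(x_range[1], x_range[2], length.out = bins + 1)
}

# Hypothetical stand-in data: one sample, copied four times with a different maximum.
base_values <- c(71, 100, 120, 140, 150, 160, 170, 180, 190)
maxima <- c(197, 202, 213, 224)
panels <- lapply(maxima, function(m) c(base_values, m))

# Fixed x scale: a single range trained on the data of all panels combined.
shared_range <- range(unlist(panels))
shared_breaks <- rep(list(equal_width_breaks(shared_range)), length(panels))

# Free x scale: each panel's range comes from its own data.
free_breaks <- lapply(panels, function(x) equal_width_breaks(range(x)))

length(unique(shared_breaks))  # 1: every panel gets the same breaks
length(unique(free_breaks))    # 4: one set of breaks per panel

# Under the shared breaks, two panels differ only where their maxima fall.
shared_counts <- lapply(panels, function(x) {
  hist(x, breaks = shared_breaks[[1]], plot = FALSE)$counts
})
sum(shared_counts[[1]] != shared_counts[[4]])  # 2 bins differ
```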

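By changing `scales = "free_y"` to `scales = "free"`, we allow the x-axis scales to vary as well (the y axes stay free), so the 20 bins of each histogram are calculated from that panel's own range of values. This is also why stepping through the layers one at a time showed the expected differences: a plot containing a single layer trains its x scale on that layer's data alone.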
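The explanation can be checked on the built plot itself. The sketch below assumes ggplot2 is installed and replaces `kmed::heart` with a small hypothetical sample (`base_values`) so that it runs on its own. It reuses the question's one-layer-per-subset construction and reads the bin width each layer actually used via `ggplot2::layer_data()`; `bin_widths()` is a hypothetical helper written for this check.

```R
library(ggplot2)

# Hypothetical stand-in for the question's m_df: four copies of one sample
# that differ only in their maximum (sm = new maximum, fsm = panel label).
base_values <- seq(90, 190, by = 2)
maxima <- c(197, 202, 213, 224)
m_df <- do.call(rbind, lapply(maxima, function(m) {
  x <- base_values
  x[which.max(x)] <- m
  data.frame(thalach = x, sm = m, fsm = paste("Maximum Edited to:", m))
}))

# One geom_histogram layer per copy, combined as in the question.
m_lp_hist <- lapply(maxima, function(b) {
  geom_histogram(data = m_df[m_df$sm == b, ], mapping = aes(x = thalach), bins = 20)
})
m_p_hist <- Reduce(`+`, m_lp_hist, init = ggplot())

# Bin width used by each layer, read from the computed (built) layer data.
bin_widths <- function(p) {
  vapply(seq_along(m_lp_hist), function(i) {
    d <- layer_data(p, i)
    d$xmax[1] - d$xmin[1]
  }, numeric(1))
}

fixed_widths <- bin_widths(m_p_hist + facet_wrap(~ fsm, scales = "free_y"))
free_widths  <- bin_widths(m_p_hist + facet_wrap(~ fsm, scales = "free"))

length(unique(round(fixed_widths, 8)))  # 1: shared x range, identical bins
length(unique(round(free_widths, 8)))   # 4: bins follow each panel's range
```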

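By the way, it's not very clear to me why you'd want to create this via `facet_wrap`. I'd have gone with making 4 separate ggplot objects and stitching them together into a single chart afterwards for presentation purposes, which would have avoided the above issue altogether. E.g.: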

```R
lapply(m_lp_hist,
       function(p) ggplot() +
         p +
         labs(title = p$data$fsm[[1]]) +
         theme_bw()) %>%
  cowplot::plot_grid(plotlist = ., nrow = 2, ncol = 2)
```
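Another option, not part of the original answer: because `m_df` already carries the `fsm` label, the separate layers may not be needed at all. A single `geom_histogram()` on the full data, faceted with free scales, should also give each panel its own 20 bins. The sketch below again assumes ggplot2 is installed and uses a hypothetical stand-in for `m_df` rather than `kmed::heart`.

```R
library(ggplot2)

# Hypothetical stand-in for the question's m_df (one copy per new maximum).
base_values <- seq(90, 190, by = 2)
maxima <- c(197, 202, 213, 224)
m_df <- do.call(rbind, lapply(maxima, function(m) {
  x <- base_values
  x[which.max(x)] <- m
  data.frame(thalach = x, fsm = paste("Maximum Edited to:", m))
}))

# A single layer on the full data; facet_wrap splits it by fsm and, with free
# x scales, the 20 bins are computed over each panel's own range.
p <- ggplot(m_df, aes(x = thalach)) +
  geom_histogram(bins = 20) +
  facet_wrap(~ fsm, scales = "free") +
  theme_bw() +
  labs(x = "Maximum Heart Rate Achieved", y = "Count")

# Bin width per panel, read from the computed layer data.
d <- layer_data(p)
widths <- tapply(d$xmax - d$xmin, d$PANEL, function(w) w[1])
print(p)
```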

