February 14, 2023, 03:52:30

How to overlap R histograms

Question

# Reproduced from the code at https://stackoverflow.com/questions/64474714/run-svymean-on-all-variables
library(haven)
library(survey)
library(dplyr)
nhanesDemo <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DEMO_I.XPT"))
# Rename variables into something more readable
nhanesDemo$fpl <- nhanesDemo$INDFMPIR
nhanesDemo$age <- nhanesDemo$RIDAGEYR
nhanesDemo$gender <- nhanesDemo$RIAGENDR
nhanesDemo$persWeight <- nhanesDemo$WTINT2YR
nhanesDemo$psu <- nhanesDemo$SDMVPSU
nhanesDemo$strata <- nhanesDemo$SDMVSTRA
nhanesAnalysis <- nhanesDemo %>%
  mutate(LowIncome = case_when(
    INDFMIN2 < 40 ~ TRUE,
    TRUE ~ FALSE
  )) %>%
  # Select the necessary columns
  select(INDFMIN2, LowIncome, persWeight, psu, strata)
# Set up the survey design
nhanesDesign <- svydesign(id      = ~psu,
                          strata  = ~strata,
                          weights = ~persWeight,
                          nest    = TRUE,
                          data    = nhanesAnalysis)
svyhist(~log10(INDFMIN2), design=nhanesDesign, main = '')

How do I color the histogram by independent variable, say, LowIncome? I want to have two separate histograms, one for each value of LowIncome. Unfortunately I picked a bad example, but I want them to be see-through in case their values overlap.

Answer 1

Score: 3

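If you want to plot a histogram from your survey design object, you can get its data with model.frame (this is what svyhist does under the hood). To get the histogram filled by group, you could use this data frame inside ggplot:

library(ggplot2)
ggplot(model.frame(nhanesDesign), aes(log10(INDFMIN2), fill = LowIncome)) +
  # position = "identity" overlays the groups; the default would stack them
  geom_histogram(alpha = 0.5, color = "gray60", breaks = 0:20 / 10,
                 position = "identity") +
  theme_classic()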

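Edit

As Thomas Lumley points out, this does not incorporate the sampling weights. If you want them, pass the weight column through the weight aesthetic:

ggplot(model.frame(nhanesDesign), aes(log10(INDFMIN2), fill = LowIncome)) +
  # position = "identity" overlays the groups; the default would stack them
  geom_histogram(aes(weight = persWeight), alpha = 0.5, position = "identity",
                 color = "gray60", breaks = 0:20 / 10) +
  theme_classic()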
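To make concrete what the weight aesthetic changes, here is a minimal base R sketch. The values of x, w and breaks are made up for illustration and are not NHANES data. It shows that an unweighted histogram counts rows per bin, while a weighted one sums the weights per bin.

```r
# Hypothetical toy data: five observations with unequal sampling weights
x <- c(0.2, 0.4, 0.6, 1.2, 1.8)
w <- c(10, 10, 10, 50, 20)
breaks <- c(0, 1, 2)

# Assign each observation to a bin (right-closed, lowest break included)
bin <- cut(x, breaks = breaks, include.lowest = TRUE)

# Unweighted bar heights: number of rows in each bin
unweighted <- as.vector(table(bin))

# Weighted bar heights: sum of the weights in each bin
weighted <- as.vector(tapply(w, bin, sum))

print(unweighted)
print(weighted)
```

The first bin holds more rows, but the second bin carries more weight, so the two versions of the histogram can lead to different conclusions about where the population mass lies.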

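To demonstrate that this approach works, we can replicate Thomas's approach in ggplot using the data example from svyhist. To get the uneven bin sizes (if this is desired), we need two histogram layers, though I'm guessing this would not be required for most use cases.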

# dstrat is not defined in this thread; this setup follows the ?svyhist examples
library(survey)
data(api)
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

ggplot(model.frame(dstrat), aes(enroll)) +
  geom_histogram(aes(fill = "E", weight = pw, y = after_stat(density)),
                 data = subset(model.frame(dstrat), stype == "E"),
                 breaks = 0:35 * 100,
                 position = "identity", col = "gray50") +
  geom_histogram(aes(fill = "Not E", weight = pw, y = after_stat(density)),
                 data = subset(model.frame(dstrat), stype != "E"),
                 position = "identity", col = "gray50",
                 breaks = 0:7 * 500) +
  scale_fill_manual(NULL, values = c("#00880020", "#88000020")) +
  theme_classic()
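The two layers above plot density rather than counts, which is what makes groups with different bin widths comparable. A rough base R sketch of that calculation follows. The helper name weighted_density and the toy values are hypothetical, and this is not the code ggplot2 or svyhist actually runs: each bar height is the weighted share of the group in the bin divided by the bin width, so the bars of one group always cover a total area of 1.

```r
# Hypothetical helper: weighted density per bin for one group
weighted_density <- function(x, w, breaks) {
  bin <- cut(x, breaks = breaks, include.lowest = TRUE)
  wsum <- as.vector(tapply(w, bin, sum))
  wsum[is.na(wsum)] <- 0           # bins with no observations get weight 0
  prop <- wsum / sum(w)            # weighted share of the group in each bin
  prop / diff(breaks)              # divide by bin width to get a density
}

# Made-up values with deliberately uneven bins
x <- c(50, 150, 250, 700, 1200)
w <- c(2, 2, 1, 3, 2)
breaks <- c(0, 100, 200, 300, 1000, 1500)

dens <- weighted_density(x, w, breaks)
area <- sum(dens * diff(breaks))   # total area under the bars
print(dens)
print(area)
```

Because the area is fixed at 1 for each group, a small group and a large group can share one y axis without the larger one dominating the plot.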

Answer 2

Score: 1

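You can't just extract the data and use ggplot, because that won't use the weights and so misses the whole point of svyhist. You can use the add=TRUE argument, though. You do need to set the x and y axis ranges correctly to make sure the whole plot is visible.

Using the data example from ?svyhist: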

svyhist(~enroll, subset(dstrat,stype=="E"), col="#00880020", ylim=c(0,0.003), xlim=c(0,3500))
svyhist(~enroll, subset(dstrat,stype!="E"), col="#88000020", add=TRUE)
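The ylim and xlim values above are hard-coded for this data set. One possible way to derive such limits instead of guessing them is sketched below as a base R analogy: it uses hist() on made-up, unweighted data rather than svyhist, so treat it as an illustration of the idea only. Both histograms are computed without plotting, the limits are taken from their combined breaks and densities, and the second histogram is then drawn on top of the first.

```r
# Made-up unweighted data for two groups (not survey data)
set.seed(1)
a <- rnorm(200, mean = 0)
b <- rnorm(100, mean = 2)

# Compute both histograms first, without drawing anything
ha <- hist(a, plot = FALSE)
hb <- hist(b, plot = FALSE)

# Axis limits that are wide enough for both groups
xlim <- range(ha$breaks, hb$breaks)
ylim <- c(0, max(ha$density, hb$density))

# Draw the first histogram with the shared limits, then overlay the second
plot(ha, freq = FALSE, col = "#00880020", xlim = xlim, ylim = ylim,
     main = "", xlab = "value")
plot(hb, freq = FALSE, col = "#88000020", add = TRUE)
```

The last two hex digits of each colour are the alpha channel, which is what keeps the overlapping bars see-through.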
