Question

I have a data frame:

set.seed(1)
d <- data.frame(year= c(2001:2005,2001:2005,2001:2005),
                income = sample(2000:10000,15,replace = T),
                gender = sample(1:2,15,replace = T),
                education = sample(1:3,15,replace = T)
)

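Since the actual data frame has more variables than just gender and education, I want to write a function that plots the income kernel density of each subgroup against the density of the whole sample, for every level of gender and education, and saves one PDF per subgroup at the end. Take gender == 1 as an example: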

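library(dplyr)
library(ggplot2)

male <- d %>% filter(gender == 1)

density_all <- density(d$income)
# Evaluate the subgroup density on the same x grid as the full sample;
# otherwise density_male$y does not line up with density_all$x
density_male <- density(male$income,
                        from = min(density_all$x), to = max(density_all$x))

d_density <- data.frame(x = density_all$x,
                      density_all = density_all$y,
                      density_male = density_male$y)

plot <- ggplot(d_density, aes(x)) +
  geom_line(aes(y = density_all), color = "red") +
  geom_line(aes(y = density_male), color = "blue")

ggsave("subgroup_name.pdf", plot, width = 300, height = 250, units = "mm")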

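I have thought about converting the data frame from long to wide format, but the subgroups won't all have the same length.
I also thought about a loop within a loop, i.e. first looping over the values of a variable (gender == 1 or 2), then looping over the variables (gender and education). I am not sure which option is better or how exactly to carry it out.

Your suggestions will be highly appreciated.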

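Answer 1

Score: 1

This stores all the plots in a list of lists, one inner list per variable. To cover other variables, just amend the initial vector: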

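library(purrr)
library(ggplot2)

plots <- map(.x = c("gender", "education"),
             .f = \(categ){
               map(.x = unique(d[[categ]]),
                   .f = \(lev){
                     ggplot(d, aes(x = income)) +
                       geom_density(aes(colour = "total")) +
                       geom_density(data = d[d[[categ]] == lev, ], aes(colour = "filtered")) +
                       # map the legend labels to the intended line colours
                       scale_colour_manual(name = categ,
                                           values = c("total" = "red", "filtered" = "blue"))
                   })
             })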

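You can use walk() on the lists to apply your save-to-PDF function, but I assume you will first want to adjust the aesthetics, set titles, hide the legend, and so on.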
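As a rough illustration of the saving step, here is a minimal sketch that names the elements of the nested list and then uses purrr's iwalk() (the indexed variant of walk()) to write one PDF per plot. The toy data, the `out_dir` folder, and the `<variable>_<level>.pdf` naming scheme are assumptions made for this example, not part of the original answer.

```r
library(purrr)
library(ggplot2)

# Toy data with the same columns as the question (values are made up)
d <- data.frame(
  income    = c(2500, 4100, 5200, 6100, 7300, 8800, 3900, 9500),
  gender    = c(1, 2, 1, 2, 1, 2, 1, 2),
  education = c(1, 1, 2, 2, 3, 3, 1, 2)
)

vars <- c("gender", "education")

# Build the nested list, naming each variable and level so that the
# names can later be reused as file names
plots <- map(setNames(vars, vars), \(categ) {
  levs <- sort(unique(d[[categ]]))
  map(setNames(levs, levs), \(lev) {
    ggplot(d, aes(x = income)) +
      geom_density(colour = "red") +
      geom_density(data = d[d[[categ]] == lev, ], colour = "blue") +
      labs(title = paste(categ, "=", lev))
  })
})

# Assumed output folder; replace with wherever the PDFs should go
out_dir <- file.path(tempdir(), "density_plots")
dir.create(out_dir, showWarnings = FALSE)

# Outer iwalk: one inner list per variable; inner iwalk: one plot per level
iwalk(plots, \(plot_list, categ) {
  iwalk(plot_list, \(p, lev) {
    ggsave(file.path(out_dir, paste0(categ, "_", lev, ".pdf")), p,
           width = 300, height = 250, units = "mm")
  })
})
```

Naming the list elements up front keeps the saving code free of index bookkeeping: iwalk() passes each element together with its name, so the file name can be built directly from the variable and level.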

EDIT:
A version that also plots the difference:

library(purrr)
library(ggplot2)

plots <- map(.x = c("gender", "education"),
             .f = \(categ){
               # Density of the full sample; its x grid is reused for every subgroup
               dens_all <- density(d$income)
               tmpdf <- data.frame(x = dens_all$x,
                                   y = dens_all$y)
               map(.x = unique(d[[categ]]),
                   .f = \(lev){
                     # Evaluate the subgroup density on the same grid so that
                     # y and y2 can be subtracted point by point
                     tmpdf$y2 <- density(d[d[[categ]] == lev, "income"],
                                         from = min(tmpdf$x), to = max(tmpdf$x))$y
                     tmpdf$y3 <- tmpdf$y - tmpdf$y2
                     ggplot(tmpdf) +
                       geom_line(aes(x = x, y = y, colour = "total")) +
                       geom_line(aes(x = x, y = y2, colour = "filtered")) +
                       geom_line(aes(x = x, y = y3, colour = "difference")) +
                       scale_color_manual(name = categ, values = c("total" = "red",
                                                                   "filtered" = "blue",
                                                                   "difference" = "black"))
                   })
             })
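Since the question asks for a function, the density computation can also be pulled out into a small helper. The sketch below is one possible shape for it, not the answer's own code: `density_vs_all()` and its argument names are hypothetical. It relies on the `from`, `to`, and `n` arguments of `density()` to evaluate the subgroup on the same grid as the full sample, which is what makes the point-by-point difference meaningful.

```r
# Hypothetical helper: kernel densities of `value` for the whole data set and
# for one subgroup (categ == lev), both evaluated on the same x grid
density_vs_all <- function(data, value, categ, lev, n = 512) {
  all_vals <- data[[value]]
  sub_vals <- data[[value]][data[[categ]] == lev]

  dens_all <- density(all_vals, n = n)
  grid     <- range(dens_all$x)
  # Reuse the grid of the full-sample density for the subgroup
  dens_sub <- density(sub_vals, n = n, from = grid[1], to = grid[2])

  data.frame(x          = dens_all$x,
             total      = dens_all$y,
             filtered   = dens_sub$y,
             difference = dens_all$y - dens_sub$y)
}

# Toy data with the same columns as the question (values are made up)
d <- data.frame(
  income    = c(2500, 4100, 5200, 6100, 7300, 8800, 3900, 9500),
  gender    = c(1, 2, 1, 2, 1, 2, 1, 2),
  education = c(1, 1, 2, 2, 3, 3, 1, 2)
)

res <- density_vs_all(d, "income", "gender", 1)
head(res)
```

The returned data frame can be passed straight to ggplot() with one geom_line() per column. Note that density() needs at least two observations, so a subgroup with a single row would have to be skipped or handled separately.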
