Iterate over filtered values of a list
Question
I have a dataframe:
set.seed(1)
d <- data.frame(year = c(2001:2005, 2001:2005, 2001:2005),
                income = sample(2000:10000, 15, replace = TRUE),
                gender = sample(1:2, 15, replace = TRUE),
                education = sample(1:3, 15, replace = TRUE)
)
Since the actual dataframe has more variables than just gender and education, I want to write a function that plots the income kernel density of each subgroup against the density of the full sample, for every level of gender and education, and saves one PDF per subgroup at the end.
Take gender == 1 as an example:
library(dplyr)
library(ggplot2)

male <- d %>% filter(gender == 1)
density_all <- density(d$income)
# Evaluate the subgroup on the same x grid as the full sample so the rows line up
density_male <- density(male$income,
                        from = min(density_all$x), to = max(density_all$x))
d_density <- data.frame(x = density_all$x,
                        density_all = density_all$y,
                        density_male = density_male$y)
plot <- ggplot(d_density, aes(x)) +
  geom_line(aes(y = density_all), color = "red") +
  geom_line(aes(y = density_male), color = "blue")
ggsave("subgroup_name.pdf", plot, width = 300, height = 250, units = "mm")
I have thought about converting the dataframe from long to wide format, but the length of each subgroup won't be the same.
I also thought about doing a loop within a loop, i.e. first looping over the values in a variable (gender == 1 or 2), then looping over the variables (gender and education). Not sure which option is better and how exactly I can carry it out.
Your suggestions will be highly appreciated.
Answer 1
Score: 1
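This will store all the plots in a list of lists (one inner list per variable, one plot per level). Just amend the initial vector to list the variables you wish to cover. Note that the \(x) lambda shorthand needs R 4.1 or later.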
library(purrr)
library(ggplot2)

plots <- map(.x = c("gender", "education"),
             .f = \(categ){
  map(.x = unique(d[[categ]]),
      .f = \(lev){
    # Colours are fixed values, so they are set outside aes()
    ggplot(d, aes(x = income)) +
      geom_density(colour = "red") +                                # all observations
      geom_density(data = d[d[[categ]] == lev, ], colour = "blue")  # current subgroup
  })
})
You can use walk() on the lists to apply your save-to-pdf function, but I assume you'll want to play with the aesthetics, set titles, hide the legend etc. first.
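As a rough sketch of that walk() step: the version below assumes the nested list is given names (variable name on the outer level, level value on the inner level) so they can be reused in the file names. The names, the out_dir directory and the file-name pattern are illustrative choices, not part of the original answer, and the year column is left out of the example data because it is not needed here.

```r
library(purrr)
library(ggplot2)

# Example data as in the question (year omitted, it is not used here)
set.seed(1)
d <- data.frame(
  income    = sample(2000:10000, 15, replace = TRUE),
  gender    = sample(1:2, 15, replace = TRUE),
  education = sample(1:3, 15, replace = TRUE)
)

vars <- c("gender", "education")

# Named nested list: plots$gender$`1`, plots$gender$`2`, plots$education$`1`, ...
plots <- map(setNames(vars, vars), \(categ) {
  levs <- sort(unique(d[[categ]]))
  map(setNames(levs, as.character(levs)), \(lev) {
    ggplot(d, aes(x = income)) +
      geom_density(colour = "red") +
      geom_density(data = d[d[[categ]] == lev, ], colour = "blue") +
      ggtitle(paste(categ, "=", lev))
  })
})

# Assumption: write into a temporary directory; swap in any folder you like
out_dir <- tempdir()

# iwalk() passes each element together with its name, so the names
# can be pasted into one file name per subgroup, e.g. "gender_1.pdf"
iwalk(plots, \(plot_list, categ) {
  iwalk(plot_list, \(p, lev) {
    ggsave(file.path(out_dir, paste0(categ, "_", lev, ".pdf")), p,
           width = 300, height = 250, units = "mm")
  })
})
```

Naming the lists up front avoids having to recover the variable and level from list positions when saving.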
EDIT:
Version allowing for plotting of the difference as well:
library(purrr)
library(ggplot2)

plots <- map(.x = c("gender", "education"),
             .f = \(categ){
  # Density of the full sample; its x grid is reused for every subgroup
  dens_all <- density(d$income)
  map(.x = unique(d[[categ]]),
      .f = \(lev){
    # Evaluate the subgroup density on the same grid so the curves can be subtracted
    dens_sub <- density(d[d[[categ]] == lev, "income"],
                        from = min(dens_all$x), to = max(dens_all$x),
                        n = length(dens_all$x))
    tmpdf <- data.frame(x = dens_all$x, y = dens_all$y, y2 = dens_sub$y)
    tmpdf$y3 <- tmpdf$y - tmpdf$y2
    ggplot(tmpdf) +
      geom_line(aes(x = x, y = y, colour = "total")) +
      geom_line(aes(x = x, y = y2, colour = "filtered")) +
      geom_line(aes(x = x, y = y3, colour = "difference")) +
      scale_color_manual(name = categ, values = c("total" = "red",
                                                  "filtered" = "blue",
                                                  "difference" = "black"))
  })
})
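One point worth checking when subtracting two density() curves is that both are evaluated on the same x grid. By default each call derives its grid from the range and bandwidth of its own input, so a subgroup usually ends up on a different grid than the full sample. A minimal sketch of that check, using the question's simulated income and gender columns (the names dens_sub_default and dens_sub_aligned are illustrative):

```r
set.seed(1)
income <- sample(2000:10000, 15, replace = TRUE)
gender <- sample(1:2, 15, replace = TRUE)

dens_all <- density(income)

# Default call: the grid comes from the subgroup's own range and bandwidth
dens_sub_default <- density(income[gender == 1])

# Aligned call: reuse the end points and length of the full-sample grid
dens_sub_aligned <- density(income[gender == 1],
                            from = min(dens_all$x),
                            to   = max(dens_all$x),
                            n    = length(dens_all$x))

# Generally FALSE: the two default grids do not coincide
default_matches <- isTRUE(all.equal(dens_all$x, dens_sub_default$x))

# TRUE: both curves are now evaluated at the same x values
aligned_matches <- isTRUE(all.equal(dens_all$x, dens_sub_aligned$x))

# The pointwise difference is only meaningful on the shared grid
diff_y <- dens_all$y - dens_sub_aligned$y
```

density() needs at least two observations to pick a bandwidth automatically, so very small subgroups would need a fixed bw or to be skipped.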