How to draw a two-group histogram with diagonally split bars in the overlapping part using ggplot2 in R
Question
I want the overlapping part of a histogram to show each bar divided by a diagonal line. For example, I wrote the following code:
library(ggplot2)
set.seed(1)
grupo1 <- round(rnorm(100, mean = 20, sd = 2.2))
grupo2 <- round(rnorm(100, mean = 10, sd = 2))
df <- data.frame(
  valores = c(grupo1, grupo2),
  grupo = c(rep("grupo1", length(grupo1)), rep("grupo2", length(grupo2)))
)
# Create the histogram
ggplot(df, aes(x = valores, fill = grupo)) +
  geom_histogram(binwidth = 1, color = "black", position = "identity", alpha = 0.6) +
  labs(x = "Valores", y = "Frecuencia", fill = "Grupo") +
  scale_fill_manual(values = c("grupo1" = "blue", "grupo2" = "red")) +
  theme_minimal()
This code produces the following plot:
[Image: the result; note that the overlapping part is a different color]
But I want the plot to look like this (I edited it in Paint):
[Image: what I want; note that it shows both colors]
Does anyone know how to draw the histogram as in the last image?
A second example, using the same code:
df2 <- rbind(df, data.frame(valores = c(15,15), grupo = c("grupo1", "grupo1")))
This is the result:
[Image: second example]
But I want this (edited in Paint):
[Image: what I want]
I have tried some geom_histogram arguments, such as changing position, but that did not work.
I am looking for code that solves my issue. Thanks in advance.
Answer 1
Score: 1
ggplot2 doesn't handle this kind of thing well. This question is about adding textures to bar plots, which is similar to what you want, and it turns out to be quite difficult.
I've come up with some rather odd solutions; there might be a much easier approach that I don't know of.
Note: the dummy data (df2) is at the end.
Option 1 - bars with combined colors, but not combined heights
This is basically what you said you didn't want, but with one improvement: the height of the combined bar is not the sum of the two groups' counts. If we plot:
ggplot(df2, aes(valores, fill = grupo)) +
geom_histogram(binwidth = 1, color = "black", alpha=0.6)
The bar at valores = 15 will have count = 3 + 1, but we might prefer a count = 3 bar with another, smaller count = 1 bar overlapping it. We can get that using position_dodge(), but requesting no actual dodge:
ggplot(df2, aes(valores, fill = grupo)) +
geom_histogram(position = position_dodge(0), binwidth = 1, color = "black", alpha=0.6)
But we can also add a small amount of dodge, using position_dodge(0.3):
ggplot(df2, aes(valores, fill = grupo)) +
geom_histogram(position = position_dodge(0.3), binwidth = 1, color = "black", alpha=0.6)
The problem with this solution is that it adds whitespace between the bars. There might be a geom_histogram option that removes it, but I don't know of one. You can open a new question if you'd like.
Option 2 - faking bars using geom_area + geom_segment
You can build a new dataset with the values of the histogram, which gives you more flexibility to customize the bars. You could do that with hist(), but since we're using ggplot, I made an individual histogram for each group and got its data with ggplot_build(). There might be a better way to do that; the important part is that you end up with a dataset containing the histogram values for each group.
library(dplyr)   # %>%, group_split(), select(), mutate()
library(purrr)   # map_dfr()
library(tidyr)   # pivot_longer()
library(ggplot2)
df_area <- df2 %>%
  group_split(grupo) %>%                         # for each group
  map_dfr(function(df_group) {                   # apply the following function
    g <- ggplot(df_group, aes(valores)) +
      geom_histogram(binwidth = 1)               # build a histogram
    ggplot_build(g)$data[[1]] %>%                # get its data
      select(c(x, xmin, xmax, y)) %>%            # select these columns
      mutate(grupo = unique(df_group$grupo)) %>% # and add a 'grupo' column
      pivot_longer(c(xmin, xmax), values_to = "x_area") # pivot so geom_area can draw the columns
  })
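The hist() route mentioned above could look roughly like the sketch below. It is only an illustration, not part of the original answer: the names breaks, hist_list and df_hist are made up here, and the breaks are assumed to be centred on the integer values (which fits the rounded dummy data with binwidth = 1).

```r
# Dummy data, same as at the end of the answer
set.seed(1)
grupo1 <- round(rnorm(100, mean = 20, sd = 2.2))
grupo2 <- round(rnorm(100, mean = 10, sd = 2))
df <- data.frame(valores = c(grupo1, grupo2),
                 grupo = c(rep("grupo1", length(grupo1)), rep("grupo2", length(grupo2))))
df2 <- rbind(df, data.frame(valores = c(15, 15), grupo = c("grupo1", "grupo1")))

binwidth <- 1
# Assumption: one shared set of breaks for both groups, centred on integers
breaks <- seq(min(df2$valores) - binwidth / 2,
              max(df2$valores) + binwidth / 2,
              by = binwidth)

# One histogram per group, computed with base R and not plotted
hist_list <- lapply(split(df2, df2$grupo), function(df_group) {
  h <- hist(df_group$valores, breaks = breaks, plot = FALSE)
  data.frame(grupo = df_group$grupo[1],
             x     = h$mids,              # bin centres
             xmin  = head(h$breaks, -1),  # left edge of each bin
             xmax  = tail(h$breaks, -1),  # right edge of each bin
             y     = h$counts)            # counts per bin
})
df_hist <- do.call(rbind, hist_list)
```

Unlike the ggplot_build() version, every group here gets a row for every bin across the whole range, including bins with a count of zero. To use it with geom_area, df_hist would still need the same pivot_longer(c(xmin, xmax), values_to = "x_area") step.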
Now we can build the area of the histogram with geom_area and the lines with geom_segment. Again we use position_dodge(0), but this time there is no whitespace:
ggplot(df_area, aes(x_area, y, fill = grupo)) +
geom_area(position = position_dodge(0), alpha = 0.6, color = "black") +
geom_segment(aes(y = 0, yend = y, x = x_area, xend = x_area))
This might yield odd boundaries between the area contour and the segments. Also, you can't use position_dodge(k) with k != 0.
Option 3 - geom_area + geom_segment with custom data
This is the closest to what you wanted. The idea is to change the data:
df_area2 <- df_area %>%
mutate(y = case_when(grupo == "grupo1" ~ ifelse(name == "xmin", y, 0),
grupo == "grupo2" ~ ifelse(name == "xmax", y, 0)))
so as to produce the inclined bars.
Then we add that on top of the base plot from the previous option:
ggplot(df_area, aes(x_area, y, fill = grupo)) +
geom_area(position = position_dodge(0), color = "black") +
geom_area(data = df_area2) + #on top of the base area, but below the lines
geom_segment(aes(y = 0, yend = y, x = x_area, xend = x_area))
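To see what the case_when step that builds df_area2 does, it may help to apply it to a single bin. The sketch below uses a made-up data frame (toy and toy2 are hypothetical names) holding the two corner rows per group for the valores = 15 bin, with the counts 3 and 1 described in Option 1 and bin edges assumed to be 14.5 and 15.5.

```r
library(dplyr)

# Hypothetical single-bin data in the same shape as df_area
toy <- data.frame(
  x      = 15,
  y      = c(3, 3, 1, 1),
  grupo  = c("grupo1", "grupo1", "grupo2", "grupo2"),
  name   = c("xmin", "xmax", "xmin", "xmax"),
  x_area = c(14.5, 15.5, 14.5, 15.5)
)

# Same transformation as used for df_area2:
# grupo1 keeps its height only at the left edge, grupo2 only at the right edge
toy2 <- toy %>%
  mutate(y = case_when(grupo == "grupo1" ~ ifelse(name == "xmin", y, 0),
                       grupo == "grupo2" ~ ifelse(name == "xmax", y, 0)))

toy2[, c("grupo", "name", "x_area", "y")]
```

Each group is left with one corner at its original height and the other at zero, so each bin becomes a triangle for geom_area: grupo1 slopes down from left to right, grupo2 slopes up.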
Using alpha here will make the inclined bars visible. You can pass "washed out" colors to fill in order to match the tone you had before.
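One way to get such "washed out" colors is to blend each color with white by hand. This is only a sketch under assumptions: wash_out is a hypothetical helper (not a ggplot2 function), the plot background is assumed to be white, and 0.6 is the alpha value from the question's code.

```r
# Hypothetical helper: blend a color with white, mimicking what a given
# alpha would look like over a white background
wash_out <- function(col, alpha = 0.6) {
  rgb_vals <- grDevices::col2rgb(col)                    # 3 x n matrix, values 0-255
  blended  <- round(alpha * rgb_vals + (1 - alpha) * 255)
  grDevices::rgb(blended[1, ], blended[2, ], blended[3, ], maxColorValue = 255)
}

# Opaque stand-ins for the question's blue and red at alpha = 0.6
fill_values <- c(grupo1 = wash_out("blue"), grupo2 = wash_out("red"))
fill_values
```

The resulting named vector could then be passed as scale_fill_manual(values = fill_values) in the plots of this option, in place of using alpha.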
Lastly, we can add lines at the end of the half bars:
binwidth <- 1
ggplot(df_area, aes(x_area, y, fill = grupo)) +
geom_area(position = position_dodge(0), color = "black") +
geom_area(data = df_area2) +
geom_segment(aes(y = 0, yend = y, x = x_area, xend = x_area)) +
geom_segment(aes(y = y, yend = y, x = x - 0.5*binwidth, xend = x + 0.5*binwidth))
Here, binwidth is the same one you used to create the histogram data.
Dummy data
set.seed(1)
grupo1 <- round(rnorm(100, mean = 20, sd = 2.2))
grupo2 <- round(rnorm(100, mean = 10, sd = 2))
df <- data.frame(valores = c(grupo1, grupo2),
grupo = c(rep("grupo1", length(grupo1)), rep("grupo2", length(grupo2))))
df2 <- rbind(df, data.frame(valores = c(15,15), grupo = c("grupo1", "grupo1")))