How to map two continuous variables to the height and width of boxes in ggplot?

Question

I want to create a plot with two continuous variables (v1 and v2) and one categorical variable (e.g., with levels A, B, C, D). The plot should show a matrix of proportions. The categorical variable should be on the x-axis, and each column should have two boxes (v1 and v2) representing the proportion of each continuous variable within that category (within A: v1/(v1+v2), then v2/(v1+v2)). The width of each column should represent the proportion of the total that falls within that category (v1+v2 for A divided by the sum of all v1 and v2).

It should look like a heatmap but with the variable type (v1 or v2) mapped to color and the height and width of the boxes mapped as described above.
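
To make the target concrete, here is a minimal base-R sketch of the box dimensions described above. It uses the values from the reproducible example in this question; the names `box_width`, `height_v1` and `height_v2` are only illustrative and are not used elsewhere in the post.

```r
# Example values copied from the reproducible example in this question
cat <- c("A", "B", "C", "D")
v1  <- c(1, 3, 6, 2)
v2  <- c(3, 3, 10, 1)

sum_cat   <- v1 + v2                 # total per category: 4, 6, 16, 3
box_width <- sum_cat / sum(sum_cat)  # column width: share of the grand total (29)
height_v1 <- v1 / sum_cat            # height of the v1 box within its column
height_v2 <- v2 / sum_cat            # height of the v2 box within its column

# One row per category; the widths sum to 1 and each column's heights sum to 1
data.frame(cat, box_width, height_v1, height_v2)
```

So column A should be 4/29 of the plot wide, with a v1 box of height 0.25 and a v2 box of height 0.75.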

Using a stacked bar graph approach worked well and is close to what I want, but there is horizontal space between the bars. Since I'm already using the width aesthetic to map the proportion within each category, I wasn't able to eliminate this space.

Alternatively, I tried to use geom_tile, but that suffered from the same spacing issue and didn't give every column a total height of 1.

The closest solution I have found is: https://stackoverflow.com/questions/66996598/ggplot2-heatmap-with-tile-height-and-width-as-aes

However, in that example they have a categorical variable on both the x and y axes, which is a little different from my case.

Reproducible example for reference:

library(tidyverse)

cat <- c("A","B","C","D")
v1 <- c(1,3,6,2)
v2 <- c(3,3,10,1)
df <- data.frame(cat,v1,v2)

df <- df %>%
  group_by(cat) %>%
  mutate(sum.cat = sum(v1,v2)) %>%
  mutate(prop.v1 = v1/sum.cat) %>%
  ungroup() %>%
  mutate(prop.cat = sum.cat/sum(v1,v2)) %>%
  mutate(sum.tot = sum(sum.cat)) %>%
  mutate(prop.v2 = 1-prop.v1) %>%
  pivot_longer(cols = c(5,8), names_to = "prop.v.type", values_to = "prop.v")

ggplot(df,aes(cat,prop.v, fill = prop.v.type))+
  geom_bar(position = "stack", stat = "identity",aes(width=prop.cat))

ggplot(df,aes(x=cat, y=prop.v, fill = prop.v.type))+
  geom_tile(aes(width=prop.cat,height=prop.v))

Thanks in advance!

Answer 1

Score: 1

It can be done with a little hack on the x-axis values. I calculate each bar's x-axis position from prop.cat, then assign the cat labels to the positions that correspond to each cat. This makes the x-axis continuous, so the `width` aesthetic is now measured in the same units as the axis.
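
As a quick sanity check of why this removes the gaps, the arithmetic can be run on its own. This is only a sketch with the example's column widths typed in by hand; `prop_cat`, `centre`, `left` and `right` are illustrative names, not objects from the answer's code.

```r
# Column widths for the example data (prop.cat in the answer's code)
prop_cat <- c(4, 6, 16, 3) / 29

# Bar centres: right edge of each bar (cumulative width) minus half its width
centre <- cumsum(prop_cat) - prop_cat / 2

# Left and right edges implied by centre +/- width / 2
left  <- centre - prop_cat / 2
right <- centre + prop_cat / 2

# Each bar starts where the previous one ends, so no gaps remain
all.equal(left[-1], right[-length(right)])
#> [1] TRUE
```

The first bar starts at 0 and the last one ends at 1, so the bars tile the whole x-axis once the scale expansion is switched off.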

``` r
library(tidyverse)

cat <- c("A","B","C","D")
v1 <- c(1,3,6,2)
v2 <- c(3,3,10,1)
df <- data.frame(cat,v1,v2)

df <- df %>%
  group_by(cat) %>%
  mutate(sum.cat = sum(v1,v2)) %>%
  mutate(prop.v1 = v1/sum.cat) %>%
  ungroup() %>%
  mutate(prop.cat = sum.cat/sum(v1,v2)) %>%
  mutate(sum.tot = sum(sum.cat)) %>%
  mutate(prop.v2 = 1-prop.v1) %>%
  pivot_longer(cols = c(5,8), names_to = "prop.v.type", values_to = "prop.v")

# Compute the x-axis position (bar centre) for each cat
df_revised <- df |>
  group_by(cat) |>
  mutate(prop.cat_cumsum = if_else(row_number() == 1, prop.cat, 0)) |>
  ungroup() |>
  mutate(prop.cat_cumsum = cumsum(prop.cat_cumsum)) |>
  mutate(x_axis_value = 0 + prop.cat_cumsum - prop.cat / 2)

# cat and the x positions are in the same order, so just extract the unique values
x_axis_breaks <- unique(df_revised$x_axis_value)
x_axis_labels <- unique(df_revised$cat)

# Plot to check that the bars fit together
ggplot(df_revised,
       aes(x = x_axis_value, y = prop.v, fill = prop.v.type))+
  geom_bar(position = "stack", stat = "identity",
           aes(width = prop.cat)) +
  scale_x_continuous(breaks = x_axis_breaks, expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0))
#> Warning in geom_bar(position = "stack", stat = "identity", aes(width =
#> prop.cat)): Ignoring unknown aesthetics: width
```

OK, it worked as expected. Now I just need to assign the proper cat labels to the x-axis and add a black border to the bars so that they are easy to tell apart.

``` r
ggplot(df_revised,
       aes(x = x_axis_value, y = prop.v, fill = prop.v.type))+
  geom_bar(position = "stack", stat = "identity",
           color = "black", aes(width = prop.cat)) +
  scale_x_continuous(breaks = x_axis_breaks, labels = x_axis_labels,
                     expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0))
#> Warning in geom_bar(position = "stack", stat = "identity", color = "black", :
#> Ignoring unknown aesthetics: width
```

<sup>Created on 2023-05-18 with reprex v2.0.2</sup>
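
For comparison, here is a sketch of a different route that is not part of the answer above: drawing every box with `geom_rect()` from explicit corner coordinates, so that no `width` aesthetic (and no warning about it) is involved. The data frames `d` and `rects` and their column names are made up for this illustration, and the vertical order of the two boxes is an arbitrary choice that may differ from the stacked-bar plots.

```r
library(ggplot2)

# Example values from the question
d <- data.frame(cat = c("A", "B", "C", "D"),
                v1  = c(1, 3, 6, 2),
                v2  = c(3, 3, 10, 1))

# Horizontal extent of each column: cumulative share of the grand total
d$width <- (d$v1 + d$v2) / sum(d$v1 + d$v2)
d$xmax  <- cumsum(d$width)
d$xmin  <- d$xmax - d$width

# Height of the v1 box; the v2 box fills the rest of the column up to 1
d$split <- d$v1 / (d$v1 + d$v2)

# One row per rectangle, with explicit corners
rects <- rbind(
  data.frame(cat = d$cat, type = "v1", xmin = d$xmin, xmax = d$xmax,
             ymin = 0, ymax = d$split),
  data.frame(cat = d$cat, type = "v2", xmin = d$xmin, xmax = d$xmax,
             ymin = d$split, ymax = 1)
)

# Label each column at its midpoint and remove the padding around the panel
p <- ggplot(rects) +
  geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax,
                fill = type), colour = "black") +
  scale_x_continuous(breaks = (d$xmin + d$xmax) / 2, labels = d$cat,
                     expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0))
p
```

The trade-off is that the rectangle corners have to be computed by hand, whereas the stacked-bar version lets `position = "stack"` take care of the vertical placement.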

huangapple
  • Posted on May 18, 2023, 08:24:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76276989.html