White gaps/blank space with stacked area graph (ggplot)

Question

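I followed the guidance here to generate a stacked area graph for four groups within a categorical variable. The code runs without errors, but the graph contains white gaps, as shown below. I checked the specific periods affected, such as October and November, and the data are complete for both months, so there should be no white gaps. I am also open to suggestions for other chart types for the same data.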

# Load packages:

library(tidyverse)
library(stringr)
library(ggplot2)
library(zoo)
library(ggthemes)
library(writexl)
library(viridis)
library(hrbrthemes)
library(textclean)
library(lubridate)

Here is a data example:

dput(stackedgraph[1:20,c(1,2,4)])

Output:

structure(list(month_year = structure(c(2011.91666666667, 2011.91666666667, 
2011.91666666667, 2011.91666666667, 2012, 2012, 2012, 2012, 2012.08333333333, 
2012.08333333333, 2012.08333333333, 2012.08333333333, 2012.16666666667, 
2012.16666666667, 2012.16666666667, 2012.25, 2012.25, 2012.25, 
2012.25, 2012.33333333333), class = "yearmon"), directed_to_whom = c("MoE", 
"MoL", "Non-critical", "Private employers", "MoE", "MoL", "Non-critical", 
"Private employers", "MoE", "MoL", "Non-critical", "Private employers", 
"MoE", "Non-critical", "Private employers", "MoE", "MoL", "Non-critical", 
"Private employers", "MoE"), directed_to_whom_percentage = c(0.0923076923076923, 
0.107692307692308, 0.430769230769231, 0.369230769230769, 0.0666666666666667, 
0.0833333333333333, 0.45, 0.4, 0.0606060606060606, 0.121212121212121, 
0.287878787878788, 0.53030303030303, 0.184210526315789, 0.342105263157895, 
0.473684210526316, 0.131578947368421, 0.105263157894737, 0.210526315789474, 
0.552631578947368, 0.108108108108108)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L), groups = structure(list(
    month_year = structure(c(2011.91666666667, 2012, 2012.08333333333, 
    2012.16666666667, 2012.25, 2012.33333333333), class = "yearmon"), 
    .rows = structure(list(1:4, 5:8, 9:12, 13:15, 16:19, 20L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list")))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L), .drop = TRUE))

Here is the code to create the graph:

ggplot(stackedgraph, aes(x = as.Date(month_year), y = directed_to_whom_percentage)) +
  geom_area(aes(fill = directed_to_whom, group = directed_to_whom), position = 'stack') +
  scale_fill_manual(values = c("MoL" = "light green",
                               "MoE" = "red",
                               "Private employers" = "light blue",
                               "Non-critical" = "black")) +
  scale_x_date(date_breaks = "months", date_labels = "%b-%y") +
  theme_economist_white() +
  theme(plot.title = element_text(size = 5, face = "bold"),
        axis.text.x = element_text(angle = 90, vjust = 0.5)) +
  theme(axis.title.x = element_blank(),
        axis.ticks.x = element_blank()) +
  scale_y_continuous(labels = percent_format(accuracy = 1))

Output:
[Figure: the resulting stacked area graph, which contains white gaps]

I really appreciate the advice from Andy. I ran the code with a small change:

stackedgraph <- stackedgraph %>%
  ungroup() %>% # I used ungroup() to avoid an error I was receiving.
  complete(month_year, directed_to_whom, fill = list(directed_to_whom_percentage = 0))

Then the graph:

ggplot(stackedgraph, aes(x = as.Date(month_year), y = directed_to_whom_percentage)) +
  geom_area(aes(fill = directed_to_whom, group = directed_to_whom),
            position = 'stack') +
  scale_fill_manual(
    values = c(
      "MoL" = "light green",
      "MoE" = "red",
      "Private employers" = "light blue",
      "Non-critical" = "black"
    )
  ) +
  scale_x_date(date_breaks = "months" , date_labels = "%b-%y") +
  theme(
    plot.title = element_text(size = 5, face = "bold"),
    axis.text.x = element_text(angle = 90, vjust = 0.5)
  ) +
  theme(axis.title.x = element_blank(),
        axis.ticks.x = element_blank()) +
  scale_y_continuous(labels = percent_format(accuracy = 1))

Output:
[Figure: the stacked area graph produced after completing the data]

Answer 1

Score: 1

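Ideally you need a row with a 0 for every category in each month where it has no value. Without those rows, the area chart interpolates across the missing months, which pushes the stack over 100% in some places. As a quick fix, you can use a pivot_wider/pivot_longer combo to fill in the missing 0s: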

library(tidyverse)
library(zoo)
library(scales) # provides percent_format(), which tidyverse does not attach

# Creating sample data to test
stackedgraph <- tibble::tribble(
                  ~month_year,   ~directed_to_whom, ~directed_to_whom_percentage,
                   "Dec 2011",               "MoE",           0.0923076923076923,
                   "Dec 2011",               "MoL",            0.107692307692308,
                   "Dec 2011",      "Non-critical",            0.430769230769231,
                   "Dec 2011", "Private employers",            0.369230769230769,
                   "Jan 2012",               "MoE",           0.0666666666666667,
                   "Jan 2012",               "MoL",           0.0833333333333333,
                   "Jan 2012",      "Non-critical",                         0.45,
                   "Jan 2012", "Private employers",                          0.4,
                   "Feb 2012",               "MoE",           0.0606060606060606,
                   "Feb 2012",               "MoL",            0.121212121212121,
                   "Feb 2012",      "Non-critical",            0.287878787878788,
                   "Feb 2012", "Private employers",             0.53030303030303,
                   "Mar 2012",               "MoE",            0.184210526315789,
                   "Mar 2012",      "Non-critical",            0.342105263157895,
                   "Mar 2012", "Private employers",            0.473684210526316,
                   "Apr 2012",               "MoE",            0.131578947368421,
                   "Apr 2012",               "MoL",            0.105263157894737,
                   "Apr 2012",      "Non-critical",            0.210526315789474,
                   "Apr 2012", "Private employers",            0.552631578947368
                  ) |>
  mutate(month_year = as.yearmon(month_year))

# Run this code on your data

stackedgraph |>
  pivot_wider(names_from = directed_to_whom,
              values_from = directed_to_whom_percentage,
              values_fill = 0) |>
  pivot_longer(-month_year, 
               names_to = "directed_to_whom",
               values_to = "directed_to_whom_percentage") |>
  ggplot(aes(x = as.Date(month_year), y = directed_to_whom_percentage)) +
  geom_area(aes(fill = directed_to_whom, group = directed_to_whom),
            position = 'stack') +
  scale_fill_manual(
    values = c(
      "MoL" = "light green",
      "MoE" = "red",
      "Private employers" = "light blue",
      "Non-critical" = "black"
    )
  ) +
  scale_x_date(date_breaks = "months" , date_labels = "%b-%y") +
  theme(
    plot.title = element_text(size = 5, face = "bold"),
    axis.text.x = element_text(angle = 90, vjust = 0.5)
  ) +
  theme(axis.title.x = element_blank(),
        axis.ticks.x = element_blank()) +
  scale_y_continuous(labels = percent_format(accuracy = 1))

[Figure: the stacked area graph after filling the missing values with 0]
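Before filling anything in, it can help to see exactly which month/category combinations are absent. The following is only a minimal sketch on a small made-up sample (`sample_df` and its values are hypothetical, not the asker's data); it uses `tidyr::expand()` to build every combination and `dplyr::anti_join()` to keep the ones that have no row:

```r
library(dplyr)
library(tidyr)

# Hypothetical sample: "MoL" has no row for "Mar 2012"
sample_df <- tibble::tribble(
  ~month_year, ~directed_to_whom, ~directed_to_whom_percentage,
  "Feb 2012",  "MoE",             0.4,
  "Feb 2012",  "MoL",             0.6,
  "Mar 2012",  "MoE",             1.0
)

# Build every month/category combination, then drop the ones present in the data
missing_combos <- sample_df |>
  tidyr::expand(month_year, directed_to_whom) |>
  anti_join(sample_df, by = c("month_year", "directed_to_whom"))

print(missing_combos)
```

Any row that shows up in `missing_combos` is a point where one group has no value for that month, which is where the stacked areas stop lining up.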

Edit - a slightly tidier solution

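The tidyr::complete function can do these steps for you in a more natural, explicit way. It creates every month_year*directed_to_whom combination in the data frame and fills the missing values with 0: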

stackedgraph <- stackedgraph %>%
  complete(month_year,
           directed_to_whom,
           fill = list(directed_to_whom_percentage = 0))

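This does the same as the two pivots, but it is the nicer, proper way of doing it. It gives the tidied, expanded data frame, which you can pass to your ggplot code to get the right result.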
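The dput output in the question shows that the asker's data is a grouped tibble (grouped by month_year), and the asker reports needing ungroup() before complete(). Here is a rough sketch of that variant on a small made-up sample (`grouped_sample` and its values are hypothetical), followed by a check that every month ends up with one row per category:

```r
library(dplyr)
library(tidyr)

# Hypothetical grouped sample: "MoL" has no row for "Mar 2012"
grouped_sample <- tibble::tribble(
  ~month_year, ~directed_to_whom, ~directed_to_whom_percentage,
  "Feb 2012",  "MoE",             0.4,
  "Feb 2012",  "MoL",             0.6,
  "Mar 2012",  "MoE",             1.0
) |>
  group_by(month_year)

# Drop the grouping first, then add the missing combinations with a value of 0
completed <- grouped_sample |>
  ungroup() |>
  complete(month_year, directed_to_whom,
           fill = list(directed_to_whom_percentage = 0))

# Every month should now have the same number of rows (one per category)
rows_per_month <- count(completed, month_year)
print(rows_per_month)
```

If the counts in `rows_per_month` differ between months, some combination is still missing and the area chart will still misalign at that point.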
