使用六边形图来显示分类变量的比例(就像六角形三角形图中一样)。

huangapple go评论54阅读模式
英文:

Using hexbins to show proportion of categorical variables (like in hextri)

问题

以下是您要翻译的内容:

"通过这个网站的建议,我已经在ggplot中构建了一个hexbin图,显示了每个bin中数据点的计数,并突出显示了特定的兴趣区域。"

"现在,我想进一步扩展这个图,以显示每个hexbin中第二个分组类别的比例。这已经可以通过hextri包实现,但我无法将我的先前问题中的ggplot解决方案与hextri包的输出结合起来。"

"最终目标是拥有一个看起来像hextri包输出的图,并突出显示感兴趣的单元格。"

"以下是一些可以创建具有突出显示单元格的ggplot和显示分类比例的hextri图的示例数据代码。这两个功能是我想要结合的。"

"我已经尝试使用hextri函数的border输入来实现所期望的结果,但目前还没有成功。"

请注意,代码部分未翻译。

英文:

Through advice from this site, I have built a hexbin plot in ggplot which shows the count of data points in each bin, and highlights particular bins of interest.

I now want to extend this plot one step further to show the proportion of a second grouping category within each hexbin. This can already be acheived with the hextri package, but I can't combine the ggplot solutions from my previous question with output from the hextri package.

The ultimate goal is to have a plot that looks like the output from the hextri package, and highlights the cells of interest.

Below is some example data code that can create the ggplot with highlighted cells, and the hextri plot with the categorical proportions shown. These two features are what I want to combine.

I have tried playing with the border input of the hextri function to achieve the desired outcome but with no success yet.

library(hextri)
library(ggplot2)

n = 100

df = data.frame(x = rnorm(n), 
                y = rnorm(n),
                group = sample(0:1, n, prob = c(0.9, 0.1), replace = TRUE))

# hextri plot
hextri_plot = hextri(
  df$x,
  df$y,
  class = df$group,
  colour = c("#00b38a", "#ea324c"),
  nbins = 3,
  diffuse = FALSE, 
  sorted = TRUE
) 


# GGplot
ggplot(df, aes(x = x, y = y)) +
  geom_hex() +
  stat_summary_hex(aes(
    z = group,
    color = after_stat(as.character(value))
  ), fun = ~ +any(.x == 1), fill = NA) +
  scale_color_manual(
    values = c("0" = "transparent", "1" = "yellow"),
    guide = "none"
  )


答案1

得分: 13

这是不是一个琐碎的问题。它需要编写一个新的 Geom,一个新的 Stat,和一个新的 Grob(见下文)。我个人并不确信这是一个很好的数据可视化选项,因为它会导致位置失真和涉及到显著的舍入误差。然而,它在视觉上很吸引人,而且相当直观,所以我还是写了一个 geom_hextri。为了使它工作,我们只需将其美学映射到一个分类变量,它应该表现得与预期相符。

让我们使用您提供的示例数据:

set.seed(1)
n = 100

df = data.frame(x = rnorm(n), 
                y = rnorm(n),
                group = sample(0:1, n, prob = c(0.9, 0.1), replace = TRUE))

并使用您选择的颜色方案绘制它与 geom_hextri。我们将叠加点以确保段的填充逻辑与点匹配。

ggplot(df, aes(x, y, fill = factor(group), color = factor(group))) + 
  geom_hextri(linewidth = 0.3, bins = 4) + 
  geom_point(shape = 21, size = 3, color = "black") +
  coord_equal() + 
  theme_classic(base_size = 16) + 
  theme(aspect.ratio = 1) +
  scale_fill_manual("Group", values =  c("#00b38a", "#ea324c")) +
  scale_color_manual("Group", values =  c("#00b38a", "#ea324c"))

使用六边形图来显示分类变量的比例(就像六角形三角形图中一样)。

请注意,如果需要,很容易更改箱子大小和美学。要在我们的三角形周围得到实心六边形,我们只需添加一个 geom_hex 层:

ggplot(df, aes(x, y, fill = factor(group))) + 
  geom_hextri(color = "black", linewidth = 0.1, bins = 5) + 
  geom_point(shape = 21, size = 3) +
  geom_hex(fill = NA, color = "black", linewidth = 1, bins = 5) +
  coord_equal() + 
  theme_classic(base_size = 16) + 
  theme(aspect.ratio = 1) +
  scale_fill_manual("Group", values = c("gray", "red"))

使用六边形图来显示分类变量的比例(就像六角形三角形图中一样)。

应用到另一个数据集,我们可以得到:

ggplot(iris, aes(Sepal.Width, Sepal.Length, fill = Species)) + 
  geom_hextri(color = "white", linewidth = 0.1, bins = 5) + 
  geom_point(shape = 21, size = 3, position = position_jitter(0.03, 0.03),
             color = "white") +
  geom_hex(fill = NA, colour = NA, linewidth = 1, bins = 5) +
  coord_equal() + 
  theme_minimal(base_size = 20) + 
  theme(aspect.ratio = 1) +
  scale_fill_brewer(palette = "Set2")

使用六边形图来显示分类变量的比例(就像六角形三角形图中一样)。

还要注意,我们不需要使用填充美学。例如,我们可以简单地更改轮廓颜色:

ggplot(iris, aes(Sepal.Width, Sepal.Length, colour = Species)) + 
  geom_hextri(fill = NA, linewidth = 2, bins = 5, alpha = 1) + 
  geom_hex(fill = NA, colour = NA, linewidth = 1, bins = 5) +
  coord_equal() + 
  theme_minimal(base_size = 20) + 
  theme(aspect.ratio = 1) +
  scale_colour_brewer(palette = "Set1")

使用六边形图来显示分类变量的比例(就像六角形三角形图中一样)。


英文:

This is not a trivial problem. It requires writing a new Geom, a new Stat and a new Grob (see below). I'm personally not convinced that it is a great data visualization option, as it is position-distorting and involves significant rounding errors. However, it is visually appealing and fairly intuitive, so I went ahead and wrote a geom_hextri anyway. To get it to work, we simply map its aesthetics to a categorical variable and it should behave much as expected.

Let's use your own example data:

set.seed(1)
n = 100

df = data.frame(x = rnorm(n), 
                y = rnorm(n),
                group = sample(0:1, n, prob = c(0.9, 0.1), replace = TRUE))

And plot it with geom_hextri using your chosen color scheme. We will overlay points so we can ensure the logic of the segment fills matches the points.

ggplot(df, aes(x, y, fill = factor(group), color = factor(group))) + 
  geom_hextri(linewidth = 0.3, bins = 4) + 
  geom_point(shape = 21, size = 3, color = "black") +
  coord_equal() + 
  theme_classic(base_size = 16) + 
  theme(aspect.ratio = 1) +
  scale_fill_manual("Group", values =  c("#00b38a", "#ea324c")) +
  scale_color_manual("Group", values =  c("#00b38a", "#ea324c"))

使用六边形图来显示分类变量的比例(就像六角形三角形图中一样)。

Note that it's easy to change the bin size and aesthetics if we want. To get solid hexagons around our triangles, we just add a geom_hex layer:

ggplot(df, aes(x, y, fill = factor(group))) + 
  geom_hextri(color = "black", linewidth = 0.1, bins = 5) + 
  geom_point(shape = 21, size = 3) +
  geom_hex(fill = NA, color = "black", linewidth = 1, bins = 5) +
  coord_equal() + 
  theme_classic(base_size = 16) + 
  theme(aspect.ratio = 1) +
  scale_fill_manual("Group", values = c("gray", "red"))

使用六边形图来显示分类变量的比例(就像六角形三角形图中一样)。

And applying to another data set we get:

ggplot(iris, aes(Sepal.Width, Sepal.Length, fill = Species)) + 
  geom_hextri(color = "white", linewidth = 0.1, bins = 5) + 
  geom_point(shape = 21, size = 3, position = position_jitter(0.03, 0.03),
             color = "white") +
  geom_hex(fill = NA, colour = NA, linewidth = 1, bins = 5) +
  coord_equal() + 
  theme_minimal(base_size = 20) + 
  theme(aspect.ratio = 1) +
  scale_fill_brewer(palette = "Set2")

使用六边形图来显示分类变量的比例(就像六角形三角形图中一样)。

Also note we don't need to use the fill aesthetic. We can, for example, simply change the outline colour:

ggplot(iris, aes(Sepal.Width, Sepal.Length, colour = Species)) + 
  geom_hextri(fill = NA, linewidth = 2, bins = 5, alpha = 1) + 
  geom_hex(fill = NA, colour = NA, linewidth = 1, bins = 5) +
  coord_equal() + 
  theme_minimal(base_size = 20) + 
  theme(aspect.ratio = 1) +
  scale_colour_brewer(palette = "Set1")

使用六边形图来显示分类变量的比例(就像六角形三角形图中一样)。


Code for geom_hextri

Now the difficult part - the implementation of geom_hextri. I have tried to break this down into chunks, but the code is necessarily long and too difficult to explain in any great detail. I have also had to sacrifice spacing a bit to allow it to fit into code boxes that don't need scrolling.


Ultimately, ggplot has to draw objects on the plotting device as graphical objects (grobs), but there is no existing off-the-shelf grob that will draw these hexagonal segments, so we need to define a function that will draw them using grid::polygonGrob, given appropriate x, y co-ordinates, heights, widths, graphical parameters, and the segment we are dealing with. This needs to accept vectorized data to work with ggplot:

hextriGrob <- function(x, y, seg, height, width, gp = grid::gpar()) {

  gp <- lapply(seq_along(x), function(i) structure(gp[i], class = "gpar"))
  xl  <- x - width
  xr  <- x + width
  y1  <- y + 2 * height
  y2  <- y + height
  y3  <- y - height
  y4  <- y - 2 * height
  pg  <- grid::polygonGrob
  
  do.call(grid::gList, 
    Map(function(x, y, xl, xr, y1, y2, y3, y4, seg, gp) {
      if(seg == 1) return(pg(x = c(x, x, xr, x),  y = c(y, y1, y2, y), gp = gp))
      if(seg == 2) return(pg(x = c(x, xr, xr, x), y = c(y, y2, y3, y), gp = gp))
      if(seg == 3) return(pg(x = c(x, xr, x, x),  y = c(y, y3, y4, y), gp = gp))
      if(seg == 4) return(pg(x = c(x, x, xl, x),  y = c(y, y4, y3, y), gp = gp))
      if(seg == 5) return(pg(x = c(x, xl, xl, x), y = c(y, y3, y2, y), gp = gp))
      if(seg == 6) return(pg(x = c(x, xl, x, x),  y = c(y, y2, y1, y), gp = gp))
  }, x = x, y = y, xl = xl, xr = xr, y1 = y1, 
     y2 = y2, y3 = y3, y4 = y4, seg = seg, gp = gp))
}

But this isn't itself enough. We also need to define a geom that inherits from GeomHex, but has its own compute_group method to invoke our hextriGrob function appropriately. Part of its job will be to ensure that aesthetics are split correctly into segments, which for technical reasons cannot all easily be done within a Stat layer.

GeomHextri <- ggproto("GeomHextri", GeomHex,
  draw_group = function (self, data, panel_params, coord, lineend = "butt",
                         linejoin = "mitre", linemitre = 10) {
    table_six <- function(vec) {
      if(!is.factor(vec)) vec <- factor(vec)
      tab <- round(6 * table(vec, useNA = "always")/length(vec))
      n <- diff(c(0, findInterval(cumsum(tab) / sum(tab), 1:6/6)))
      rep(names(tab), times = n)
    }
    num_cols <- sapply(data, is.numeric)
    non_num_cols <- names(data)[!num_cols]
    num_cols <- names(data)[num_cols]
    datasplit <- split(data, interaction(data$x, data$y, drop = TRUE))
    data <- do.call("rbind", lapply(seq_along(datasplit), function(i) {
      num_list <- lapply(datasplit[[i]][num_cols], function(x) rep(mean(x), 6))
      non_num_list <- lapply(datasplit[[i]][non_num_cols], function(x) {
        table_six(rep(x, times = datasplit[[i]]$count))})
      d <- datasplit[[i]][rep(1, 6),]
      d[num_cols] <- num_list
      d[non_num_cols] <- non_num_list
      d$tri <- 1:6
      d$group <- i
      d}))
    data <- ggplot2:::check_linewidth(data, snake_class(self))
    if (ggplot2:::empty(data))  return(zeroGrob())
    coords <- coord$transform(data, panel_params)
    hw <- c(min(diff(unique(sort(coords$x)))), 
            min(diff(unique(sort(coords$y))))/3)
    hextriGrob(coords$x, coords$y, data$tri, hw[2], hw[1],
      gp = grid::gpar(col = data$colour, fill = alpha(data$fill, data$alpha),
                      lwd = data$linewidth * .pt, lty = data$linetype,
                      lineend = lineend, linejoin = linejoin,
                      linemitre = linemitre))})

Before our data even gets to this geom, it needs to be binned into hexagons. Unfortunately, the existing StatBinhex will not be able to do this while preserving the individual segment-level aesthetic detail we need, so we have to write our own binning function:

hexify <- function (x, y, z, xbnds, ybnds, xbins, ybins, binwidth,
                    fun = mean, fun.args = list(),
                    drop = TRUE) {

  hb <- hexbin::hexbin(x, xbnds = xbnds, xbins = xbins, y,
                       ybnds = ybnds, shape = ybins/xbins, IDs = TRUE)
  value <- rlang::inject(tapply(z, hb@cID, fun, !!!fun.args))
  out <- hexbin::hcell2xy(hb)
  out <- ggplot2:::data_frame0(!!!out)
  out$value <- as.vector(value)
  out$width <- binwidth[1]
  out$height <- binwidth[2]
  if (drop) out <- stats::na.omit(out)
  out
}

This then has to be used inside a custom Stat:

StatHextri <- ggproto("StatBinhex", StatBinhex,
  default_aes = aes(weight = 1, alpha = after_stat(count)),
  compute_panel = function (self, data, scales, ...) {
    if (ggplot2:::empty(data)) return(ggplot2:::data_frame0())
    data$group <- 1
    self$compute_group(data = data, scales = scales, ...)},
  compute_group = function (data, scales, binwidth = NULL, bins = 30,
                            na.rm = FALSE){
    `%||%` <- rlang::`%||%`
    rlang::check_installed("hexbin", reason = "for `stat_binhex()`")
    binwidth <- binwidth %||% ggplot2:::hex_binwidth(bins, scales)
    if (length(binwidth) == 1) binwidth <- rep(binwidth, 2)
    wt <- data$weight %||% rep(1L, nrow(data))
    non_pos <- !names(data) %in% c("x", "y", "PANEL", "group")
    is_num  <- sapply(data, is.numeric)
    aes_vars <- names(data)[non_pos & !is_num]
    grps <- do.call("interaction", c(as.list(data[aes_vars]), drop = TRUE))
    xbnds <- ggplot2:::hex_bounds(data$x, binwidth[1])
    xbins <- diff(xbnds)/binwidth[1]
    ybnds <- ggplot2:::hex_bounds(data$y, binwidth[2])
    ybins <- diff(ybnds)/binwidth[2]
    do.call("rbind", Map(function(data, wt) {
      out <- hexify(data$x, data$y, wt, xbnds, ybnds, xbins,
                    ybins, binwidth, sum)
      for(var in aes_vars) out[[var]] <- data[[var]][1]
      out$density <- as.vector(out$value/sum(out$value, na.rm = TRUE))
      out$ndensity <- out$density/max(out$density, na.rm = TRUE)
      out$count <- out$value
      out$ncount <- out$count/max(out$count, na.rm = TRUE)
      out$value <- NULL
      out$group <- 1
      out}, split(data, grps), split(wt, grps)))})

Finally, we need to write a geom function so that we can easily invoke all of the above in a ggplot call:

geom_hextri <- function(
    mapping     = aes(),
    data        = NULL,
    stat        = "hextri",
    position    = "identity",
    na.rm       = FALSE,
    show.legend = NA,
    inherit.aes = TRUE,
    bins        = 10,
    ...) {
  
      ggplot2::layer(
        geom        = GeomHextri,
        data        = data,
        mapping     = mapping,
        stat        = stat,
        position    = position,
        show.legend = show.legend,
        inherit.aes = inherit.aes,
        params      = list(na.rm = na.rm, bins = bins, ...)
      )
  }

huangapple
  • 本文由 发表于 2023年7月18日 10:37:04
  • 转载请务必保留本文链接:https://go.coder-hub.com/76709204.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定