Recreating a Lexis grid using ggplot

Question

I am following up on an answer I got to this question on creating a Lexis grid. While the answer got me to the point where I could overlay my data with a Lexis grid, my data is so dense that the grid was completely obscured by the fill. I got a somewhat hacky solution for bringing the grid to the front:

library(ggplot2)

p <- mylexis + 
  geom_tile(data = df, mapping = aes(x = as.Date(paste0(year, "-01-01")), y = age, fill = event))

p$layers <- p$layers[c(3, 1, 2)]

p

This worked initially. However, as I added more detail to the plot, which created more layers, it fell apart as a solution.

So I am now trying to completely circumvent the lexis_grid command and the LexisPlotR package. Instead, I just want to add a sequence of vertical, horizontal, and diagonal lines.

What I want is along the lines of the following image, from this article:

[Image: the desired Lexis grid]

This is what I am trying:

library('dplyr')
library('ggplot2')
library('viridis')

df <- data.frame(
  year <-  sample(c(1900:2021), 1000, TRUE),
  age <-    sample(c(0:80), 1000, TRUE),
  event <- sample(c(0:5), 1000, TRUE)
)

colnames(df) <- c("year", "age", "event")


ggplot(df, aes(x=year, y=age, fill=event)) +
  geom_tile() +
  scale_fill_viridis() + 
  geom_hline(yintercept=seq(0, 80, by=10)) +
  geom_vline(xintercept=seq(1900,2030, by=10)) +
  geom_abline(intercept=seq(0, 80, by=10), slope=1) + 
  labs(fill = "Count",
       title = "Events")

Which gets me the following:

[Image: the resulting plot]

The problem is, I don't know why there are dots below the 0 line and to the left of the 1900 line. And I have no idea what I'm doing wrong with the geom_abline, but I can't get the diagonals to work for anything.

Answer 1

Score: 1

The reason the tiles fill to the left of 1900 and below 0 is that the tiles are centered on the x/y coordinates, so they spill out in all directions by half of their width/height. I don't know of any way of showing dots with finite size that don't visually appear to spill outside the domain of values.
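To make the spill concrete, here is a small sketch of the arithmetic. It assumes each tile is 1 unit wide and high, which is what I would expect for integer-valued data like this. The helper tile_extent and the columns year_mid and age_mid are names made up for illustration, and shifting the centers by half a tile is only one possible workaround, not something the plot above does.

```r
# Extent covered by a tile of a given size centered on `center`.
# `tile_extent` is a hypothetical helper, not part of ggplot2.
tile_extent <- function(center, size = 1) {
  c(lower = center - size / 2, upper = center + size / 2)
}

tile_extent(1900)  # lower edge is 1899.5, i.e. left of the 1900 line
tile_extent(0)     # lower edge is -0.5, i.e. below the 0 line

# Possible workaround (an assumption, not from the answer above):
# shift the centers by half a tile so each tile starts on the grid line.
set.seed(42)
df <- data.frame(
  year = sample(1900:2021, 1000, TRUE),
  age  = sample(0:80, 1000, TRUE)
)
df$year_mid <- df$year + 0.5
df$age_mid  <- df$age + 0.5

# The lowest tile edges now sit on or inside the grid borders.
min(df$year_mid) - 0.5 >= 1900
min(df$age_mid) - 0.5 >= 0
```

If you try this, the shifted columns would go into the plot as aes(x = year_mid, y = age_mid); a tile for age 0 then covers 0 to 1 instead of -0.5 to 0.5.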

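The reason your ablines do not show is that your intercepts assume a 0,0 origin on the plot, but your x axis starts at 1900. geom_abline draws y = intercept + slope * x, so the intercept is the y value at x = 0, not at the left edge of the plot. The "real" (y-)intercept is far below 0; the diagonal through the bottom-left corner (1900, 0) needs an intercept of -1900, to be precise. If we accommodate that (and widen the range a bit), we can see the diagonal lines.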
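The intercept arithmetic can be checked in a few lines of base R. This is only a sketch; line_y and intercept_through are hypothetical helper names, not ggplot2 functions.

```r
# y value of a straight line at position x.
line_y <- function(x, intercept, slope = 1) intercept + slope * x

# With intercept 0, the line is at y = 1900 when x = 1900,
# far above the age range of 0 to 80, so nothing is visible.
line_y(1900, intercept = 0)

# With intercept -1900, the line passes through (1900, 0).
line_y(1900, intercept = -1900)

# Intercept needed for a line of the given slope through (x0, y0).
intercept_through <- function(x0, y0, slope = 1) y0 - slope * x0

intercept_through(1900, 0)   # bottom-left corner
intercept_through(1900, 80)  # top-left corner
intercept_through(2020, 0)   # bottom edge at 2020
```

The last three calls give the span of intercepts the diagonals need: from -2020 up to -1820.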

Reproducible data (using set.seed) and correcting for the unadvised use of <- inside of data.frame:

library('dplyr')
library('ggplot2')
library('viridis')
set.seed(42)
# use = (not <-) to name the columns inside data.frame()
df <- data.frame(
  year  = sample(c(1900:2021), 1000, TRUE),
  age   = sample(c(0:80), 1000, TRUE),
  event = sample(c(0:5), 1000, TRUE)
)
head(df)
#   year age event
# 1 1948  66     1
# 2 2000  13     4
# 3 1964  72     0
# 4 1924  55     1
# 5 1973  43     2
# 6 1999  54     1
# the grid layers come after geom_tile(), so they are drawn on top of the fill
ggplot(df, aes(x=year, y=age, fill=event)) +
  geom_tile() +
  scale_fill_viridis() + 
  geom_hline(yintercept=seq(0, 80, by = 10)) +
  geom_vline(xintercept=seq(1900, 2030, by = 10)) +
  # shift the intercepts down by 2020 because the x axis starts near 1900, not 0
  geom_abline(intercept=seq(0, 200, by = 10) - 2020, slope = 1) + 
  labs(fill = "Count", title = "Events")

[Image: the resulting plot with the diagonal lines visible]

The range seq(0, 200, by = 10) comes from counting the diagonals: 80/10=8 lines originate from the left border, and (2020-1900)/10=12 lines originate from the bottom border. You can change it to seq(-10, 200, by = 10) - 2020 to fill in that last diagonal. It's okay to over-draw some ablines; the ones that fall outside the plot are simply not shown. (For instance, seq(-50, 300, by = 10) - 2020 works without otherwise affecting the x/y limits.)
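If you would rather derive the intercepts from the axis ranges than count lines by hand, something like the following could work. It is a sketch that assumes slope-1 diagonals; lexis_intercepts is a made-up helper name, not part of any package.

```r
# Intercepts of the slope-1 diagonals that cross the rectangle
# [xmin, xmax] x [ymin, ymax], spaced `by` units apart.
# The lowest diagonal passes through (xmax, ymin) and the
# highest one through (xmin, ymax).
lexis_intercepts <- function(xmin, xmax, ymin, ymax, by = 10) {
  seq(ymin - xmax, ymax - xmin, by = by)
}

ints <- lexis_intercepts(1900, 2020, 0, 80)

range(ints)    # lowest and highest intercept
length(ints)   # number of diagonals

# Same values as the hand-counted version used in the plot.
all(ints == seq(0, 200, by = 10) - 2020)
```

The result can be passed straight to geom_abline(intercept = ints, slope = 1). Widening the ranges, for example to xmax = 2030, just adds more diagonals.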

huangapple
  • Posted on February 14, 2023, 00:27:33
  • When reposting, please keep the link to this post: https://go.coder-hub.com/75438637.html