Annotating only unique duplicated key values on a diverging bar chart in ggplot2

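Question

Say I have a data frame (`df`) with 2 columns and 40 rows. The first column holds duplicated key/ID values, and the second holds 20 positive values followed by 20 negative ones.

Because of this, I decided to go for a diverging bar chart. But whenever I plotted the chart, the x-axis text was displayed twice, with one set (e.g. the first 20 positive values) overlapping the other (e.g. the last 20 negative values). My solution was to use `scale_x_discrete()`, partly because the plot looked better that way too.

But I still need to show the x-axis text. I thought about displaying it at the base of one set of bars (the positive ones). Like this:

![enter image description here](https://i.stack.imgur.com/GAszI.png)
(But with the annotated text more spaced out, so that each label sits at the center of its bar.)

But when I try this as shown in my sample code below, the key values (`col1`) still overlap! Or maybe they just look like they're in bold... Either way, I can't get this right =//

What could I do?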

Data

    # Sample df:
    df <- structure(list(col1 = c("A", "B", "C", "D", "E", "F", "G", "H",
    "I", "J", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "A",
    "B", "C", "D", "E", "F", "G", "H", "I", "J", "A", "B", "C", "D",
    "E", "F", "G", "H", "I", "J"), col2 = c(18.5817806317937, 28.1916172143538,
    8.66620996058919, 12.0227236610372, 24.4170182822272, 29.3641960325185,
    28.7800777778029, 23.1192238365766, 15.7798075131141, 2.86982706259005,
    19.6636101899203, 27.5613576434553, 3.76174484286457, 9.56581128691323,
    23.3280192685779, 8.42091225110926, 16.01897605462, 20.6576479838695,
    5.26960676000454, 21.3152553031687, -1, -14.7368421052632, -10.1578947368421,
    -2.52631578947368, -13.2105263157895, -25.4210526315789, -5.57894736842105,
    -4.05263157894737, -26.9473684210526, -28.4736842105263, -22.3684210526316,
    -7.10526315789474, -19.3157894736842, -23.8947368421053, -17.7894736842105,
    -30, -11.6842105263158, -8.63157894736842, -20.8421052631579,
    -16.2631578947368)), class = "data.frame", row.names = c(NA,
    -40L))

    # Sample plot (this is the attempt that produces the overlapping labels):
    library(ggplot2)
    ggplot(df, aes(x = reorder(col1, col2), y = col2)) +
      geom_bar(stat = "identity", show.legend = FALSE) +
      geom_text(aes(x = 5, y = 0.07, label = paste(col1, collapse = " "), family = "Futura"), color = "black", size = 5) +
      xlab("Group") +
      ylab("Value") +
      theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
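
A note on the "bold" look described above. A likely cause is that `paste(col1, collapse = " ")` collapses the whole column into a single string, and `geom_text()` then draws that string once for every row of the data, all at the same position. The sketch below checks this using base R only. The data frame is a synthetic stand-in with the same shape as the question's `df` (its values are made up for illustration), and the names `label`, `n_drawn`, and `keys` are assumptions introduced here.

```r
# Synthetic stand-in for the question's data: 10 keys, each repeated 4 times,
# with 20 positive values followed by 20 negative ones (values are made up).
df <- data.frame(
  col1 = rep(LETTERS[1:10], times = 4),
  col2 = c(seq(2, 40, by = 2), -seq(1.5, 30, by = 1.5))
)

# paste(..., collapse = " ") turns the whole column into ONE string,
# not one label per bar.
label <- paste(df$col1, collapse = " ")
length(label)

# A text layer draws one label per row of its data, so the single string
# above would be recycled to every row and drawn at the same (x, y).
n_drawn <- nrow(df)
n_drawn

# What the annotation layer needs instead: one row per unique key.
keys <- unique(df["col1"])
nrow(keys)
```

Stacking many identical labels on the same spot tends to thicken the glyph edges, which would explain why the text looks bold rather than visibly overlapped.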



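# Answer 1
**Score**: 1

I find it easier to work with different layers in ggplot2 if we prepare the variable order before it gets to ggplot. Here I make `col1` a factor whose levels are ordered by `col2` (`fct_reorder()` uses the median value by default).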

```R
library(ggplot2); library(dplyr)
df |>
  mutate(col1 = forcats::fct_reorder(col1, col2)) |>
  ggplot(aes(x = col1, y = col2)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  geom_text(aes(y = 0.07, label = col1), size = 5,
            data = distinct(df, col1)) +  # only need one obs per col1
  xlab("Group") +
  ylab("Value") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
```

![enter image description here](https://i.stack.imgur.com/MmJXv.png)


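
As a variation on the answer above, the label data can be built from the reordered factor itself, so that both layers carry identical factor levels. This is only a sketch under stated assumptions: the data frame is a synthetic stand-in for the question's `df`, the helper name `labels` is hypothetical, and base R's `reorder()` is swapped in for `forcats::fct_reorder()` so that ggplot2 is the only package needed.

```r
library(ggplot2)

# Synthetic stand-in for the question's data (values are illustrative only).
df <- data.frame(
  col1 = rep(LETTERS[1:10], times = 4),
  col2 = c(seq(2, 40, by = 2), -seq(1.5, 30, by = 1.5))
)

# Reorder the keys by the median of col2 (base R alternative to fct_reorder).
df$col1 <- reorder(df$col1, df$col2, FUN = median)

# One label row per key, sharing the factor levels of the bar data.
labels <- data.frame(
  col1 = factor(levels(df$col1), levels = levels(df$col1))
)

p <- ggplot(df, aes(x = col1, y = col2)) +
  geom_col() +  # geom_col() is shorthand for geom_bar(stat = "identity")
  geom_text(aes(y = 0, label = col1), data = labels,
            vjust = -0.5, size = 5) +  # negative vjust lifts text just above y = 0
  labs(x = "Group", y = "Value") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

p
```

Because `labels` has exactly one row per key, each key is drawn once, centered on its bar. The `vjust` value is a matter of taste and may need adjusting for a different font size.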



huangapple
  • Posted on June 1, 2023, 05:11:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76377329.html