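Annotating only unique duplicated key values on a diverging bar chart in ggplot2
Question
I have a data frame (`df`) with 2 columns and 40 rows. The first column holds duplicated key/ID values; the second holds 20 positive values followed by 20 negative ones.
Because of this, I went for a diverging bar chart. But whenever I plotted it, the x-axis text was drawn twice, with one set (e.g. for the first 20 positive values) overlapping the other (e.g. for the last 20 negative values). My workaround was to use `scale_x_discrete()`, partly because the chart looked better that way.
I still need to show the x-axis text, though. I thought about displaying it at the base of one set of bars (the positive ones), like this:
[![enter image description here](https://i.stack.imgur.com/GAszI.png)](https://i.stack.imgur.com/GAszI.png)
(But with the annotated text spaced further apart, so that each label sits at the center of its bar.)
But when I try this as shown in the sample code below, the key values (`col1`) still overlap! Or maybe they just look bold... Either way, I can't get it right.
What could I do?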
Data

    # Sample df:
    df <- structure(list(col1 = c("A", "B", "C", "D", "E", "F", "G", "H",
    "I", "J", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "A",
    "B", "C", "D", "E", "F", "G", "H", "I", "J", "A", "B", "C", "D",
    "E", "F", "G", "H", "I", "J"), col2 = c(18.5817806317937, 28.1916172143538,
    8.66620996058919, 12.0227236610372, 24.4170182822272, 29.3641960325185,
    28.7800777778029, 23.1192238365766, 15.7798075131141, 2.86982706259005,
    19.6636101899203, 27.5613576434553, 3.76174484286457, 9.56581128691323,
    23.3280192685779, 8.42091225110926, 16.01897605462, 20.6576479838695,
    5.26960676000454, 21.3152553031687, -1, -14.7368421052632, -10.1578947368421,
    -2.52631578947368, -13.2105263157895, -25.4210526315789, -5.57894736842105,
    -4.05263157894737, -26.9473684210526, -28.4736842105263, -22.3684210526316,
    -7.10526315789474, -19.3157894736842, -23.8947368421053, -17.7894736842105,
    -30, -11.6842105263158, -8.63157894736842, -20.8421052631579,
    -16.2631578947368)), class = "data.frame", row.names = c(NA,
    -40L))

    # Sample plot:
    library(ggplot2)
    ggplot(df, aes(x = reorder(col1, col2), y = col2)) +
      geom_bar(stat = "identity", show.legend = FALSE) +
      geom_text(aes(x = 5, y = 0.07, label = paste(col1, collapse = " "), family = "Futura"), color = "black", size = 5) +
      xlab("Group") +
      ylab("Value") +
      theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

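# Answer 1
**Score**: 1
I find it easier to work with different layers in ggplot2 if the variable order is prepared before the data reaches ggplot. Here I turn `col1` into a factor whose levels are ordered by `col2` (`fct_reorder()` uses the median value by default).
```R
library(ggplot2); library(dplyr)
df |>
  mutate(col1 = forcats::fct_reorder(col1, col2)) |>
  ggplot(aes(x = col1, y = col2)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  geom_text(aes(y = 0.07, label = col1), size = 5,
            data = distinct(df, col1)) + # only need one obs per col1
  xlab("Group") +
  ylab("Value") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
```
[![enter image description here](https://i.stack.imgur.com/MmJXv.png)](https://i.stack.imgur.com/MmJXv.png)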
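To see why restricting the text layer's data matters, here is a minimal sketch on a small made-up data frame. The name `toy` and its values are hypothetical, with 3 keys instead of the question's 10. It computes the factor once so that both layers share the same level order, then uses `layer_data()` to count the rows each layer draws. `geom_col()` is used here as shorthand for `geom_bar(stat = "identity")`.

```R
library(ggplot2)
library(dplyr)

# Hypothetical stand-in for the question's df: 3 keys, each appearing 4 times
toy <- data.frame(
  col1 = rep(c("A", "B", "C"), times = 4),
  col2 = c(5, 9, 2, 6, 8, 3, -4, -1, -7, -2, -3, -6)
)

# Reorder the factor levels once (by median of col2) so every layer agrees
toy <- mutate(toy, col1 = forcats::fct_reorder(col1, col2))

# One row per key: the only rows the text layer needs
labels <- distinct(toy, col1)

p <- ggplot(toy, aes(x = col1, y = col2)) +
  geom_col() +
  geom_text(aes(y = 0.07, label = col1), data = labels, size = 5) +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

# Count the rows each layer actually draws
n_bars  <- nrow(layer_data(p, 1))  # one bar segment per input row
n_texts <- nrow(layer_data(p, 2))  # one label per distinct key
```

Without the `data =` override, the text layer would inherit every row of the plot data. In the question's code that means the same pasted string is drawn 40 times at a single position, which is probably why the labels looked bold.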