Plotting a grouped bar chart using ggplot
Question
I'm new to R and trying to create a grouped bar chart, but I'm having a few issues.
I have written some code, but the chart doesn't look the way I want it to.
dveR <- data.frame(values = c(3.921, 3.557, 3.793, 3.154, 2.387, 1.906), group = rep(c("Cardia", "Anterior", "Posterior"), each = 2), subgroup = LETTERS[1:2])
ggplot(dveR, aes(x = group, y = values, fill = subgroup)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c("springgreen4", "orange2"))
I want the axis labels to be Y = Log Re and X = tissue, and I want the order of the bars to be cardia, anterior, posterior, but it seems to sort them into alphabetical order. The subgroup labels should also be 0 hours and 24 hours, but I'm not sure how to change this. Any help would be appreciated, thanks!
Answer 1
Score: 1
As you are new to R, it's important that you learn about how R treats categorical variables, or factors. If you provide R with a set of character values in a column of a data frame, it tries to infer what you intended those values to represent. As you discovered, it won't always make the correct inference.
When you provide a set of character values in a column of a data frame and then try to use those values in a regression or a graphical display, R will typically treat those values as a factor but will order the levels of the factor alphanumerically. I find that it's safest to convert such columns to factors yourself and specify the order that you desire. One way to do that in this case is:
dveR$group <- factor(dveR$group)
dveR$group <- relevel(dveR$group,ref="Cardia")
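Note that relevel() only moves one level to the front. It gives the right result here because the two remaining levels, Anterior and Posterior, are already in the desired order alphabetically. A minimal sketch of a more general alternative, which spells out the full order through the levels argument of factor():

```r
# Same data as in the question.
dveR <- data.frame(
  values = c(3.921, 3.557, 3.793, 3.154, 2.387, 1.906),
  group = rep(c("Cardia", "Anterior", "Posterior"), each = 2),
  subgroup = LETTERS[1:2]
)

# State every level in the order the bars should appear,
# instead of relying on relevel() plus alphabetical order.
dveR$group <- factor(dveR$group, levels = c("Cardia", "Anterior", "Posterior"))

levels(dveR$group)
```

This form keeps working if more tissues are added later, because the order never depends on alphabetical sorting.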
A comment suggested that you use the fct_inorder() function from the forcats package to do that directly in ggplot. That package can make such manipulations easier, but I think that you will be better off if you do these manipulations yourself at the start of your learning so that you have a better idea of what's going on.
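For comparison, here is a base R sketch of the same idea as fct_inorder(): the levels are taken in the order in which the values first appear. The variable names tissue and tissue_f are made up for this illustration.

```r
# The tissue labels in the order they occur in the data.
tissue <- rep(c("Cardia", "Anterior", "Posterior"), each = 2)

# unique() keeps first-appearance order, so the levels follow the data
# rather than the alphabet.
tissue_f <- factor(tissue, levels = unique(tissue))

levels(tissue_f)
```

This only gives the order you want if the rows of the data frame are already arranged that way, which happens to be true for the data in the question.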
The labs(x = "tissue", y = "Log Re") term suggested in a comment is one way to specify axis labels. You can also use xlab("Tissue") and ylab("Log Re").
You can use additional arguments to scale_fill_manual() to set the name of the legend and the labels of the factor levels.
scale_fill_manual(values = c("springgreen4","orange2"),
labels = c("0","24"), name = "Hour")
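One possible way to make the pairing between levels, colors, and labels explicit is to name the values vector and pass breaks alongside labels. This is a sketch, not part of the original suggestion; fill_colors, fill_labels, and fill_scale are names made up for the illustration.

```r
library(ggplot2)

# Colors and labels keyed by the original subgroup codes.
fill_colors <- c(A = "springgreen4", B = "orange2")
fill_labels <- c(A = "0", B = "24")

# A named `values` vector is matched to factor levels by name, and
# `breaks` fixes which level each entry of `labels` belongs to.
fill_scale <- scale_fill_manual(
  values = fill_colors,
  breaks = names(fill_labels),
  labels = unname(fill_labels),
  name = "Hour"
)
```

The resulting fill_scale object can be added to a plot with + in place of the scale_fill_manual() call above.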
The danger is that you might then accidentally mix up the order between the original factor levels and what you ask ggplot to display. I find it safer to specify factors in the data frame that explicitly represent what I want to show. For example, in this case:
dveR[, "Hour"] <- ifelse(dveR$subgroup == "A", "0", "24")
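The ifelse() call produces a character column. If you would rather have a true factor with a fixed level order, a minimal sketch using the levels and labels arguments of factor() could look like this (the standalone subgroup and Hour vectors are only for illustration):

```r
# Subgroup codes as they appear in the question's data frame.
subgroup <- rep(LETTERS[1:2], times = 3)

# Map code "A" to "0" and code "B" to "24" in one step,
# keeping "0" before "24" in the level order.
Hour <- factor(subgroup, levels = c("A", "B"), labels = c("0", "24"))

as.character(Hour)
levels(Hour)
```

Unlike ifelse(), this approach turns any unexpected code into NA instead of silently labelling it "24".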
Then, after re-leveling group and defining Hour as above, you could generate the plot you want with
ggplot(dveR,aes(x = group, y = values, fill = Hour)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_manual(values = c("springgreen4", "orange2")) +
xlab("Tissue") + ylab("Log Re")
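Putting the pieces together, a self-contained version of the steps above might look like the following sketch. It assumes the ggplot2 package is installed; the plot object name p is arbitrary.

```r
library(ggplot2)

# Data from the question.
dveR <- data.frame(
  values = c(3.921, 3.557, 3.793, 3.154, 2.387, 1.906),
  group = rep(c("Cardia", "Anterior", "Posterior"), each = 2),
  subgroup = LETTERS[1:2]
)

# Put Cardia first; Anterior and Posterior keep their alphabetical order.
dveR$group <- relevel(factor(dveR$group), ref = "Cardia")

# Store the legend text in the data itself.
dveR$Hour <- ifelse(dveR$subgroup == "A", "0", "24")

p <- ggplot(dveR, aes(x = group, y = values, fill = Hour)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c("springgreen4", "orange2")) +
  xlab("Tissue") + ylab("Log Re")

print(p)
```

geom_col(position = "dodge") is a shorthand for geom_bar(stat = "identity", position = "dodge") and could be used instead.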
For this type of data, however, you might not want to show a bar chart at all. As Edward Tufte often notes, such plots use up a lot of space without conveying a lot of information. From this page, for example:
> The bar chart wastes space; you could show at least 100 numbers in the space that now shows 1 number. People read numbers in tables all the time (see the financial section and sports section of any good newspaper) and they don't need to see a bar to understand 1 number.
Also, bar charts are sometimes misused (in a way that you aren't) by resetting the bottom of the scale to something other than 0, close to the top of the range, to make truly small differences appear very large. So I'd recommend learning how to specify axis labels and legends properly, but choosing a different way to display this type of data.
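As one possible alternative in the spirit of the quote, six numbers fit comfortably in a small table. The sketch below uses base R's xtabs() to cross-tabulate the values by tissue and hour; this is just one option and the object name log_re is made up.

```r
# Data from the question, with the factor order and Hour column set as above.
dveR <- data.frame(
  values = c(3.921, 3.557, 3.793, 3.154, 2.387, 1.906),
  group = rep(c("Cardia", "Anterior", "Posterior"), each = 2),
  subgroup = LETTERS[1:2]
)
dveR$group <- factor(dveR$group, levels = c("Cardia", "Anterior", "Posterior"))
dveR$Hour <- ifelse(dveR$subgroup == "A", "0", "24")

# xtabs() sums `values` within each group/Hour cell; every cell holds
# exactly one observation here, so the table shows the raw values.
log_re <- xtabs(values ~ group + Hour, data = dveR)

log_re
```

The rows follow the factor order (Cardia, Anterior, Posterior) and the columns are the two time points.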