Why am I getting the error "incorrect number of names" when using barplot in RStudio?

Question

I'm trying to make a barplot of ancestry proportions in RStudio, and I got the error "incorrect number of names".

This is the code I ran:

```{r}
# Load the data
datos <- read.table("admix.txt", header = TRUE)

# Define the population names
Poblacion <- c("America","Europa","Eurasia","EurAm","Europasia")

# Create the admixture plot
barplot(t(as.matrix(datos[, -(1)])), col = rainbow(ncol(datos)-1),
        xlab = "Individuo", ylab = "Proporción", names.arg = Poblacion)
```

My dataset:

Poblacion	origen1	origen2
America	0.006666	0.993334
America	0.779961	0.220039
America	0.427611	0.572389
America	0.813640	0.186360
America	0.652604	0.347396
America	0.499865	0.500135
America	0.290712	0.709288
America	0.447847	0.552153
America	0.840954	0.159046
America	0.523092	0.476908
America	0.000010	0.999990
America	0.143286	0.856714
America	0.472235	0.527765
America	0.771131	0.228869
America	0.511068	0.488932
America	0.025474	0.974526
America	0.000010	0.999990
America	0.005296	0.994704
America	0.685525	0.314475
America	0.418856	0.581144
America	0.653668	0.346332
America	0.225173	0.774827
America	0.383285	0.616715
America	0.058886	0.941114
America	0.009342	0.990658
America	0.015007	0.984993
America	0.002664	0.997336
America	0.000010	0.999990
America	0.145986	0.854014
America	0.000010	0.999990
America	0.015244	0.984756
America	0.000010	0.999990
America	0.000010	0.999990
America	0.167392	0.832608
America	0.640400	0.359600
EurAm	0.000648	0.999352
EurAm	0.255487	0.744513
EurAm	0.000010	0.999990
EurAm	0.450210	0.549790
EurAm	0.000010	0.999990
EurAm	0.546981	0.453019
EurAm	0.484598	0.515402
EurAm	0.086021	0.913979
EurAm	0.285348	0.714652
EurAm	0.031093	0.968907
EurAm	0.069430	0.930570
EurAm	0.037918	0.962082
EurAm	0.022321	0.977679
EurAm	0.320998	0.679002
EurAm	0.106400	0.893600
EurAm	0.048877	0.951123
EurAm	0.182298	0.817702
EurAm	0.031725	0.968275
EurAm	0.312833	0.687167
EurAm	0.457584	0.542416
EurAm	0.054852	0.945148
EurAm	0.553960	0.446040
EurAm	0.002580	0.997420
EurAm	0.025126	0.974874
EurAm	0.999990	0.000010
EurAm	0.000010	0.999990
EurAm	0.147882	0.852118
EurAm	0.000010	0.999990
EurAm	0.221932	0.778068
EurAm	0.181649	0.818351
EurAm	0.595149	0.404851
EurAm	0.681347	0.318653
EurAm	0.000010	0.999990
EurAm	0.702988	0.297012
EurAm	0.000010	0.999990
EurAm	0.002774	0.997226
Eurasia	0.005494	0.994506
Eurasia	0.000010	0.999990
Eurasia	0.019013	0.980987
Eurasia	0.019751	0.980249
Eurasia	0.023125	0.976875
Eurasia	0.335525	0.664475
Eurasia	0.019229	0.980771
Eurasia	0.028028	0.971972
Eurasia	0.000010	0.999990
Eurasia	0.667998	0.332002
Eurasia	0.000010	0.999990
Europa	0.021506	0.978494
Europa	0.085614	0.914386
Europa	0.002423	0.997577
Europa	0.136019	0.863981
Europa	0.000010	0.999990
Europa	0.001705	0.998295
Europa	0.008959	0.991041
Europa	0.005611	0.994389
Europa	0.000010	0.999990
Europa	0.000010	0.999990
Europa	0.011926	0.988074
Europa	0.685324	0.314676
Europa	0.026084	0.973916
Europa	0.000010	0.999990
Europa	0.016599	0.983401
Europa	0.007035	0.992965
Europa	0.132058	0.867942
Europa	0.005673	0.994327
Europa	0.000010	0.999990
Europa	0.007433	0.992567
Europa	0.022336	0.977664
Europa	0.000010	0.999990
Europa	0.076555	0.923445
Europa	0.205925	0.794075
Europa	0.023510	0.976490
Europa	0.003213	0.996787
Europa	0.000010	0.999990
Europa	0.000010	0.999990
Europa	0.020198	0.979802
Europa	0.000010	0.999990
Europa	0.174797	0.825203
Europa	0.130237	0.869763
Europa	0.128710	0.871290
Europa	0.015761	0.984239
Europa	0.016476	0.983524
Europa	0.016811	0.983189
Europa	0.000863	0.999137
Europa	0.162520	0.837480
Europa	0.000010	0.999990
Europa	0.004684	0.995316
Europa	0.019208	0.980792
Europa	0.492487	0.507513
Europa	0.000010	0.999990
Europa	0.000010	0.999990
Europa	0.015666	0.984334
Europa	0.000010	0.999990
Europa	0.018586	0.981414
Europa	0.228070	0.771930
Europa	0.054701	0.945299
Europa	0.015723	0.984277
Europa	0.000010	0.999990
Europa	0.147377	0.852623
Europa	0.000010	0.999990
Europa	0.015433	0.984567
Europa	0.194324	0.805676
Europa	0.142146	0.857854
Europa	0.181220	0.818780
Europa	0.003677	0.996323
Europa	0.355231	0.644769
Europa	0.402608	0.597392
Europa	0.067520	0.932480
Europa	0.171952	0.828048
Europa	0.014737	0.985263
Europa	0.000010	0.999990
Europa	0.003896	0.996104
Europa	0.000010	0.999990
Europa	0.795202	0.204798
Europa	0.006578	0.993422
Europa	0.021397	0.978603
Europa	0.145587	0.854413
Europa	0.062430	0.937570
Europa	0.000010	0.999990
Europa	0.012280	0.987720
Europa	0.999990	0.000010
Europa	0.020080	0.979920
Europa	0.134631	0.865369
Europasia	0.023057	0.976943
Europasia	0.000010	0.999990
Europasia	0.016871	0.983129
Europasia	0.058525	0.941475

origen1 and origen2 are the ancestry proportions for each individual, obtained with admixture.

I tried the code shown above. I expected this:

[image of the expected plot]

Answer 1

Score: 0

Your Poblacion vector has length 5, but after the transpose barplot draws one bar per individual (162 in the data you posted), and names.arg must supply exactly one label per bar. barplot does not recycle the five names or work out which individuals belong to which population. May I suggest that you use ggplot2 instead? This may not be the most efficient or elegant solution, but it gets you a few steps closer to what you want.
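To make the mismatch concrete, here is a minimal base R sketch. The data frame `toy` and its values are invented for illustration (they are not your data), and the English axis labels are my own choice. It shows that the transposed matrix has one column per individual, that passing only the unique population names fails, and that passing one label per bar works:

```r
# Hypothetical toy data with the same layout as admix.txt (values invented)
toy <- data.frame(
  Poblacion = c("America", "America", "EurAm", "EurAm", "Europa", "Europa"),
  origen1   = c(0.10, 0.80, 0.30, 0.00, 0.25, 0.60),
  origen2   = c(0.90, 0.20, 0.70, 1.00, 0.75, 0.40)
)

# Transposed matrix: one row per ancestry component, one column per individual
m <- t(as.matrix(toy[, -1]))
n_bars <- ncol(m)  # number of stacked bars barplot() will draw

# Passing only the unique population names (3 labels for 6 bars) raises an error;
# tryCatch() captures it so the script keeps running
err <- tryCatch(
  barplot(m, names.arg = unique(toy$Poblacion)),
  error = function(e) e
)

# One label per bar: reuse the whole Poblacion column instead
bar_labels <- as.character(toy$Poblacion)
mids <- barplot(m, col = rainbow(nrow(m)), names.arg = bar_labels, las = 2,
                xlab = "Individual", ylab = "Proportion")
```

With your real data this would amount to `names.arg = datos$Poblacion`, although 162 repeated labels will probably be too crowded to read.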

I read in your data and called it df.

# install these packages if you don't already have them installed
library(forcats)
library(tidyverse)
library(reshape2)

df$Poblacion <- factor(df$Poblacion)
df$ID <- as.numeric(row.names(df))

# reshape the data set to long format for ggplot
melt.df <- melt(df, id.vars = c("ID", "Poblacion"), value.name = "percentage")

# this is to reverse the factor levels because there is a problem with ordering
# geom_bar and legends that is too involved to describe here
melt.df$reassigned.origen <- "origen1"
melt.df$reassigned.origen[melt.df$variable == "origen1"] <- "origen2"

melt.df$ID <- as.numeric(melt.df$ID)
melt.df$reassigned.origen <- factor(melt.df$reassigned.origen)

# > head(melt.df)
#   ID Poblacion variable percentage reassigned.origen
# 1  1   America  origen1   0.006666           origen2
# 2  2   America  origen1   0.779961           origen2
# 3  3   America  origen1   0.427611           origen2
# 4  4   America  origen1   0.813640           origen2
# 5  5   America  origen1   0.652604           origen2
# 6  6   America  origen1   0.499865           origen2

# this is for labeling the x axis; summary() returns Min, 1st Qu., Median, ...
# per region, and column 3 (the median ID) is used as the label position below
ID.medians <- aggregate(ID ~ Poblacion, data = melt.df, summary)

ggplot(data = melt.df) +
  # plot a stacked bar that sums to 100%; it looks strange but
  # gets the percentages and levels in the right order
  geom_bar(aes(x = ID, y = percentage, fill = forcats::fct_rev(reassigned.origen),
               color = Poblacion), stat = "identity") +
  # this removes the title for the legend
  theme(legend.title = element_blank()) +
  # adds the regions at the x axis
  scale_x_continuous(breaks = ID.medians$ID[, 3],
                     labels = levels(ID.medians$Poblacion), minor_breaks = NULL) +
  # labels for x and y axes
  labs(x = "", y = "Percentage origen1")

This is what I get:

[image of the resulting plot]

You will likely have to do quite a bit more work on colors, factor levels, and formatting, but hopefully this gets you started.
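If you would rather stay with base graphics, one possible approach is to suppress the per-bar names and place a single label per population at the centre of its block of bars, using the bar midpoints that barplot() returns. This is only a sketch and not the method used for the figure above. The data frame `toy` is invented for illustration, the English axis labels are my own choice, and the code assumes the rows are already sorted by Poblacion, as they are in the posted file:

```r
# Hypothetical toy data with the same layout as admix.txt (values invented)
toy <- data.frame(
  Poblacion = c("America", "America", "EurAm", "EurAm", "Europa", "Europa"),
  origen1   = c(0.10, 0.80, 0.30, 0.00, 0.25, 0.60),
  origen2   = c(0.90, 0.20, 0.70, 1.00, 0.75, 0.40)
)

m <- t(as.matrix(toy[, -1]))

# barplot() returns the x position of each bar's midpoint;
# axisnames = FALSE stops it from drawing a label under every bar
mids <- barplot(m, col = rainbow(nrow(m)), border = NA, space = 0,
                xlab = "Individual", ylab = "Proportion", axisnames = FALSE)

# Keep populations in order of appearance so labels follow the plotting order
pop <- factor(toy$Poblacion, levels = unique(toy$Poblacion))

# Centre of each population's block of bars
group_centres <- tapply(mids, pop, mean)

# One label per population, without tick marks
axis(side = 1, at = as.numeric(group_centres), labels = names(group_centres),
     tick = FALSE)
```

With the real file, the same steps would apply to `datos` in place of `toy`.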

huangapple
  • Posted on July 14, 2023 at 04:38:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76683097.html