Reorder matrix rows and columns simultaneously by a group variable

Question

Consider the data "df" below with group variable "g". I am trying to find a way to sort the data by row and column according to the order of "g".

g <- c(2, 1, 2, 1, 2)
df1 <- c(1, 0.2, 0.5, 0.8, 0.4)
df2 <- c(0.2, 1, 0.7, 0.6, 0.3)
df3 <- c(0.5, 0.7, 1, 0.4, 0.1)
df4 <- c(0.8, 0.6, 0.4, 1, 0.9)
df5 <- c(0.4, 0.3, 0.1, 0.9, 1)
df <- data.frame(g, df1, df2, df3, df4, df5)
colnames(df) <- c("g", 2, 1, 2, 1, 2)
df

This is an example from a large data set. Using the sort function in R, I performed the task as follows:

r.df <- df[, sort.list(df[1, ])]
s.df <- r.df[sort.list(r.df[, 1]), ]

by ordering the rows first according to g (i.e., first row) and then columns next by g (i.e., first column). However, the second stage distorts the order of the rows. I expect to have something like

g 1   1   2   2   2
1 1.0 0.6 0.2 0.7 0.3
1 0.6 1.0 0.8 0.4 0.9
2 0.2 0.8 1.0 0.5 0.4
2 0.7 0.4 0.5 1.0 0.1
2 0.3 0.9 0.4 0.1 1.0

Any help is hugely appreciated.

Answer 1

Score: 1

Try this.

df[order(df$g), c(1L, order(colnames(df)[-1L]) + 1L)]
#   g   1 1.1   2 2.1 2.2
# 2 1 1.0 0.6 0.2 0.7 0.3
# 4 1 0.6 1.0 0.8 0.4 0.9
# 1 2 0.2 0.8 1.0 0.5 0.4
# 3 2 0.7 0.4 0.5 1.0 0.1
# 5 2 0.3 0.9 0.4 0.1 1.0

Note that your column names are invalid; column names should not be duplicated.
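
One way to sidestep the duplicated names, sketched here under the assumption that the five numeric columns form a square matrix whose rows and columns share the grouping g: keep the group labels in a separate vector and apply the same permutation to both dimensions. The names m, ord and s are introduced for illustration only and do not appear in the question.

```r
g <- c(2, 1, 2, 1, 2)
# The five numeric columns of the question, written as a plain matrix (assumption).
m <- rbind(c(1.0, 0.2, 0.5, 0.8, 0.4),
           c(0.2, 1.0, 0.7, 0.6, 0.3),
           c(0.5, 0.7, 1.0, 0.4, 0.1),
           c(0.8, 0.6, 0.4, 1.0, 0.9),
           c(0.4, 0.3, 0.1, 0.9, 1.0))
ord <- order(g)   # a single permutation derived from the group labels
s <- m[ord, ord]  # applied to rows and columns in one step
# Repeated dimnames are kept as-is on a matrix.
dimnames(s) <- list(as.character(g[ord]), as.character(g[ord]))
s
```

Because rows and columns are indexed in the same call, the second reordering cannot undo the first, which is the problem described in the question.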

Answer 2

Score: 0

The order function can take more than one vector as arguments. If you call order(v1, v2, v3), it first orders by v1, then resolves ties using v2, and so on.
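
A small illustration of the tie-breaking rule, using made-up vectors v1 and v2 that are not part of the question's data:

```r
v1 <- c(2, 1, 2, 1)
v2 <- c(0.3, 0.9, 0.1, 0.5)
o1 <- order(v1)      # ties in v1 keep their original relative order
o2 <- order(v1, v2)  # ties in v1 are broken by the values in v2
o1
o2
```

With v1 alone, positions 2 and 4 (both equal to 1) stay in input order; adding v2 swaps them because 0.5 is smaller than 0.9.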

This is how to order by column g and then by the first data column:

neworder <- order(df$g, df[, 2])
df[neworder, ]
# g   2   1   2   1   2
# 2 1 0.2 1.0 0.7 0.6 0.3
# 4 1 0.8 0.6 0.4 1.0 0.9
# 5 2 0.4 0.3 0.1 0.9 1.0
# 3 2 0.5 0.7 1.0 0.4 0.1
# 1 2 1.0 0.2 0.5 0.8 0.4

I had to extract the second column as df[, 2] because you gave several columns identical names, which is not good; see @jay.sf's answer.

To order by all columns, you can create a list of all column vectors and use do.call to pass that list as the arguments of order:

neworder <- do.call("order", as.list(df))
df[neworder, ]
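
A hedged variant of the call above: do.call passes the list names on as argument names, so a column that happened to be called decreasing or method would be read as an option of order rather than as a sort key. Dropping the names first avoids that. The data frame d below is a made-up example, not the question's data.

```r
d <- data.frame(g = c(2, 1, 2, 1), x = c(0.3, 0.9, 0.1, 0.5))
# unname() makes every column a positional sort key.
neworder <- do.call(order, unname(as.list(d)))
d[neworder, ]
```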

huangapple
  • Posted on July 27, 2023, 14:54:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76777162.html