

Reorder matrix rows and columns simultaneously by a group variable






Consider the data "df" below with group variable "g". I am trying to find a way to sort the data by row and column according to the order of "g".

g <- c(2, 1, 2, 1, 2)
df1 <- c(1, 0.2, 0.5, 0.8, 0.4)
df2 <- c(0.2, 1, 0.7, 0.6, 0.3)
df3 <- c(0.5, 0.7, 1, 0.4, 0.1)
df4 <- c(0.8, 0.6, 0.4, 1, 0.9)
df5 <- c(0.4, 0.3, 0.1, 0.9, 1)
df <- data.frame(g, df1, df2, df3, df4, df5)
colnames(df) <- c("g", 2, 1, 2, 1, 2)

This is a small example taken from a large data set. Using sort.list in R, I attempted the task as follows:

r.df <- df[, sort.list(df[1, ])]
s.df <- r.df[sort.list(r.df[, 1]), ]

The first line reorders the columns using the first row, and the second line reorders the rows using the first column. However, the second step distorts the order of the rows. I expect to get something like this:

g 1   1   2   2   2
1 1.0 0.6 0.2 0.7 0.3
1 0.6 1.0 0.8 0.4 0.9
2 0.2 0.8 1.0 0.5 0.4
2 0.7 0.4 0.5 1.0 0.1
2 0.3 0.9 0.4 0.1 1.0

Any help is hugely appreciated.


Score: 1





Try this.

df[order(df$g), c(1L, order(colnames(df)[-1L]) + 1L)]
#   g   1 1.1   2 2.1 2.2
# 2 1 1.0 0.6 0.2 0.7 0.3
# 4 1 0.6 1.0 0.8 0.4 0.9
# 1 2 0.2 0.8 1.0 0.5 0.4
# 3 2 0.7 0.4 0.5 1.0 0.1
# 5 2 0.3 0.9 0.4 0.1 1.0

Note that your column names are invalid: column names may not be duplicated, which is why the output above shows them as 1, 1.1, 2, 2.1 and 2.2.
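One way to sidestep the duplicated names, as a minimal sketch rather than part of the answer above: keep the group labels in a separate vector, store the values in a matrix with unique names, and apply the same permutation to rows and columns. The names m, o, m_sorted, g_sorted and the labels v1 to v5 are made up for illustration and do not appear in the question.

```r
# Group labels kept apart from the values (same numbers as in the question)
g <- c(2, 1, 2, 1, 2)

# Symmetric 5 x 5 matrix with unique, hypothetical row/column labels v1..v5
m <- matrix(c(1.0, 0.2, 0.5, 0.8, 0.4,
              0.2, 1.0, 0.7, 0.6, 0.3,
              0.5, 0.7, 1.0, 0.4, 0.1,
              0.8, 0.6, 0.4, 1.0, 0.9,
              0.4, 0.3, 0.1, 0.9, 1.0),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("v", 1:5), paste0("v", 1:5)))

o <- order(g)          # permutation that sorts the groups; ties keep their original order
m_sorted <- m[o, o]    # apply the same permutation to rows and columns
g_sorted <- g[o]       # group label of each row/column after sorting

m_sorted
```

Because one permutation is used on both dimensions, the diagonal stays on the diagonal and the matrix stays symmetric, which the two-step sort in the question does not guarantee.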


Score: 0


The function order can take more than one vector as arguments. If you call order(v1, v2, v3), it first orders by v1, then resolves ties using v2, and so on.

This is how to order by column g and then by the first data column:

neworder <- order(df$g, df[, 2])
df[neworder, ]
# g   2   1   2   1   2
# 2 1 0.2 1.0 0.7 0.6 0.3
# 4 1 0.8 0.6 0.4 1.0 0.9
# 5 2 0.4 0.3 0.1 0.9 1.0
# 3 2 0.5 0.7 1.0 0.4 0.1
# 1 2 1.0 0.2 0.5 0.8 0.4

I had to extract the second column as df[, 2] because you gave several columns identical names, which is not a good idea; see @jay.sf's answer.

To order by all columns, you can create a list of all column vectors and use do.call to pass that list as the arguments of order:

neworder <- do.call("order", as.list(df))
df[neworder, ]
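
A possible variant of the same call, offered as a sketch and not as part of the answer: dropping the list names with unname() passes the columns to order positionally, so a column that happened to be called decreasing or method could not be mistaken for one of order's own arguments. The column names x1 and x2 are hypothetical stand-ins for the question's duplicated names, and only three columns are used to keep the example short.

```r
# The question's first three columns, rebuilt with unique (hypothetical) names
df <- data.frame(g  = c(2, 1, 2, 1, 2),
                 x1 = c(1, 0.2, 0.5, 0.8, 0.4),
                 x2 = c(0.2, 1, 0.7, 0.6, 0.3))

# Order by g, then x1, then x2; unname() makes every column a positional argument
neworder <- do.call(order, unname(as.list(df)))
sorted <- df[neworder, ]
sorted
```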

  • Posted on July 27, 2023, 14:54:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76777162.html



