Reorder matrix rows and columns simultaneously by a group variable
Question
Consider the data frame df below, which contains the grouping variable g. I am trying to find a way to sort both the rows and the columns of the data according to the order of g.
g <- c(2, 1, 2, 1, 2)
df1 <- c(1, 0.2, 0.5, 0.8, 0.4)
df2 <- c(0.2, 1, 0.7, 0.6, 0.3)
df3 <- c(0.5, 0.7, 1, 0.4, 0.1)
df4 <- c(0.8, 0.6, 0.4, 1, 0.9)
df5 <- c(0.4, 0.3, 0.1, 0.9, 1)
df <- data.frame(g, df1, df2, df3, df4, df5)
colnames(df) <- c("g", 2, 1, 2, 1, 2)
df
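The numeric part of df is a symmetric 5 x 5 block (like a correlation matrix), and the header of column j + 1 carries the group label of row j. That property is what lets a single permutation reorder rows and columns together. A minimal sketch that checks it on the example data, assuming the block is meant to be symmetric:

```r
g <- c(2, 1, 2, 1, 2)
df <- data.frame(g,
                 df1 = c(1, 0.2, 0.5, 0.8, 0.4),
                 df2 = c(0.2, 1, 0.7, 0.6, 0.3),
                 df3 = c(0.5, 0.7, 1, 0.4, 0.1),
                 df4 = c(0.8, 0.6, 0.4, 1, 0.9),
                 df5 = c(0.4, 0.3, 0.1, 0.9, 1))
colnames(df) <- c("g", 2, 1, 2, 1, 2)

# Numeric block without g; dimnames dropped so only the values are compared
m <- unname(as.matrix(df[, -1]))

# m[i, j] == m[j, i] for every pair (i, j)
stopifnot(isSymmetric(m))

# The header of column j + 1 equals g[j], so rows and columns share one labelling
stopifnot(all(names(df)[-1] == df$g))
```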
This is a small example from a large data set. Using sort.list in R, I performed the task as follows:
r.df <- df[,sort.list(df[1,])]
s.df <- r.df[sort.list(r.df[,1]),]
by first ordering the columns according to g (i.e., the first row) and then the rows according to g (i.e., the first column). However, the second step distorts the order of the rows. I expect to get something like:
g 1 1 2 2 2
1 1.0 0.6 0.2 0.7 0.3
1 0.6 1.0 0.8 0.4 0.9
2 0.2 0.8 1.0 0.5 0.4
2 0.7 0.4 0.5 1.0 0.1
2 0.3 0.9 0.4 0.1 1.0
Any help is hugely appreciated.
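One likely cause of the problem in the attempt above: df[1, ] selects the first data row (2, 1, 0.2, 0.5, 0.8, 0.4), not the header row that holds the group labels, so ordering by it follows the values of row 1 rather than g. A minimal sketch comparing the two orders, assuming (as in this example) that column j + 1 of df belongs to row j; row1 and col_idx are illustrative names:

```r
g <- c(2, 1, 2, 1, 2)
df <- data.frame(g,
                 df1 = c(1, 0.2, 0.5, 0.8, 0.4),
                 df2 = c(0.2, 1, 0.7, 0.6, 0.3),
                 df3 = c(0.5, 0.7, 1, 0.4, 0.1),
                 df4 = c(0.8, 0.6, 0.4, 1, 0.9),
                 df5 = c(0.4, 0.3, 0.1, 0.9, 1))
colnames(df) <- c("g", 2, 1, 2, 1, 2)

# Values of the first data row; these are what df[1, ] contains, not the labels
row1 <- unlist(df[1, ])
order(row1)                         # 3 6 4 5 2 1: g itself ends up last

# Intended column order: g first, then the data columns in the same group order as the rows
col_idx <- c(1L, order(df$g) + 1L)
col_idx                             # 1 3 5 2 4 6
```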
Answer 1
Score: 1
Try this.
df[order(df$g), c(1L, order(colnames(df)[-1L]) + 1L)]
# g 1 1.1 2 2.1 2.2
# 2 1 1.0 0.6 0.2 0.7 0.3
# 4 1 0.6 1.0 0.8 0.4 0.9
# 1 2 0.2 0.8 1.0 0.5 0.4
# 3 2 0.7 0.4 0.5 1.0 0.1
# 5 2 0.3 0.9 0.4 0.1 1.0
Note that your column names are invalid: they should not be duplicated.
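In the index vector above, 1L keeps g in first place and + 1L shifts the order of the remaining columns past g. Because the column labels are character strings, order() sorts them lexically, so a label such as "10" would come before "2". A sketch of a variant that uses the numeric g for both dimensions instead, assuming column j + 1 belongs to row j as in the example (p and res are illustrative names):

```r
g <- c(2, 1, 2, 1, 2)
df <- data.frame(g,
                 df1 = c(1, 0.2, 0.5, 0.8, 0.4),
                 df2 = c(0.2, 1, 0.7, 0.6, 0.3),
                 df3 = c(0.5, 0.7, 1, 0.4, 0.1),
                 df4 = c(0.8, 0.6, 0.4, 1, 0.9),
                 df5 = c(0.4, 0.3, 0.1, 0.9, 1))
colnames(df) <- c("g", 2, 1, 2, 1, 2)

p   <- order(df$g)              # row permutation: 2 4 1 3 5
res <- df[p, c(1L, p + 1L)]     # same permutation for the columns; + 1L skips g
res

# Character labels sort lexically, numeric ones do not
order(c("10", "2"))             # 1 2
order(as.numeric(c("10", "2"))) # 2 1
```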
Answer 2
Score: 0
The function order can take more than one vector to sort by. If you call order(v1, v2, v3), it first orders by v1, then resolves ties with v2, and so on.
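A small illustration of the tie-breaking, with two hypothetical vectors v1 and v2:

```r
v1 <- c(2, 1, 2, 1)
v2 <- c(4, 3, 2, 1)

order(v1)      # 2 4 1 3: ties in v1 keep their original order
order(v1, v2)  # 4 2 3 1: ties in v1 are broken by v2
```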
This is how to order by column g and then by the first data column:
neworder <- order(df$g, df[,2])
df[neworder, ]
# g 2 1 2 1 2
# 2 1 0.2 1.0 0.7 0.6 0.3
# 4 1 0.8 0.6 0.4 1.0 0.9
# 5 2 0.4 0.3 0.1 0.9 1.0
# 3 2 0.5 0.7 1.0 0.4 0.1
# 1 2 1.0 0.2 0.5 0.8 0.4
I had to extract the second column as df[,2] because you gave several columns identical names, which is not good practice; see @jay.sf's answer.
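Note that df[neworder, ] reorders only the rows; the columns stay in their original order. Since the example block is symmetric and column j + 1 belongs to row j, the same permutation can be reused for the columns. A sketch under that assumption (p2, res2 and m2 are illustrative names):

```r
g <- c(2, 1, 2, 1, 2)
df <- data.frame(g,
                 df1 = c(1, 0.2, 0.5, 0.8, 0.4),
                 df2 = c(0.2, 1, 0.7, 0.6, 0.3),
                 df3 = c(0.5, 0.7, 1, 0.4, 0.1),
                 df4 = c(0.8, 0.6, 0.4, 1, 0.9),
                 df5 = c(0.4, 0.3, 0.1, 0.9, 1))
colnames(df) <- c("g", 2, 1, 2, 1, 2)

p2   <- order(df$g, df[, 2])       # 2 4 5 3 1: by g, ties broken by the first data column
res2 <- df[p2, c(1L, p2 + 1L)]     # same permutation for the columns, g kept first
res2

# The diagonal is still all 1s, i.e. reordered rows and columns line up
m2 <- unname(as.matrix(res2[, -1]))
stopifnot(all(diag(m2) == 1), isSymmetric(m2))
```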
To order by all columns, you can create a list of all column vectors and use do.call to pass that list as the arguments of order:
neworder <- do.call("order", as.list(df))
df[neworder, ]
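do.call(f, list(a, b)) calls f(a, b), so the list elements become the sort keys in sequence. A precaution that this example does not hit: as.list(df) is a named list, and a column named exactly like one of order's own arguments (na.last, decreasing, method) would be taken as that argument rather than as a sort key; unname() avoids this. A minimal sketch with hypothetical vectors a and b:

```r
a <- c(2, 1, 2, 1)
b <- c(4, 3, 2, 1)
identical(do.call("order", list(a, b)), order(a, b))  # TRUE

g <- c(2, 1, 2, 1, 2)
df <- data.frame(g,
                 df1 = c(1, 0.2, 0.5, 0.8, 0.4),
                 df2 = c(0.2, 1, 0.7, 0.6, 0.3),
                 df3 = c(0.5, 0.7, 1, 0.4, 0.1),
                 df4 = c(0.8, 0.6, 0.4, 1, 0.9),
                 df5 = c(0.4, 0.3, 0.1, 0.9, 1))
colnames(df) <- c("g", 2, 1, 2, 1, 2)

# Keys: g first, then each data column in turn; unname() keeps them positional
neworder <- do.call(order, unname(as.list(df)))
neworder                                              # 2 4 5 3 1
```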