July 27, 2023 14:54:27


Reorder matrix rows and columns simultaneously by a group variable

Question


Consider the data frame "df" below, with grouping variable "g". I am trying to sort the data by both rows and columns according to the order of "g".

g <- c(2, 1, 2, 1, 2)
df1 <- c(1, 0.2, 0.5, 0.8, 0.4)
df2 <- c(0.2, 1, 0.7, 0.6, 0.3)
df3 <- c(0.5, 0.7, 1, 0.4, 0.1)
df4 <- c(0.8, 0.6, 0.4, 1, 0.9)
df5 <- c(0.4, 0.3, 0.1, 0.9, 1)
df <- data.frame(g, df1, df2, df3, df4, df5)
colnames(df) <- c("g", 2, 1, 2, 1, 2)
df

This is a small example from a large data set. Using R's sort.list function, I performed the task as follows:

r.df <- df[, sort.list(df[1, ])]
s.df <- r.df[sort.list(r.df[, 1]), ]

That is, I first sort by g using the first row (which reorders the columns) and then by g using the first column (which reorders the rows). However, the second step distorts the order of the rows. I expect something like this:

g 1   1   2   2   2
1 1.0 0.6 0.2 0.7 0.3
1 0.6 1.0 0.8 0.4 0.9
2 0.2 0.8 1.0 0.5 0.4
2 0.7 0.4 0.5 1.0 0.1
2 0.3 0.9 0.4 0.1 1.0

Any help is hugely appreciated.

Answer 1

Score: 1


Try this.

df[order(df$g), c(1L, order(colnames(df)[-1L]) + 1L)]
#   g   1 1.1   2 2.1 2.2
# 2 1 1.0 0.6 0.2 0.7 0.3
# 4 1 0.6 1.0 0.8 0.4 0.9
# 1 2 0.2 0.8 1.0 0.5 0.4
# 3 2 0.7 0.4 0.5 1.0 0.1
# 5 2 0.3 0.9 0.4 0.1 1.0

Note that your column names are not syntactically valid R names (they are bare numbers), and column names should not be duplicated.
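Because the matrix in the question is square and the same grouping g applies to both its rows and its columns, a single permutation can serve both dimensions. The sketch below is an alternative under stated assumptions, not the answer's own code: it stores the values in a plain numeric matrix (the names m, idx and m_sorted are hypothetical) and computes the permutation from g itself rather than from the duplicated column names.

```r
# Rebuild the question's values as a plain 5 x 5 numeric matrix
# (m is a hypothetical name; column j holds the question's df<j> vector).
g <- c(2, 1, 2, 1, 2)
m <- cbind(c(1, 0.2, 0.5, 0.8, 0.4),
           c(0.2, 1, 0.7, 0.6, 0.3),
           c(0.5, 0.7, 1, 0.4, 0.1),
           c(0.8, 0.6, 0.4, 1, 0.9),
           c(0.4, 0.3, 0.1, 0.9, 1))

# Compute the permutation once, from g alone.
idx <- order(g)

# Apply the same permutation to rows and columns so that
# row i and column i stay paired after sorting.
m_sorted <- m[idx, idx]

# Label both dimensions with the sorted group values.
dimnames(m_sorted) <- list(g[idx], g[idx])
m_sorted
```

The first two rows come out as 1.0 0.6 0.2 0.7 0.3 and 0.6 1.0 0.8 0.4 0.9, matching the layout the question asks for. With the question's data frame, where g is the first column, the analogous call would be df[idx, c(1L, idx + 1L)], which keeps g in front and does not rely on the column names happening to equal g.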

Answer 2

Score: 0


The function order can take more than one vector as arguments. With order(v1, v2, v3), it first orders by v1, then breaks ties with v2, and so on.
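To see the tie-breaking in isolation, here is a toy case with two made-up vectors v1 and v2 (hypothetical, not from the question). Positions 2 and 4 share v1 = 1, and v2 puts 4 before 2; likewise, within v1 = 2, v2 puts 3 before 1.

```r
# v1 and v2 are made-up example vectors, not the question's data.
v1 <- c(2, 1, 2, 1)
v2 <- c(3, 4, 1, 2)

# Primary key v1; ties within v1 are broken by v2.
o <- order(v1, v2)
o
# [1] 4 2 3 1

# The reordered pairs: v1 ascending, and v2 ascending within each v1 value.
cbind(v1 = v1[o], v2 = v2[o])
```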

This is how to order by column g first and then by the first data column:

neworder <- order(df$g, df[, 2])
df[neworder, ]
# g   2   1   2   1   2
# 2 1 0.2 1.0 0.7 0.6 0.3
# 4 1 0.8 0.6 0.4 1.0 0.9
# 5 2 0.4 0.3 0.1 0.9 1.0
# 3 2 0.5 0.7 1.0 0.4 0.1
# 1 2 1.0 0.2 0.5 0.8 0.4

I had to refer to the second column as df[, 2] because you gave several columns identical names, which is not good practice; see @jay.sf's answer.

To order by all columns in turn, you can convert the data frame into a list of its column vectors and use do.call to pass that list as the arguments of order:

neworder <- do.call("order", as.list(df))
df[neworder, ]
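do.call builds an ordinary function call from a list, so do.call("order", as.list(df)) is the same as order(df[[1]], df[[2]], ...) without typing out every column. A minimal check on a small made-up data frame d (hypothetical, not the question's data):

```r
# d is a made-up data frame used only for illustration.
d <- data.frame(a = c(2, 1, 2, 1), b = c(3, 4, 1, 2))

# Spread the columns out as separate arguments to order().
by_do_call <- do.call("order", as.list(d))

# The same call written out by hand.
by_hand <- order(d$a, d$b)

identical(by_do_call, by_hand)
# [1] TRUE

d[by_do_call, ]
```

One caveat: as.list() keeps the column names, and they become argument names in the generated call. The arguments of order that follow ... (na.last, decreasing, method) are matched by exact name, so a column called, say, decreasing would be taken as that argument rather than as a sort key; wrapping the list in unname() avoids this.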
