Combine matrix row / column names in R

Question

I have multiple matrices reflecting bipartite / affiliation networks at different time points. These matrices have a lot of overlap in their incumbents, but also a lot of differences. For further analysis, however, I need them to be the same dimensions and have the same actors per row/column, so I need to combine row and column names somehow.

The final matrices will be around 8000 by 200, but each individual matrix is around 2000 by 150. Here is an example of two matrices and what I want the result to look like:

    adj1 <- matrix(0, 3, 5)
    colnames(adj1) <- c("g1", "g2", "g3", "g5", "g6")
    rownames(adj1) <- c("Tim", "John", "Sarah")

    adj2 <- matrix(0, 4, 2)
    colnames(adj2) <- c("g1", "g4")
    rownames(adj2) <- c("Tim", "Mary", "John", "Paolo")

    combined_adj <- matrix(0, 5, 6)
    colnames(combined_adj) <- c("g1", "g2", "g3", "g4", "g5", "g6")
    rownames(combined_adj) <- c("John", "Mary", "Paolo", "Sarah", "Tim")

Ideally, the new cells should read "NA" or "10", and rows and columns should be ordered alphabetically. The initial values in each matrix need to be kept. I am at a loss as to what to do here and would appreciate any help!

Answer 1

Score: 3

You can use merge and specify that you want to use row.names for merging as well.

    combined_adj <- merge(
      x = adj1,
      y = adj2,
      by = c('row.names', intersect(colnames(adj1), colnames(adj2))),
      all = TRUE
    )
    combined_adj
    #   Row.names g1 g2 g3 g5 g6 g4
    # 1      John  0  0  0  0  0  0
    # 2      Mary  0 NA NA NA NA  0
    # 3     Paolo  0 NA NA NA NA  0
    # 4     Sarah  0  0  0  0  0 NA
    # 5       Tim  0  0  0  0  0  0

This turns it into a data.frame, so you will need to convert it back to a matrix if required.

    row.names(combined_adj) <- combined_adj[, 1]
    combined_adj <- as.matrix(combined_adj[, -1])
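The question also asks for alphabetically ordered rows and columns and, optionally, 10 instead of NA in the new cells. A minimal sketch of that post-processing step follows; the names `merged`, `combined_mat` and `filled` are only illustrative and not part of the original answer.

```r
# Sample data from the question
adj1 <- matrix(0, 3, 5, dimnames = list(c("Tim", "John", "Sarah"),
                                        c("g1", "g2", "g3", "g5", "g6")))
adj2 <- matrix(0, 4, 2, dimnames = list(c("Tim", "Mary", "John", "Paolo"),
                                        c("g1", "g4")))

# Full outer merge on the row names and on the shared columns
merged <- merge(adj1, adj2,
                by = c("row.names", intersect(colnames(adj1), colnames(adj2))),
                all = TRUE)

# Move the key column into the row names, then drop it
rownames(merged) <- as.character(merged$Row.names)
merged$Row.names <- NULL

# Convert to a numeric matrix and sort both dimensions alphabetically
combined_mat <- as.matrix(merged)
combined_mat <- combined_mat[sort(rownames(combined_mat)),
                             sort(colnames(combined_mat))]

# Optional: use 10 instead of NA for cells that were absent in the inputs
filled <- combined_mat
filled[is.na(filled)] <- 10
```

Sorting by name after the conversion keeps the merge itself unchanged, so the step can be skipped when the order does not matter.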

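Edit: Merge multiple matrices

We use Reduce to apply this over all matrices. First, however, we convert each matrix to a data.frame and create a column with the row names to simplify things.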

    # create sample data
    adj1 <- matrix(
      0, 3, 5,
      dimnames = list(c("Tim", "John", "Sarah"),
                      c("g1", "g2", "g3", "g5", "g6"))
    )
    adj2 <- matrix(
      0, 4, 2,
      dimnames = list(c("Tim", "Mary", "John", "Paolo"),
                      c("g1", "g4"))
    )
    adj3 <- matrix(
      0, 3, 3,
      dimnames = list(c("Tim2", "Mary2", "John"), c("g1", "g4", "g7"))
    )

    # create a list
    list_matrices <- list(adj1, adj2, adj3)

    # convert to data.frames and create a column with the row names
    list_matrices <- lapply(list_matrices, function(mat) {
      mat <- as.data.frame(mat)
      mat$row_names <- row.names(mat)
      mat
    })

    # successively combine them: merge 1 and 2, then merge the result with 3, and so on
    res <- Reduce(function(mat1, mat2) merge(mat1, mat2, all = TRUE), x = list_matrices)
    res
    #   g1 row_names g4 g2 g3 g5 g6 g7
    # 1  0      John  0  0  0  0  0  0
    # 2  0      Mary  0 NA NA NA NA NA
    # 3  0     Mary2  0 NA NA NA NA  0
    # 4  0     Paolo  0 NA NA NA NA NA
    # 5  0     Sarah NA  0  0  0  0 NA
    # 6  0       Tim  0  0  0  0  0 NA
    # 7  0      Tim2  0 NA NA NA NA  0
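The Reduce result is still a data.frame with the names in a regular column. As a rough sketch, the whole step can be wrapped in a helper that returns a sorted matrix. `merge_adj_list` is a hypothetical name, and the duplicate check is an added assumption: because merge also matches on the shared value columns, matrices that disagree on a cell would produce two rows for the same name.

```r
# Hypothetical helper: merge a list of named matrices into one sorted matrix
merge_adj_list <- function(mats) {
  # Data frames with the row names stored in an ordinary column
  dfs <- lapply(mats, function(mat) {
    df <- as.data.frame(mat)
    df$row_names <- rownames(mat)
    df
  })
  # Successive full outer merges on all shared columns
  res <- Reduce(function(a, b) merge(a, b, all = TRUE), dfs)
  # Conflicting cell values show up as repeated names; stop rather than guess
  if (anyDuplicated(res$row_names) > 0) {
    stop("conflicting values: some row names occur more than once")
  }
  out <- as.matrix(res[, setdiff(names(res), "row_names"), drop = FALSE])
  rownames(out) <- res$row_names
  # Order rows and columns alphabetically
  out[sort(rownames(out)), sort(colnames(out)), drop = FALSE]
}

adj1 <- matrix(0, 3, 5, dimnames = list(c("Tim", "John", "Sarah"),
                                        c("g1", "g2", "g3", "g5", "g6")))
adj2 <- matrix(0, 4, 2, dimnames = list(c("Tim", "Mary", "John", "Paolo"),
                                        c("g1", "g4")))
adj3 <- matrix(0, 3, 3, dimnames = list(c("Tim2", "Mary2", "John"),
                                        c("g1", "g4", "g7")))

combined <- merge_adj_list(list(adj1, adj2, adj3))
```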

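Answer 2

Score: 1

This could be one solution. However, I am assuming that where a cell exists in more than one matrix, its value is always the same for the same combination of row name and column name. In addition, it relies on the tidyverse (dplyr, tidyr and tibble):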

    library(tidyverse)

    list_adj <- list(
      adj1, adj2
    )

    df.adj <- NULL
    for (adj in list_adj) {
      df.adj.temp <- adj %>% as_tibble(rownames = "row_names")
      if (is.null(df.adj)) {
        df.adj <- df.adj.temp
      } else {
        c.colnames.join.by <- c(intersect(colnames(df.adj), colnames(df.adj.temp)))
        df.adj <- df.adj %>%
          full_join(df.adj.temp, by = c.colnames.join.by) %>%
          mutate(across(.cols = -row_names, .fns = \(x) replace_na(x, 10)))
      }
    }

    df.adj %>%
      arrange(row_names) %>% # ordering rows
      select(all_of(sort(colnames(df.adj)))) %>% # ordering columns
      column_to_rownames(var = "row_names") %>%
      as.matrix()

Output:

          g1 g2 g3 g4 g5 g6
    John   0  0  0  0  0  0
    Mary   0 10 10  0 10 10
    Paolo  0 10 10  0 10 10
    Sarah  0  0  0 10  0  0
    Tim    0  0  0  0  0  0
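The assumption stated in this answer (cells shared by two matrices always hold the same value) can be checked before joining. The following base R sketch lists the cells where two matrices disagree; `conflicting_cells` is a hypothetical helper name, not something from the answer.

```r
# Hypothetical helper: report cells present in both matrices with different values
conflicting_cells <- function(a, b) {
  rows <- intersect(rownames(a), rownames(b))
  cols <- intersect(colnames(a), colnames(b))
  sub_a <- a[rows, cols, drop = FALSE]
  sub_b <- b[rows, cols, drop = FALSE]
  # Positions (row index, column index) where the shared cells differ
  idx <- which(sub_a != sub_b, arr.ind = TRUE)
  data.frame(
    row = rows[idx[, 1]],
    col = cols[idx[, 2]],
    a = sub_a[idx],
    b = sub_b[idx]
  )
}

adj1 <- matrix(0, 3, 5, dimnames = list(c("Tim", "John", "Sarah"),
                                        c("g1", "g2", "g3", "g5", "g6")))
adj2 <- matrix(0, 4, 2, dimnames = list(c("Tim", "Mary", "John", "Paolo"),
                                        c("g1", "g4")))

# The sample matrices agree on every shared cell, so this has zero rows
conflicts <- conflicting_cells(adj1, adj2)
```

An empty result means the join on the shared columns will not split any actor into several rows.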

Answer 3

Score: 0

Here is a base R option with reshape:

```R
df <- unique(
  rbind(
    as.data.frame(as.table(adj1)),
    as.data.frame(as.table(adj2))
  )
)

reshape(
  df,
  direction = "wide",
  idvar = "Var1",
  timevar = "Var2"
)
```

which gives

        Var1 Freq.g1 Freq.g2 Freq.g3 Freq.g5 Freq.g6 Freq.g4
    1    Tim       0       0       0       0       0       0
    2   John       0       0       0       0       0       0
    3  Sarah       0       0       0       0       0      NA
    17  Mary       0      NA      NA      NA      NA       0
    19 Paolo       0      NA      NA      NA      NA       0

Or, we use xtabs:

    mat <- `class<-`(xtabs(Freq ~ ., df) * NA, "matrix")
    mat[as.matrix(df[-3])] <- df$Freq

which gives

    > mat
           Var2
    Var1    g1 g2 g3 g5 g6 g4
      Tim    0  0  0  0  0  0
      John   0  0  0  0  0  0
      Sarah  0  0  0  0  0 NA
      Mary   0 NA NA NA NA  0
      Paolo  0 NA NA NA NA  0
    attr(,"call")
    xtabs(formula = Freq ~ ., data = df)
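The xtabs variant builds an NA-filled matrix over all names and then fills it by name. The same idea can be sketched directly on the matrices, which may be useful if the goal is to keep one matrix per time point, all with identical dimensions. `pad_to_union` is a hypothetical helper, and the choice of `NA_real_` as the default fill is an assumption.

```r
# Hypothetical helper: expand every matrix to the union of all row/column names
pad_to_union <- function(mats, fill = NA_real_) {
  all_rows <- sort(unique(unlist(lapply(mats, rownames))))
  all_cols <- sort(unique(unlist(lapply(mats, colnames))))
  lapply(mats, function(m) {
    # Start from a matrix that contains only the fill value
    out <- matrix(fill, nrow = length(all_rows), ncol = length(all_cols),
                  dimnames = list(all_rows, all_cols))
    # Copy the original values into place by name
    out[rownames(m), colnames(m)] <- m
    out
  })
}

adj1 <- matrix(0, 3, 5, dimnames = list(c("Tim", "John", "Sarah"),
                                        c("g1", "g2", "g3", "g5", "g6")))
adj2 <- matrix(0, 4, 2, dimnames = list(c("Tim", "Mary", "John", "Paolo"),
                                        c("g1", "g4")))

padded <- pad_to_union(list(adj1, adj2))        # new cells are NA
padded10 <- pad_to_union(list(adj1, adj2), 10)  # new cells are 10
```

Because every matrix is filled independently, differing values for the same cell at different time points are kept as they are rather than merged.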
huangapple
  • Posted on May 6, 2023, 18:11:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76188324.html