Combine matrix row / column names in R

Question

I have multiple matrices reflecting bipartite / affiliation networks at different time points. These matrices have a lot of overlap in their incumbents, but also a lot of differences. For further analysis, however, I need them to be the same dimensions and have the same actors per row/column, so I need to combine row and column names somehow.

The final matrices will be around 8000 × 200, but each individual matrix is around 2000 × 150. Here is an example of two matrices and what I want the result to look like:

adj1 <- matrix(0, 3, 5)
colnames(adj1) <- c("g1", "g2", "g3", "g5", "g6")
rownames(adj1) <- c("Tim", "John", "Sarah")

adj2 <- matrix(0, 4, 2)
colnames(adj2) <- c("g1", "g4")
rownames(adj2) <- c("Tim", "Mary", "John", "Paolo")

combined_adj <- matrix(0,5,6)
colnames(combined_adj) <- c("g1","g2","g3","g4","g5","g6")
rownames(combined_adj) <- c("John","Mary","Paolo","Sarah","Tim")

Ideally, the new cells should read "NA" or "10", and rows and columns would be ordered alphabetically. The initial values in each matrix need to be kept. I am at a loss as to what to do here and appreciate any help!

Answer 1

Score: 3

You can use merge and specify that you want to use row.names for merging as well.

combined_adj <- merge(x = adj1,
      y = adj2,
      by = c('row.names', 
             intersect(colnames(adj1), 
                       colnames(adj2))
             ), 
      all = TRUE
)
combined_adj
  Row.names g1 g2 g3 g5 g6 g4
1      John  0  0  0  0  0  0
2      Mary  0 NA NA NA NA  0
3     Paolo  0 NA NA NA NA  0
4     Sarah  0  0  0  0  0 NA
5       Tim  0  0  0  0  0  0

This turns it into a data.frame, so you will need to convert it back to a matrix if required.

row.names(combined_adj) <- combined_adj[,1]
combined_adj <- combined_adj[,-1]
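
The two lines above move the names into the row names, but the object is still a data.frame with columns in merge order. Here is a minimal sketch of finishing the conversion, assuming all remaining columns are numeric and that alphabetical order for rows and columns is wanted, as in the question. The name `combined_mat` is only illustrative.

```r
# Sample matrices from the question
adj1 <- matrix(0, 3, 5, dimnames = list(c("Tim", "John", "Sarah"),
                                        c("g1", "g2", "g3", "g5", "g6")))
adj2 <- matrix(0, 4, 2, dimnames = list(c("Tim", "Mary", "John", "Paolo"),
                                        c("g1", "g4")))

# Full outer merge on row names plus the shared columns
merged <- merge(adj1, adj2,
                by = c("row.names", intersect(colnames(adj1), colnames(adj2))),
                all = TRUE)

# Move the key column into the row names, then drop it
rownames(merged) <- merged$Row.names
merged$Row.names <- NULL

# Convert to a numeric matrix and order rows and columns alphabetically
combined_mat <- as.matrix(merged)
combined_mat <- combined_mat[sort(rownames(combined_mat)),
                             sort(colnames(combined_mat))]
```

Cells that were absent from both inputs stay `NA`; something like `combined_mat[is.na(combined_mat)] <- 10` would swap in the other fill value mentioned in the question.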

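Edit: Merge multiple matrices

We use Reduce to apply the merge over all matrices. First, however, we convert each matrix to a data.frame and create a column with the row names to simplify things.

# create sample data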
adj1 <- matrix(
  0, 3, 5,
  dimnames = list(c("Tim", "John", "Sarah"), 
                  c("g1", "g2", "g3", "g5", "g6"))
)

adj2 <- matrix(
  0, 4, 2, 
  dimnames = list(c("Tim", "Mary", "John", "Paolo"),
                  c("g1", "g4"))
)

adj3 <- matrix(
  0, 3, 3, 
  dimnames = list(c("Tim2", "Mary2", "John"), c("g1", "g4", 'g7'))
)

# create a list
list_matrices <- list(adj1, adj2, adj3)

# convert to data frames and create a column with row.names
list_matrices <- lapply(list_matrices, function(mat){
  mat <- as.data.frame(mat)
  mat$row_names <- row.names(mat)
  mat
})

# successively combine them: merge 1 and 2, then merge the result with 3, and so on
res <- Reduce(function(mat1, mat2) merge(mat1, mat2, all = TRUE), x = list_matrices)

res
  g1 row_names g4 g2 g3 g5 g6 g7
1  0      John  0  0  0  0  0  0
2  0      Mary  0 NA NA NA NA NA
3  0     Mary2  0 NA NA NA NA  0
4  0     Paolo  0 NA NA NA NA NA
5  0     Sarah NA  0  0  0  0 NA
6  0       Tim  0  0  0  0  0 NA
7  0      Tim2  0 NA NA NA NA  0
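
The Reduce pipeline above can be wrapped into one function that also converts the result back to a sorted matrix. This is only a sketch: `merge_matrices` is a hypothetical helper name, and it assumes every actor ends up in exactly one merged row. Because `merge` joins on all shared columns, an actor whose shared cells differ between inputs (or are `NA` in one and filled in the other) would produce duplicate rows, and setting the row names would then fail.

```r
# Sample matrices from the answer
adj1 <- matrix(0, 3, 5, dimnames = list(c("Tim", "John", "Sarah"),
                                        c("g1", "g2", "g3", "g5", "g6")))
adj2 <- matrix(0, 4, 2, dimnames = list(c("Tim", "Mary", "John", "Paolo"),
                                        c("g1", "g4")))
adj3 <- matrix(0, 3, 3, dimnames = list(c("Tim2", "Mary2", "John"),
                                        c("g1", "g4", "g7")))

# Hypothetical helper: merge a list of named matrices into one sorted matrix
merge_matrices <- function(mats) {
  # Each matrix becomes a data.frame with its row names as a key column
  dfs <- lapply(mats, function(m) {
    df <- as.data.frame(m)
    df$row_names <- rownames(m)
    df
  })
  # Fold the list with a full outer merge on all shared columns
  res <- Reduce(function(a, b) merge(a, b, all = TRUE), dfs)
  # Errors here if an actor appears in more than one merged row
  rownames(res) <- res$row_names
  res$row_names <- NULL
  out <- as.matrix(res)
  out[sort(rownames(out)), sort(colnames(out)), drop = FALSE]
}

combined <- merge_matrices(list(adj1, adj2, adj3))
```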

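Answer 2

Score: 1

This could be one solution. However, I am assuming that the information that does exist in these cells is always the same for the same combination of row name and column name. In addition, it relies on `dplyr` and other tidyverse packages: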

require(tidyverse)

list_adj <- list(
  adj1, adj2
)

df.adj <- NULL

for (adj in list_adj) {
  df.adj.temp <- adj %>% as_tibble(rownames = "row_names")
  
  if (is.null(df.adj)) {
    df.adj <- df.adj.temp
  } else {
    c.colnames.join.by <- c(intersect(colnames(df.adj), colnames(df.adj.temp)))
    
    df.adj <- df.adj %>% 
      full_join(df.adj.temp, by = c.colnames.join.by) %>%
      mutate(across(.cols = - row_names, .fns = \(x) replace_na(x, 10)))
  }
}

df.adj %>% 
  arrange(row_names) %>% # ordering rows
  select(all_of(sort(colnames(df.adj)))) %>% # ordering columns
  column_to_rownames(var = "row_names") %>% 
  as.matrix()

Output

      g1 g2 g3 g5 g6 g4
John   0  0  0  0  0  0
Mary   0 10 10 10 10  0
Paolo  0 10 10 10 10  0
Sarah  0  0  0  0  0 10
Tim    0  0  0  0  0  0
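
This answer assumes that two matrices always hold the same value wherever they share a row name and a column name. Here is a minimal base R sketch for checking that assumption before joining; `shared_cells_agree` is a hypothetical helper name, not part of any package.

```r
# Sample matrices from the question
adj1 <- matrix(0, 3, 5, dimnames = list(c("Tim", "John", "Sarah"),
                                        c("g1", "g2", "g3", "g5", "g6")))
adj2 <- matrix(0, 4, 2, dimnames = list(c("Tim", "Mary", "John", "Paolo"),
                                        c("g1", "g4")))

# Hypothetical helper: TRUE if two named matrices agree on every shared cell
shared_cells_agree <- function(a, b) {
  rows <- intersect(rownames(a), rownames(b))
  cols <- intersect(colnames(a), colnames(b))
  # Index both by the same names so the sub-matrices line up cell by cell
  isTRUE(all.equal(a[rows, cols, drop = FALSE],
                   b[rows, cols, drop = FALSE]))
}

agree <- shared_cells_agree(adj1, adj2)

# A conflicting value in a shared cell (Tim, g1) makes the check fail
adj2_conflict <- adj2
adj2_conflict["Tim", "g1"] <- 1
conflict <- shared_cells_agree(adj1, adj2_conflict)
```

If the check fails for some pair of matrices, a join on the value columns would give that actor more than one row, so it is probably worth running before merging real data.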


Answer 3

Score: 0

Here is a base R option with `reshape`:

```R
df <- unique(
    rbind(
        as.data.frame(as.table(adj1)),
        as.data.frame(as.table(adj2))
    )
)

reshape(
    df,
    direction = "wide",
    idvar = "Var1",
    timevar = "Var2"
)
```

which gives

    Var1 Freq.g1 Freq.g2 Freq.g3 Freq.g5 Freq.g6 Freq.g4
1    Tim       0       0       0       0       0       0
2   John       0       0       0       0       0       0
3  Sarah       0       0       0       0       0      NA
17  Mary       0      NA      NA      NA      NA       0
19 Paolo       0      NA      NA      NA      NA       0

Or, we use xtabs:

mat <- `class<-`(xtabs(Freq ~ ., df) * NA, "matrix")
mat[as.matrix(df[-3])] <- df$Freq

which gives

> mat
       Var2
Var1    g1 g2 g3 g5 g6 g4
  Tim    0  0  0  0  0  0
  John   0  0  0  0  0  0
  Sarah  0  0  0  0  0 NA
  Mary   0 NA NA NA NA  0
  Paolo  0 NA NA NA NA  0
attr(,"call")
xtabs(formula = Freq ~ ., data = df)
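
The xtabs variant fills a pre-sized matrix by name-based indexing. The same idea can be applied without going through a long data.frame, which may matter at the sizes mentioned in the question. Here is a minimal sketch, assuming every matrix has unique row and column names; `pad_matrix` is a hypothetical helper that pads one matrix to the union of all names, so each time point ends up with identical dimensions.

```r
# Sample matrices from the question
adj1 <- matrix(0, 3, 5, dimnames = list(c("Tim", "John", "Sarah"),
                                        c("g1", "g2", "g3", "g5", "g6")))
adj2 <- matrix(0, 4, 2, dimnames = list(c("Tim", "Mary", "John", "Paolo"),
                                        c("g1", "g4")))

# Hypothetical helper: pad one matrix to a common set of row/column names
pad_matrix <- function(m, all_rows, all_cols, fill = NA_real_) {
  out <- matrix(fill, length(all_rows), length(all_cols),
                dimnames = list(all_rows, all_cols))
  out[rownames(m), colnames(m)] <- m  # keep the original values
  out
}

mats <- list(adj1, adj2)

# Alphabetically sorted union of all actors and all groups
all_rows <- sort(unique(unlist(lapply(mats, rownames))))
all_cols <- sort(unique(unlist(lapply(mats, colnames))))

# One padded matrix per time point, all with the same dimnames
padded <- lapply(mats, pad_matrix, all_rows = all_rows, all_cols = all_cols)
```

Passing `fill = 10` instead of the default would mark the new cells with 10, the other option mentioned in the question.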

huangapple
  • This article was posted on May 6, 2023 at 18:11:35
  • When reposting, please be sure to keep the link to this article: https://go.coder-hub.com/76188324.html