

How to combine a list of data frames into a single data frame using R?








I have a list of data frames:

A1 = data.frame(name = c("a1", "a3", "a5"), cor = c(1, 0.99, 0.93))
A2 = data.frame(name = c("a2", "a3", "a4"), cor = c(1, 0.94, 0.94))
A3 = data.frame(name = c("a3", "a1", "a2", "a6"), cor = c(1, 0.99, 0.94, 0.91))
myList = list(A1, A2, A3)

Each data frame holds calculated correlation coefficients (CCs) between one variable and several others.

For instance:

In A1, the CC between a1 and a1 is 1, between a1 and a3 is 0.99, and between a1 and a5 is 0.93;

in A2, the CC between a2 and a2 is 1, between a2 and a3 is 0.94, and between a2 and a4 is 0.94.

What I want is to combine these individual data frames into a single complete one, like the following:

     a1   a2   a3   a4   a5   a6
a1 1.00 0.00 0.99 0.00 0.93 0.00
a2 0.00 1.00 0.94 0.94 0.00 0.00
a3 0.99 0.94 1.00 0.00 0.00 0.91
a4 0.00 0.94 0.00 1.00 0.00 0.00
a5 0.93 0.00 0.00 0.00 1.00 0.00
a6 0.00 0.00 0.91 0.00 0.00 1.00

This corMatrix data frame contains all the correlation information from the data frames above. If the correlation between two variables is unknown (for example, between a1 and a2), their CC is represented as 0.

How can I do it?

Thanks a lot.


Score: 2



I believe this does what you're looking for, although it may not be the best way of doing this:

A1 = data.frame(name = c("a1", "a3", "a5"), cor = c(1, 0.99, 0.93))
A2 = data.frame(name = c("a2", "a3", "a4"), cor = c(1, 0.94, 0.94))
A3 = data.frame(name = c("a3", "a1", "a2", "a6"), cor = c(1, 0.99, 0.94, 0.91))
myList = list(A1, A2, A3)

# name each element after the variable its correlations belong to
names(myList) = c("a1", "a2", "a3")
# stack, expand to every pair (missing pairs get 0), then pivot to a square matrix
myMatrix = dplyr::bind_rows(myList, .id = "name2") |>
  dplyr::mutate(name2 = factor(name2, levels = c("a1", "a2", "a3", "a4", "a5", "a6")),
                name = factor(name, levels = c("a1", "a2", "a3", "a4", "a5", "a6"))) |>
  tidyr::complete(name2, name, fill = list(cor = 0)) |>
  tidyr::pivot_wider(names_from = name2, values_from = cor) |>
  tibble::column_to_rownames("name") |>
  as.matrix()
# set the diagonal to 1 and mirror the lower triangle into the upper triangle
diag(myMatrix) <- 1
myMatrix[upper.tri(myMatrix)] <- t(myMatrix)[upper.tri(myMatrix)]

which returns:

     a1   a2   a3   a4   a5   a6
a1 1.00 0.00 0.99 0.00 0.93 0.00
a2 0.00 1.00 0.94 0.94 0.00 0.00
a3 0.99 0.94 1.00 0.00 0.00 0.91
a4 0.00 0.94 0.00 1.00 0.00 0.00
a5 0.93 0.00 0.00 0.00 1.00 0.00
a6 0.00 0.00 0.91 0.00 0.00 1.00

The general idea is that you:

  • name the list elements so you know which variable each set of correlations belongs to (for a longer list, the names could be generated with paste())
  • combine all the list elements into a single data frame
  • turn both name columns into factors with every possible level (again, this could be done programmatically if required)
  • use complete() to fill the missing pairs with 0
  • convert to a matrix, set the diagonal to 1, and copy the lower triangle into the upper triangle so the matrix is symmetric
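The steps above hard-code the list names and the factor levels. Below is a minimal base-R sketch of the same idea that generates both instead. It is a swapped-in approach, not the answer's dplyr/tidyr pipeline. It assumes, as in the question, that the i-th data frame holds the correlations of variable a<i>, so the names can be built with paste0(). The variable vars and the fill loop are illustrative and do not come from the original answer.

```r
A1 <- data.frame(name = c("a1", "a3", "a5"), cor = c(1, 0.99, 0.93))
A2 <- data.frame(name = c("a2", "a3", "a4"), cor = c(1, 0.94, 0.94))
A3 <- data.frame(name = c("a3", "a1", "a2", "a6"), cor = c(1, 0.99, 0.94, 0.91))
myList <- list(A1, A2, A3)

# Assumption: the i-th data frame holds the correlations of variable "a<i>",
# so the names can be generated instead of typed out
names(myList) <- paste0("a", seq_along(myList))

# Collect every variable that appears anywhere; these become the
# row and column labels of the result
vars <- sort(unique(c(names(myList),
                      unlist(lapply(myList, function(d) as.character(d$name))))))

# Start from an all-zero matrix, so unknown pairs stay 0
corMatrix <- matrix(0, nrow = length(vars), ncol = length(vars),
                    dimnames = list(vars, vars))

# Write each known value in both directions to keep the matrix symmetric;
# if two data frames disagree on a pair, the later one wins
for (from in names(myList)) {
  d <- myList[[from]]
  to <- as.character(d$name)
  corMatrix[from, to] <- d$cor
  corMatrix[to, from] <- d$cor
}

# Every variable correlates perfectly with itself
diag(corMatrix) <- 1
corMatrix
```

The result has the same values as the corMatrix table in the question. Each known value is written into both [from, to] and [to, from]. This avoids the pipeline's final lower-to-upper copy, which would overwrite with 0 any value recorded only above the diagonal.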


Score: 2


In base R you could do:

a <- do.call(rbind, Map(cbind, name1 = c('a1', 'a2', 'a3'), myList))
b <- unique(rbind(a, setNames(a[c(2, 1, 3)], names(a))))
xtabs(cor ~ ., b)

name1   a1   a2   a3   a4   a5   a6
   a1 1.00 0.00 0.99 0.00 0.93 0.00
   a2 0.00 1.00 0.94 0.94 0.00 0.00
   a3 0.99 0.94 1.00 0.00 0.00 0.91
   a4 0.00 0.94 0.00 0.00 0.00 0.00
   a5 0.93 0.00 0.00 0.00 0.00 0.00
   a6 0.00 0.00 0.91 0.00 0.00 0.00
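The base R answer is compact, so here is a sketch of the same code with each step commented. In its output, the diagonal entries for a4, a5 and a6 are 0 rather than 1. This happens because none of the data frames pairs those variables with themselves. The final diag(tab) <- 1 line and the name tab are additions, not part of the original answer.

```r
A1 <- data.frame(name = c("a1", "a3", "a5"), cor = c(1, 0.99, 0.93))
A2 <- data.frame(name = c("a2", "a3", "a4"), cor = c(1, 0.94, 0.94))
A3 <- data.frame(name = c("a3", "a1", "a2", "a6"), cor = c(1, 0.99, 0.94, 0.91))
myList <- list(A1, A2, A3)

# Prepend a name1 column (a1, a2, a3) to each data frame and stack them:
# every row becomes a (name1, name, cor) triple
a <- do.call(rbind, Map(cbind, name1 = c('a1', 'a2', 'a3'), myList))

# Append every triple with name1 and name swapped (rbind matches columns by
# name), then drop the exact duplicates this creates, e.g. a1-a3 is listed
# in both A1 and A3, and every diagonal entry appears twice
b <- unique(rbind(a, setNames(a[c(2, 1, 3)], names(a))))

# Cross-tabulate: rows are name1, columns are name, each cell is the summed
# cor for that pair (here at most one row per pair); absent pairs get 0
tab <- xtabs(cor ~ ., b)

# Added step: a4, a5 and a6 never appear paired with themselves,
# so set the whole diagonal to 1 explicitly
diag(tab) <- 1
tab
```

The values then match the corMatrix table in the question. The only difference is the name1 and name labels that xtabs() attaches to the two dimensions.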

  • Posted on April 4, 2023 at 04:38:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75923595.html



