April 4, 2023 04:38:26

How to combine a list of data frames into a single data frame using R?

Question

I have a list of data frames:

A1 = data.frame(name = c("a1", "a3", "a5"), cor = c(1, 0.99, 0.93))
A2 = data.frame(name = c("a2", "a3", "a4"), cor = c(1, 0.94, 0.94))
A3 = data.frame(name = c("a3", "a1", "a2", "a6"), cor = c(1, 0.99, 0.94, 0.91))
myList = list(A1, A2, A3)

Each data frame holds computed correlation coefficients (CC) between one variable (the one in its first row) and several others.

For instance:

in A1, the CC between a1 and a1 is 1, between a1 and a3 is 0.99, and between a1 and a5 is 0.93;

in A2, the CC between a2 and a2 is 1, between a2 and a3 is 0.94, and between a2 and a4 is 0.94.

I want to combine these individual data frames into a single complete one, like the following:

corMatrix
     a1   a2   a3   a4   a5   a6
a1 1.00 0.00 0.99 0.00 0.93 0.00
a2 0.00 1.00 0.94 0.94 0.00 0.00
a3 0.99 0.94 1.00 0.00 0.00 0.91
a4 0.00 0.94 0.00 1.00 0.00 0.00
a5 0.93 0.00 0.00 0.00 1.00 0.00
a6 0.00 0.00 0.91 0.00 0.00 1.00

This corMatrix data frame contains all the correlation information from the data frames above. If the correlation between two variables is unknown, 0 is used as their CC value, as for variables a1 and a2.

How can I do it?

Thanks a lot.

Answer 1

Score: 2

I believe this does what you're looking for, although it may not be the best way of doing this:

A1 = data.frame(name = c("a1", "a3", "a5"), cor = c(1, 0.99, 0.93))
A2 = data.frame(name = c("a2", "a3", "a4"), cor = c(1, 0.94, 0.94))
A3 = data.frame(name = c("a3", "a1", "a2", "a6"), cor = c(1, 0.99, 0.94, 0.91))
myList = list(A1, A2, A3)
# Label each data frame with the variable its correlations belong to
names(myList) = c("a1", "a2", "a3")
# Stack the data frames, expand to every (name2, name) pair, then reshape wide
myMatrix = dplyr::bind_rows(myList, .id = "name2") |>
  dplyr::mutate(name2 = factor(name2, levels = c("a1", "a2", "a3", "a4", "a5", "a6")),
                name = factor(name, levels = c("a1", "a2", "a3", "a4", "a5", "a6"))) |>
  tidyr::complete(name2, name, fill = list(cor = 0)) |>
  tidyr::pivot_wider(names_from = name2, values_from = cor) |>
  tibble::column_to_rownames("name") |>
  as.matrix()
diag(myMatrix) <- 1
# Mirror the lower triangle onto the upper triangle
myMatrix[upper.tri(myMatrix)] <- t(myMatrix)[upper.tri(myMatrix)]

which returns:

     a1   a2   a3   a4   a5   a6
a1 1.00 0.00 0.99 0.00 0.93 0.00
a2 0.00 1.00 0.94 0.94 0.00 0.00
a3 0.99 0.94 1.00 0.00 0.00 0.91
a4 0.00 0.94 0.00 1.00 0.00 0.00
a5 0.93 0.00 0.00 0.00 1.00 0.00
a6 0.00 0.00 0.91 0.00 0.00 1.00
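
One caveat about the last line of that code: it copies the lower triangle over the upper triangle, so a correlation recorded only in the upper triangle would be overwritten with 0. That does not happen with the example data. As a hedged alternative, assuming all known correlations are non-negative (so that 0 reliably means "unknown"), an element-wise pmax() keeps whichever side holds a value. The toy matrix m below is hypothetical and only serves to show the difference.

```r
# Hypothetical 3 x 3 matrix: one value only above the diagonal, one only below
m <- matrix(0, 3, 3, dimnames = list(c("a1", "a2", "a3"), c("a1", "a2", "a3")))
diag(m) <- 1
m["a1", "a2"] <- 0.8   # recorded only in the upper triangle
m["a3", "a1"] <- 0.5   # recorded only in the lower triangle

# Copying lower -> upper replaces the 0.8 with the 0 stored below the diagonal
lost <- m
lost[upper.tri(lost)] <- t(lost)[upper.tri(lost)]

# Element-wise maximum of the matrix and its transpose keeps both values
sym <- pmax(m, t(m))
```

With negative correlations pmax() would prefer the 0 over the real value, so a different merge rule would be needed in that case.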

The general idea is that you:

1. Name the list elements so you know which variable each set of correlations belongs to (for a longer list this could be done programmatically with paste()).
2. Combine all the list elements into one data frame.
3. Convert both name columns to factors that list every variable, so all possible pairs exist (again, this could be done programmatically if required).
4. Use complete() to add 0 for the missing pairs.
5. Switch to a matrix, set the diagonal to 1, and make it symmetric across the diagonal.
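
The list names and factor levels above are hard-coded. As a rough base R sketch of how they could be derived programmatically, the code below assumes (as in the example data) that the first row of each data frame names the variable the frame belongs to. The names owner and vars are introduced here for illustration, and the matrix is filled by direct indexing rather than by the complete()/pivot_wider() pipeline, so this is a swapped-in technique, not the answer's own method.

```r
A1 <- data.frame(name = c("a1", "a3", "a5"), cor = c(1, 0.99, 0.93))
A2 <- data.frame(name = c("a2", "a3", "a4"), cor = c(1, 0.94, 0.94))
A3 <- data.frame(name = c("a3", "a1", "a2", "a6"), cor = c(1, 0.99, 0.94, 0.91))
myList <- list(A1, A2, A3)

# Assumption: row 1 of each data frame names the variable the frame belongs to
owner <- vapply(myList, function(d) as.character(d$name[1]), character(1))

# Every variable mentioned anywhere becomes a row and a column
# (sort() is alphabetical, so "a10" would sort before "a2")
vars <- sort(unique(unlist(lapply(myList, function(d) as.character(d$name)))))

# Start from "unknown" (0) everywhere, with 1 on the diagonal
corMatrix <- diag(1, length(vars))
dimnames(corMatrix) <- list(vars, vars)

# Write each known correlation in both directions to keep the matrix symmetric
for (i in seq_along(myList)) {
  d <- myList[[i]]
  corMatrix[cbind(owner[i], as.character(d$name))] <- d$cor
  corMatrix[cbind(as.character(d$name), owner[i])] <- d$cor
}
corMatrix
```

Because each value is written to both [i, j] and [j, i], no separate symmetrization step is needed.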

Answer 2

Score: 2

In base R you could do:

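# Tag each data frame with the variable it belongs to, then stack them
a <- do.call(rbind, Map(cbind, name1 = c('a1', 'a2', 'a3'), myList))
# Add the mirrored pairs (name1 and name swapped) and drop duplicate rows
b <- unique(rbind(a, setNames(a[c(2, 1, 3)], names(a))))
# Cross-tabulate the correlations; pairs with no data become 0
xtabs(cor~., b)

which gives:

     name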
name1   a1   a2   a3   a4   a5   a6
   a1 1.00 0.00 0.99 0.00 0.93 0.00
   a2 0.00 1.00 0.94 0.94 0.00 0.00
   a3 0.99 0.94 1.00 0.00 0.00 0.91
   a4 0.00 0.94 0.00 0.00 0.00 0.00
   a5 0.93 0.00 0.00 0.00 0.00 0.00
   a6 0.00 0.00 0.91 0.00 0.00 0.00
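
In the output above the diagonal entries for a4, a5 and a6 are 0, because no data frame lists those variables against themselves, whereas the requested corMatrix has 1 there. A minimal sketch of one way to close that gap, again assuming the first row of each data frame names the variable it belongs to (owner and tab are names introduced here for illustration):

```r
A1 <- data.frame(name = c("a1", "a3", "a5"), cor = c(1, 0.99, 0.93))
A2 <- data.frame(name = c("a2", "a3", "a4"), cor = c(1, 0.94, 0.94))
A3 <- data.frame(name = c("a3", "a1", "a2", "a6"), cor = c(1, 0.99, 0.94, 0.91))
myList <- list(A1, A2, A3)

# Assumption: row 1 of each data frame names the variable the frame belongs to
owner <- vapply(myList, function(d) as.character(d$name[1]), character(1))

# Same steps as in the answer, with the hard-coded names replaced by `owner`
a <- do.call(rbind, Map(cbind, name1 = owner, myList))
b <- unique(rbind(a, setNames(a[c(2, 1, 3)], names(a))))
tab <- xtabs(cor ~ ., b)

# Each variable correlates perfectly with itself, so force the diagonal to 1
diag(tab) <- 1
tab
```

Note that xtabs() sums the values in each cell, so if two data frames reported different correlations for the same pair, unique() would keep both rows and the cell would hold their sum.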
