How to combine a list of data frames into a single data frame using R?
Question
I have a list of data frames:
A1 = data.frame(name = c("a1", "a3", "a5"), cor = c(1, 0.99, 0.93))
A2 = data.frame(name = c("a2", "a3", "a4"), cor = c(1, 0.94, 0.94))
A3 = data.frame(name = c("a3", "a1", "a2", "a6"), cor = c(1, 0.99, 0.94, 0.91))
myList = list(A1, A2, A3)
Each data frame contains calculated correlation coefficients (CC) between one variable and several others.
For instance:
in A1, the CC between a1 and a1 is 1, between a1 and a3 is 0.99, and between a1 and a5 is 0.93;
in A2, the CC between a2 and a2 is 1, between a2 and a3 is 0.94, and between a2 and a4 is 0.94.
What I want to do is combine these individual data frames into a single complete one, like the following:
corMatrix
     a1   a2   a3   a4   a5   a6
a1 1.00 0.00 0.99 0.00 0.93 0.00
a2 0.00 1.00 0.94 0.94 0.00 0.00
a3 0.99 0.94 1.00 0.00 0.00 0.91
a4 0.00 0.94 0.00 1.00 0.00 0.00
a5 0.93 0.00 0.00 0.00 1.00 0.00
a6 0.00 0.00 0.91 0.00 0.00 1.00
This corMatrix contains all the correlation information from the data frames above. If the correlation between two variables is unknown, 0 is used as their CC value, as for variables a1 and a2.
How can I do it?
Thanks a lot.
Answer 1
Score: 2
I believe this does what you're looking for, although it may not be the best way of doing this:
A1 = data.frame(name = c("a1", "a3", "a5"), cor = c(1, 0.99, 0.93))
A2 = data.frame(name = c("a2", "a3", "a4"), cor = c(1, 0.94, 0.94))
A3 = data.frame(name = c("a3", "a1", "a2", "a6"), cor = c(1, 0.99, 0.94, 0.91))
myList = list(A1, A2, A3)
# name each element after the variable its correlations belong to
names(myList) = c("a1", "a2", "a3")
myMatrix = dplyr::bind_rows(myList, .id = "name2") |>  # name2 = source variable
  dplyr::mutate(name2 = factor(name2, levels = c("a1", "a2", "a3", "a4", "a5", "a6")),
                name = factor(name, levels = c("a1", "a2", "a3", "a4", "a5", "a6"))) |>
  tidyr::complete(name2, name, fill = list(cor = 0)) |>  # every pair; unknown pairs get 0
  tidyr::pivot_wider(names_from = name2, values_from = cor) |>
  tibble::column_to_rownames("name") |>
  as.matrix()
diag(myMatrix) <- 1
# copy the lower triangle into the upper triangle so the matrix is symmetric
myMatrix[upper.tri(myMatrix)] <- t(myMatrix)[upper.tri(myMatrix)]
which returns:
     a1   a2   a3   a4   a5   a6
a1 1.00 0.00 0.99 0.00 0.93 0.00
a2 0.00 1.00 0.94 0.94 0.00 0.00
a3 0.99 0.94 1.00 0.00 0.00 0.91
a4 0.00 0.94 0.00 1.00 0.00 0.00
a5 0.93 0.00 0.00 0.00 1.00 0.00
a6 0.00 0.00 0.91 0.00 0.00 1.00
The general idea is that you:
- name the list so you know which variable each set of correlations belongs to (for a longer list this could be done programmatically with paste())
- combine all the list elements into one data frame
- fill out all possible combinations using factors (again, the levels could be generated programmatically if required)
- use complete() to add 0 for missing values
- convert to a matrix, set the diagonal to 1, and make it symmetric across the diagonal
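The same steps can also be sketched in base R, without dplyr, tidyr or tibble. This is an illustrative variant, not the answerer's code: it assumes the i-th data frame holds the correlations of variable "a<i>" (so the names can come from paste0()), derives the set of variables from the data instead of typing the factor levels, and uses character-matrix indexing in place of complete() and pivot_wider(). Because it writes every value to both (row, column) and (column, row), a pair reported in only one data frame still ends up on both sides of the diagonal, whereas copying the lower triangle relies on every pair already being present below it. The names long, all_vars and corMatrix are only illustrative.

```r
A1 <- data.frame(name = c("a1", "a3", "a5"), cor = c(1, 0.99, 0.93))
A2 <- data.frame(name = c("a2", "a3", "a4"), cor = c(1, 0.94, 0.94))
A3 <- data.frame(name = c("a3", "a1", "a2", "a6"), cor = c(1, 0.99, 0.94, 0.91))
myList <- list(A1, A2, A3)

# Assumption: the i-th data frame belongs to variable "a<i>"
names(myList) <- paste0("a", seq_along(myList))

# Stack everything into one long data frame: name2 = source, name = target
long <- do.call(rbind, Map(function(src, d) {
  data.frame(name2 = src, name = as.character(d$name), cor = d$cor,
             stringsAsFactors = FALSE)
}, names(myList), myList))

# Every variable that appears anywhere (sort() is lexicographic, so "a10" < "a2")
all_vars <- sort(unique(c(long$name2, long$name)))

# Start from zeros so that unknown pairs stay 0
corMatrix <- matrix(0, nrow = length(all_vars), ncol = length(all_vars),
                    dimnames = list(all_vars, all_vars))

# A two-column character matrix indexes cells by row/column name;
# fill both directions so the result is symmetric
corMatrix[cbind(long$name2, long$name)] <- long$cor
corMatrix[cbind(long$name, long$name2)] <- long$cor

# Each variable correlates perfectly with itself
diag(corMatrix) <- 1
print(corMatrix)
```

If two data frames report different values for the same pair, the later assignment wins, so it may be worth checking for such conflicts first.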
Answer 2
Score: 2
In base R you could do:
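# tag each data frame with its source variable (name1) and stack them
a <- do.call(rbind, Map(cbind, name1 = c('a1', 'a2', 'a3'), myList))
# add every pair in the reverse direction too, then drop duplicate rows
b <- unique(rbind(a, setNames(a[c(2, 1, 3)], names(a))))
# cross-tabulate: rows = name1, columns = name, cells = summed cor (0 if absent)
xtabs(cor ~ ., b)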
which returns:
      name
name1   a1   a2   a3   a4   a5   a6
   a1 1.00 0.00 0.99 0.00 0.93 0.00
   a2 0.00 1.00 0.94 0.94 0.00 0.00
   a3 0.99 0.94 1.00 0.00 0.00 0.91
   a4 0.00 0.94 0.00 0.00 0.00 0.00
   a5 0.93 0.00 0.00 0.00 0.00 0.00
   a6 0.00 0.00 0.91 0.00 0.00 0.00
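In the xtabs() output above, the diagonal entries for a4, a5 and a6 are 0 rather than the 1 the question asks for, because none of the data frames contains a self-correlation row for those variables. A minimal sketch of one way to finish the job, assuming a plain numeric matrix is wanted (the names tab and corMatrix are only illustrative): copy the table into an ordinary matrix and set the diagonal. Note that xtabs() sums duplicate (name1, name) rows, so this relies on unique() having collapsed pairs reported with the same value in two data frames; a pair reported with two different values would be added together.

```r
A1 <- data.frame(name = c("a1", "a3", "a5"), cor = c(1, 0.99, 0.93))
A2 <- data.frame(name = c("a2", "a3", "a4"), cor = c(1, 0.94, 0.94))
A3 <- data.frame(name = c("a3", "a1", "a2", "a6"), cor = c(1, 0.99, 0.94, 0.91))
myList <- list(A1, A2, A3)

# Same steps as the answer above
a <- do.call(rbind, Map(cbind, name1 = c('a1', 'a2', 'a3'), myList))
b <- unique(rbind(a, setNames(a[c(2, 1, 3)], names(a))))
tab <- xtabs(cor ~ ., b)

# Copy the cell values into a plain numeric matrix (drops the xtabs/table class)
corMatrix <- matrix(as.vector(tab), nrow = nrow(tab),
                    dimnames = unname(dimnames(tab)))

# Variables without a self-correlation row (a4, a5, a6) also get 1
diag(corMatrix) <- 1
print(corMatrix)
```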