How to combine a list of data frames into a single data frame using R?
Question
I have a list of data frames:
A1 = data.frame(name = c("a1", "a3", "a5"), cor = c(1, 0.99, 0.93))
A2 = data.frame(name = c("a2", "a3", "a4"), cor = c(1, 0.94, 0.94))
A3 = data.frame(name = c("a3", "a1", "a2", "a6"), cor = c(1, 0.99, 0.94, 0.91))
myList = list(A1, A2, A3)
Each data frame holds calculated correlation coefficients (CC).
For instance, in A1 the CC between a1 and a1 is 1, between a1 and a3 is 0.99, and between a1 and a5 is 0.93; in A2 the CC between a2 and a2 is 1, between a2 and a3 is 0.94, and between a2 and a4 is 0.94.
What I want to do is combine these individual data frames into a single complete one, like the following:
corMatrix
     a1   a2   a3   a4   a5   a6
a1 1.00 0.00 0.99 0.00 0.93 0.00
a2 0.00 1.00 0.94 0.94 0.00 0.00
a3 0.99 0.94 1.00 0.00 0.00 0.91
a4 0.00 0.94 0.00 1.00 0.00 0.00
a5 0.93 0.00 0.00 0.00 1.00 0.00
a6 0.00 0.00 0.91 0.00 0.00 1.00
This corMatrix data frame contains all the correlation information from the data frames above. If the correlation between two variables is unknown, 0 is used as their CC value, as for variables a1 and a2.
How can I do it?
Thanks a lot.
Answer 1
Score: 2
I believe this does what you're looking for, although it may not be the best way of doing this:
A1 = data.frame(name = c("a1", "a3", "a5"), cor = c(1, 0.99, 0.93))
A2 = data.frame(name = c("a2", "a3", "a4"), cor = c(1, 0.94, 0.94))
A3 = data.frame(name = c("a3", "a1", "a2", "a6"), cor = c(1, 0.99, 0.94, 0.91))
myList = list(A1, A2, A3)
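# Name each list element after the variable its correlations belong to
names(myList) = c("a1", "a2", "a3")
myMatrix = dplyr::bind_rows(myList, .id = "name2") |>
  # Factors with all six levels so that complete() can see every variable
  dplyr::mutate(name2 = factor(name2, levels = c("a1", "a2", "a3", "a4", "a5", "a6")),
                name = factor(name, levels = c("a1", "a2", "a3", "a4", "a5", "a6"))) |>
  # Add every missing (name2, name) pair with cor = 0
  tidyr::complete(name2, name, fill = list(cor = 0)) |>
  tidyr::pivot_wider(names_from = name2, values_from = cor) |>
  tibble::column_to_rownames("name") |>
  as.matrix()
# Set the diagonal to 1, then copy the lower triangle into the upper triangle
diag(myMatrix) <- 1
myMatrix[upper.tri(myMatrix)] <- t(myMatrix)[upper.tri(myMatrix)]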
which returns:
     a1   a2   a3   a4   a5   a6
a1 1.00 0.00 0.99 0.00 0.93 0.00
a2 0.00 1.00 0.94 0.94 0.00 0.00
a3 0.99 0.94 1.00 0.00 0.00 0.91
a4 0.00 0.94 0.00 1.00 0.00 0.00
a5 0.93 0.00 0.00 0.00 1.00 0.00
a6 0.00 0.00 0.91 0.00 0.00 1.00
The general idea is that you:
- name the list so you know which variable each set of correlations belongs to (this could be done programmatically with paste() for a longer list)
- combine all the list elements into a single data frame
- fill out all possible elements using factors (again, this could be done programmatically if required)
- use complete() to add 0 for missing values
- switch to a matrix, set the diagonal to 1, and make it symmetric across the diagonal
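The same steps can also be sketched in base R, with the names and levels generated programmatically. This is only an illustration, not the code from the answer above. It assumes that list element i describes variable "a<i>" and that a stored value of exactly 0 means "unknown". The names vars and unknown are made up for this sketch.

```r
A1 <- data.frame(name = c("a1", "a3", "a5"), cor = c(1, 0.99, 0.93))
A2 <- data.frame(name = c("a2", "a3", "a4"), cor = c(1, 0.94, 0.94))
A3 <- data.frame(name = c("a3", "a1", "a2", "a6"), cor = c(1, 0.99, 0.94, 0.91))
myList <- list(A1, A2, A3)

# Assumption: element i of the list holds the correlations of variable "a<i>"
names(myList) <- paste0("a", seq_along(myList))

# Every variable that appears anywhere, used for both rows and columns
# (sort() is alphabetical, so "a10" would sort before "a2")
vars <- sort(unique(unlist(lapply(myList, function(d) as.character(d$name)))))

# Start from an all-zero matrix, i.e. every correlation is unknown
corMatrix <- matrix(0, nrow = length(vars), ncol = length(vars),
                    dimnames = list(vars, vars))

# Write each data frame into the row of the variable it belongs to
for (v in names(myList)) {
  d <- myList[[v]]
  corMatrix[v, as.character(d$name)] <- d$cor
}

# Mirror known values into cells that are still 0, then set the diagonal to 1
unknown <- corMatrix == 0
corMatrix[unknown] <- t(corMatrix)[unknown]
diag(corMatrix) <- 1

corMatrix
```

Each empty cell is filled from its mirror image, rather than one triangle being copied over the other. That way a value is kept even if it was only recorded on one side of the diagonal.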
Answer 2
Score: 2
In base R you could do:
a <- do.call(rbind, Map(cbind, name1 = c('a1','a2', 'a3'), myList))
b <- unique(rbind(a, setNames(a[c(2,1,3)], names(a))))
xtabs(cor~., b)
     name
name1   a1   a2   a3   a4   a5   a6
   a1 1.00 0.00 0.99 0.00 0.93 0.00
   a2 0.00 1.00 0.94 0.94 0.00 0.00
   a3 0.99 0.94 1.00 0.00 0.00 0.91
   a4 0.00 0.94 0.00 0.00 0.00 0.00
   a5 0.93 0.00 0.00 0.00 0.00 0.00
   a6 0.00 0.00 0.91 0.00 0.00 0.00
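Note that in the output above the diagonal entries for a4, a5 and a6 are 0, whereas the matrix requested in the question has 1 there. A minimal sketch of one way to fix this is to set the diagonal of the xtabs() result explicitly. The variable name m is made up for this sketch, and myList is rebuilt so the block runs on its own.

```r
A1 <- data.frame(name = c("a1", "a3", "a5"), cor = c(1, 0.99, 0.93))
A2 <- data.frame(name = c("a2", "a3", "a4"), cor = c(1, 0.94, 0.94))
A3 <- data.frame(name = c("a3", "a1", "a2", "a6"), cor = c(1, 0.99, 0.94, 0.91))
myList <- list(A1, A2, A3)

# Same steps as in the answer: tag each data frame with its variable,
# add the mirrored (name, name1) pairs, and drop duplicate rows
a <- do.call(rbind, Map(cbind, name1 = c("a1", "a2", "a3"), myList))
b <- unique(rbind(a, setNames(a[c(2, 1, 3)], names(a))))

# Cross-tabulate, then force the diagonal to 1 for every variable
m <- xtabs(cor ~ ., b)
diag(m) <- 1
m
```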