Create a matrix from an operation on multiple lists in R
Question
I want to generate a heatmap of Jaccard indices, each computed from a pair of string vectors. Say I have 4 vectors: I want to calculate the Jaccard index for every pair of vectors and collect the results in a 4 x 4 matrix, so that each cell holds the Jaccard index of one specific pair. As a toy example, my vectors look like this:
sample.set.1 <- c("A1", "B1", "C1", "D1")
sample.set.2 <- c("A2", "B1", "C1", "D2")
sample.set.3 <- c("A3", "B3", "C2", "D1")
sample.set.4 <- c("A4", "B4", "C4", "D4")
I can then calculate the Jaccard index like so:
jaccard <- function(a, b){
  shared.len <- length(intersect(a, b))  # number of elements the two vectors share
  union.len <- length(union(a, b))       # number of distinct elements across both
  return(shared.len / union.len)
}
jaccard(sample.set.1, sample.set.2)
This gives me the Jaccard index for one specific comparison. My question is: can someone suggest a concise way to apply this to all combinations of vectors, giving me a 4 x 4 matrix without repeating lots of code?
I could do this with a loop that makes every comparison, but I am interested in an implementation using one of R's apply functions, or something similarly concise.
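For reference, the function computes the Jaccard index of two sets A and B as the size of their intersection divided by the size of their union; for vectors without duplicated elements, the union size equals |A| + |B| - |A ∩ B|, which is what the original `length(a) + length(b) - shared.len` relies on. Worked by hand for the first comparison, where sample.set.1 and sample.set.2 share B1 and C1:

$$
J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}
$$

$$
J(\texttt{sample.set.1}, \texttt{sample.set.2}) = \frac{2}{4 + 4 - 2} = \frac{2}{6} \approx 0.3333
$$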
Answer 1
Score: 1
The `dist` function from the `proxy` package lets you pass a custom function to compute the distance. However, the first thing to do is combine your `sample.set` vectors into one object. I used `mget` to pull them into a list and then passed your `jaccard` function as the method. I'd also note that `proxy` has the Jaccard similarity metric built in.
proxy::dist(mget(grep("sample.set.\\d", ls(), value = TRUE)), method = jaccard)
#              sample.set.1 sample.set.2 sample.set.3
# sample.set.2    0.3333333
# sample.set.3    0.1428571    0.0000000
# sample.set.4    0.0000000    0.0000000    0.0000000
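Along the same lines as proxy's built-in Jaccard measure, base R's `stats::dist()` has a `"binary"` method: among the columns where at least one of two rows is non-zero, it returns the proportion where only one is, which is exactly 1 minus the Jaccard index. A minimal sketch, assuming the strings can be treated as set elements: build a 0/1 incidence matrix (one row per sample, one column per distinct string) and subtract the binary distance from 1. The names `samples`, `items`, `incidence` and `jac` are illustrative, not from the original answer.

```r
# The four toy vectors from the question, collected in a named list
samples <- list(
  sample.set.1 = c("A1", "B1", "C1", "D1"),
  sample.set.2 = c("A2", "B1", "C1", "D2"),
  sample.set.3 = c("A3", "B3", "C2", "D1"),
  sample.set.4 = c("A4", "B4", "C4", "D4")
)

# Every distinct string that appears in any sample
items <- sort(unique(unlist(samples)))

# 0/1 incidence matrix: one row per sample, one column per distinct string
incidence <- t(sapply(samples, function(s) as.numeric(items %in% s)))
colnames(incidence) <- items

# dist(method = "binary") is the Jaccard distance between rows,
# so 1 minus it is the Jaccard index (still stored as a "dist" object)
jac <- 1 - dist(incidence, method = "binary")
print(jac)
```

Because arithmetic with a scalar keeps the attributes of `dist`'s result, `jac` is still a `dist` object and prints in the same lower-triangle layout as the output above.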
Answer 2
Score: 1
In base R, using the `jaccard` function as defined in your post, you could simply do:
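samples <- mget(ls(pattern = "sample.set"))  # Get all samples into a list
# Compute jaccard() for every pair (combn's order matches dist's lower triangle)
# and wrap the result as a "dist" object; the \(x) lambda syntax needs R >= 4.1.0
structure(combn(samples, 2, \(x) jaccard(x[[1]], x[[2]])),
          Size = length(samples), Labels = names(samples), class = "dist")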
             sample.set.1 sample.set.2 sample.set.3
sample.set.2    0.3333333
sample.set.3    0.1428571    0.0000000
sample.set.4    0.0000000    0.0000000    0.0000000
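Both answers return a lower-triangle `dist` object. If the full 4 x 4 matrix from the question is needed (for example as heatmap input), `as.matrix()` will expand a `dist` object, but it puts 0 on the diagonal, whereas the Jaccard index of a set with itself is 1. A minimal apply-style sketch that builds the full matrix directly with nested `sapply()` calls, assuming the vectors are collected in a named list; `samples` and `jac.mat` are illustrative names, and `jaccard()` here is a compact variant of the question's function that gives the same values for duplicate-free vectors.

```r
# Compact variant of the question's jaccard(): intersection size over union size
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# The four toy vectors from the question, collected in a named list
samples <- list(
  sample.set.1 = c("A1", "B1", "C1", "D1"),
  sample.set.2 = c("A2", "B1", "C1", "D2"),
  sample.set.3 = c("A3", "B3", "C2", "D1"),
  sample.set.4 = c("A4", "B4", "C4", "D4")
)

# Nested sapply(): the outer call produces one column per sample a, the inner
# call one row per sample b, so the result is a named 4 x 4 matrix with
# row/column names taken from the list names and 1 on the diagonal
jac.mat <- sapply(samples, function(a) sapply(samples, function(b) jaccard(a, b)))
print(round(jac.mat, 3))
```

The matrix can then be passed to base R's `heatmap()`, e.g. `heatmap(jac.mat, symm = TRUE)`; note that by default `heatmap()` reorders rows and columns by hierarchical clustering.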