Create a matrix from operation on multiple lists in R

Question

I want to generate a heatmap of Jaccard indices computed on vectors of strings. Say I have 4 vectors: I want to calculate the Jaccard index for every pair of vectors and store the results in a 4x4 matrix, so that each cell holds the Jaccard index for one specific pair. As a toy example, my vectors look like this:

sample.set.1 <- c("A1", "B1", "C1", "D1")
sample.set.2 <- c("A2", "B1", "C1", "D2")
sample.set.3 <- c("A3", "B3", "C2", "D1")
sample.set.4 <- c("A4", "B4", "C4", "D4")

I can then calculate the Jaccard index like so:

jaccard <- function(a, b){
  shared.len <- length(intersect(a, b))
  union <- (length(a)+length(b)) - shared.len
  return(shared.len / union)
}
jaccard(sample.set.1, sample.set.2)

This gives me the Jaccard index for a single comparison. Can someone advise on a concise way of applying this to all pairs of vectors, leaving me with a 4x4 matrix, without repeating loads of code?

I could do this by looping over every comparison, but I am interested in a solution that uses one of R's apply functions, or something similarly concise.

Answer 1

Score: 1

The dist function from the proxy package lets you pass a custom function to compute the distance. The first thing to do, however, is to combine your sample.set vectors into one object. I used mget to pull them into a list and then passed your jaccard function as the method. Because jaccard returns a similarity, the values below are similarities rather than distances. I'd also note that proxy has a Jaccard similarity measure built in.

proxy::dist(mget(grep("^sample\\.set\\.\\d+$", ls(), value = TRUE)), method = jaccard)

#             sample.set.1 sample.set.2 sample.set.3
#sample.set.2    0.3333333                          
#sample.set.3    0.1428571    0.0000000             
#sample.set.4    0.0000000    0.0000000    0.0000000
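
The result above is a lower-triangular dist object, while the question asks for a full 4x4 matrix. A minimal sketch of the conversion follows. To keep it runnable without the proxy package, it assumes a dist object d rebuilt by hand from the values printed above; the names d and m are only illustrative, and in practice d would be the object returned by the call above.

```r
# Assumption: `d` stands in for the dist object shown above, rebuilt here
# from the printed values (column-major lower triangle).
d <- structure(c(1/3, 1/7, 0, 0, 0, 0),
               Size = 4L,
               Labels = paste0("sample.set.", 1:4),
               class = "dist")

# as.matrix() expands a dist object into a full symmetric matrix
m <- as.matrix(d)

# as.matrix() fills the diagonal with 0; the Jaccard similarity of a
# non-empty set with itself is 1, so set it explicitly
diag(m) <- 1

print(m)
```

The diagonal is overwritten because dist objects do not store self-comparisons.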

Answer 2

Score: 1

In base R, using the jaccard function as defined in your post, you could simply do the following (the \(x) lambda shorthand requires R 4.1 or later):

samples <- mget(ls(pattern = "sample.set")) # Get all samples into a list

structure(combn(samples, 2, \(x) jaccard(x[[1]], x[[2]])),
     Size = length(samples), Labels = names(samples), class = 'dist')

             sample.set.1 sample.set.2 sample.set.3
sample.set.2    0.3333333                          
sample.set.3    0.1428571    0.0000000             
sample.set.4    0.0000000    0.0000000    0.0000000
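
Since the question asks for an apply-style solution that returns the full 4x4 matrix, here is a minimal base R sketch using nested sapply calls. It is one possible approach, not the method from either answer. The names sets and jac.mat are illustrative, and the vectors are collected into a named list by hand instead of with mget.

```r
# Toy data from the question, collected into a named list
sets <- list(
  sample.set.1 = c("A1", "B1", "C1", "D1"),
  sample.set.2 = c("A2", "B1", "C1", "D2"),
  sample.set.3 = c("A3", "B3", "C2", "D1"),
  sample.set.4 = c("A4", "B4", "C4", "D4")
)

# Jaccard index as defined in the question
jaccard <- function(a, b) {
  shared.len <- length(intersect(a, b))
  union.len <- (length(a) + length(b)) - shared.len
  shared.len / union.len
}

# The outer sapply builds one column per set; the inner sapply fills that
# column with the index against every set, giving a named 4x4 matrix
jac.mat <- sapply(sets, function(a) sapply(sets, function(b) jaccard(a, b)))
print(jac.mat)

# Plot without clustering or rescaling, so the cells show the raw indices
heatmap(jac.mat, Rowv = NA, Colv = NA, scale = "none")
```

This computes every pair twice and the diagonal as well, which is wasteful for many large sets but keeps the code short for a handful of vectors.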

huangapple
  • Posted on July 11, 2023 at 04:27:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76657126.html