July 11, 2023 04:27:27

Create a matrix from operation on multiple lists in R

Question

I want to generate a heatmap of Jaccard indices computed on vectors of strings. Say I have 4 vectors: I want to calculate the Jaccard index for every pair of vectors and collect the results in a 4x4 matrix, so that each cell holds the Jaccard index of one specific pair. A toy example of my vectors:

sample.set.1 <- c("A1", "B1", "C1", "D1")
sample.set.2 <- c("A2", "B1", "C1", "D2")
sample.set.3 <- c("A3", "B3", "C2", "D1")
sample.set.4 <- c("A4", "B4", "C4", "D4")

I can then calculate the Jaccard index like so:

jaccard <- function(a, b){
  shared.len <- length(intersect(a, b))
  union <- (length(a)+length(b)) - shared.len
  return(shared.len / union)
}
jaccard(sample.set.1, sample.set.2)

This gives me the Jaccard index for one specific comparison. Can someone suggest a concise way to apply this to all pairs of vectors, leaving me with a 4x4 matrix, without repeating loads of code?

I could run every comparison in a loop, but I am interested in doing this with one of R's apply functions, or something similarly concise.

Answer 1

Score: 1

The dist function from the proxy package lets you pass a custom function to compute the pairwise values. The first thing to do, however, is combine your sample.set vectors into one object. I used mget to pull them into a list and then passed your jaccard function as the method. I'd also note that proxy has the Jaccard similarity metric built in.

proxy::dist(mget(grep("^sample\\.set\\.\\d+$", ls(), value = TRUE)), method = jaccard)
#             sample.set.1 sample.set.2 sample.set.3
#sample.set.2    0.3333333                          
#sample.set.3    0.1428571    0.0000000             
#sample.set.4    0.0000000    0.0000000    0.0000000
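
If installing proxy is not an option, the same idea (collect the vectors in one list, then apply the function to every pair) can be sketched in base R with nested sapply calls. This is only an illustration under a few assumptions: the list `sets` is built by hand rather than with mget, the name `jaccard.mat` is made up for this example, and the intermediate variable is renamed `union.len` so that it does not shadow base R's `union()`. Unlike the dist output above, the result is a full symmetric 4x4 matrix with the diagonal included.

```r
# Toy data and Jaccard function taken from the question.
sample.set.1 <- c("A1", "B1", "C1", "D1")
sample.set.2 <- c("A2", "B1", "C1", "D2")
sample.set.3 <- c("A3", "B3", "C2", "D1")
sample.set.4 <- c("A4", "B4", "C4", "D4")

jaccard <- function(a, b) {
  shared.len <- length(intersect(a, b))
  union.len <- (length(a) + length(b)) - shared.len
  shared.len / union.len
}

# Assumption: the sets are listed explicitly instead of being collected with mget().
sets <- list(
  sample.set.1 = sample.set.1,
  sample.set.2 = sample.set.2,
  sample.set.3 = sample.set.3,
  sample.set.4 = sample.set.4
)

# The outer sapply walks the columns, the inner sapply walks the rows;
# list names become the row and column names of the matrix.
jaccard.mat <- sapply(sets, function(b) sapply(sets, function(a) jaccard(a, b)))

print(round(jaccard.mat, 3))
```

Every pair is computed twice here (once per triangle), which is harmless for a handful of sets but wasteful for many.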

Answer 2

Score: 1

In base R, using the jaccard function as defined in your post, you could simply do:

samples <- mget(ls(pattern = "sample.set")) # Get all samples into a list
structure(combn(samples, 2, \(x)jaccard(x[[1]], x[[2]])),
     Size = length(samples), Labels = names(samples), class = 'dist')
#             sample.set.1 sample.set.2 sample.set.3
#sample.set.2    0.3333333                          
#sample.set.3    0.1428571    0.0000000             
#sample.set.4    0.0000000    0.0000000    0.0000000
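
The dist object printed above stores only the lower triangle, while the question asks for a 4x4 matrix. One possible way to get there, sketched below, is as.matrix() followed by setting the diagonal. The names `sets`, `sim` and `jaccard.mat` are assumptions made for this example, and the list is written out by hand instead of using mget. Note that as.matrix() fills the diagonal of a dist object with 0, which suits a distance but not a similarity, so the sketch overwrites it with 1 (the Jaccard index of a set with itself).

```r
# Toy data and Jaccard function taken from the question.
sets <- list(
  sample.set.1 = c("A1", "B1", "C1", "D1"),
  sample.set.2 = c("A2", "B1", "C1", "D2"),
  sample.set.3 = c("A3", "B3", "C2", "D1"),
  sample.set.4 = c("A4", "B4", "C4", "D4")
)

jaccard <- function(a, b) {
  shared.len <- length(intersect(a, b))
  union.len <- (length(a) + length(b)) - shared.len
  shared.len / union.len
}

# Same construction as in the answer: one value per unordered pair,
# wrapped in a 'dist' object so base R knows how to lay it out.
sim <- structure(
  combn(sets, 2, function(x) jaccard(x[[1]], x[[2]])),
  Size = length(sets), Labels = names(sets), class = "dist"
)

# Expand to a full symmetric matrix; the diagonal comes back as 0.
jaccard.mat <- as.matrix(sim)

# Assumption: each set should count as identical to itself.
diag(jaccard.mat) <- 1

print(round(jaccard.mat, 3))
```

A matrix in this shape can then be handed to a plotting function, for example `heatmap(jaccard.mat, Rowv = NA, Colv = NA, scale = "none")`, to draw the heatmap without reordering or rescaling.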
