June 5, 2023, 23:25:21

Changing cluster labels for comparison purposes

Question

I need help relabelling the output of two clustering procedures so that they can be compared more directly.

Suppose clustering procedure A returns the following vector (one cluster label per individual):

clust1 <- c(1, 1, 1, 1, 3, 2, 2, 1, 1, 2, 3, 2, 2)

while clustering algorithm B returns the following vector:

clust2 <- c(3, 3, 3, 3, 5, 2, 2, 3, 3, 2, 5, 2, 2)

As you can see, the two algorithms returned the same clustering, but this is hard to spot when there are hundreds of observations.

Can you help me develop an automatic function (or a piece of code written in a general way) that changes the cluster labels of one or both vectors so that matching clusters carry the same labels?

My main purpose is not to compare the two clusterings. I need code that does what I described, so please don't answer just by saying that I can compare them with a plot or a contingency table.

Thanks in advance!

Answer 1

Score: 1

You could compare them with a plot or a contingency table.

Alternatively, relabel each vector in order of first appearance, like so:

# Map each distinct label to a letter in order of first appearance.
# LETTERS limits this to 26 clusters; the \(xs) shorthand needs R >= 4.1.
relabel <- \(xs) {
  xs <- as.character(xs)
  xs_uniq <- unique(xs)
  hash <- setNames(LETTERS[seq_along(xs_uniq)], xs_uniq)
  as.character(hash[xs])
}

## > relabel(clust1)
## [1] "A" "A" "A" "A" "B" "C" "C" "A" "A" "C" "B" "C" "C"

## > identical(relabel(clust1), relabel(clust2))
## [1] TRUE
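
If more than 26 clusters are possible, the same first-appearance idea can be sketched with integer codes instead of letters. The helper name relabel_int below is made up for illustration and is not part of the answer above; it only assumes base R's match() and unique().

```r
clust1 <- c(1, 1, 1, 1, 3, 2, 2, 1, 1, 2, 3, 2, 2)
clust2 <- c(3, 3, 3, 3, 5, 2, 2, 3, 3, 2, 5, 2, 2)

# Hypothetical helper: code each label by the order in which it first appears.
# The first distinct label becomes 1, the second 2, and so on.
relabel_int <- function(xs) {
  match(xs, unique(xs))
}

relabel_int(clust1)
identical(relabel_int(clust1), relabel_int(clust2))
```

Two label vectors describe the same partition exactly when their first-appearance codes are identical, so the identical() call doubles as an equality check.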

Answer 2

Score: 1

clust1 <- c(1, 1, 1, 1, 3, 2, 2, 1, 1, 2, 3, 2, 2)
clust2 <- c(3, 3, 3, 3, 5, 2, 2, 3, 3, 2, 5, 2, 2)

Here is a solution that works as long as the number of clusters is the same in both solutions.
We use factor() to apply the labels of clust1 to clust2, pairing them in order of first appearance.

clust2_re <- 
  factor(clust2,
         levels = unique(clust2),
         labels = unique(clust1)) |> 
  as.character() |> 
  as.numeric()
clust2_re
#>  [1] 1 1 1 1 3 2 2 1 1 2 3 2 2
all(clust1 == clust2_re)
#> [1] TRUE
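
The same pairing can be wrapped in a reusable function. This is a minimal sketch rather than a definitive implementation: the name align_labels and its argument names are assumptions, and it pairs clusters purely by order of first appearance, so the result is only meaningful when the two partitions really are the same (or nearly so).

```r
clust1 <- c(1, 1, 1, 1, 3, 2, 2, 1, 1, 2, 3, 2, 2)
clust2 <- c(3, 3, 3, 3, 5, 2, 2, 3, 3, 2, 5, 2, 2)

# Hypothetical helper: give `x` the labels used in `ref`,
# pairing the k-th distinct label of `x` with the k-th distinct label of `ref`.
align_labels <- function(ref, x) {
  stopifnot(length(ref) == length(x))
  ref_labels <- unique(ref)
  x_labels <- unique(x)
  # The pairing is undefined if the number of clusters differs.
  stopifnot(length(ref_labels) == length(x_labels))
  ref_labels[match(x, x_labels)]
}

clust2_aligned <- align_labels(clust1, clust2)
identical(clust2_aligned, clust1)
```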

Furthermore, igraph has a compare() function that returns the distance between two clustering results, and it also works when the cluster labels differ.
Let's add a third clustering that differs from clust2 only in the last value:

clust3 <- c(3, 3, 3, 3, 5, 2, 2, 3, 3, 2, 5, 2, 3)

When two clustering solutions are the same, compare() returns 0:

library(igraph)
compare(clust1, clust2)
#> [1] 0

Whenever they differ, the result is greater than 0:

compare(clust1, clust3)
#> [1] 0.4132943
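
When the two clusterings differ slightly, as clust1 and clust3 do, pairing labels by first appearance can go wrong. One possible alternative, sketched here under stated assumptions, is to give each cluster of the second vector the reference label it overlaps with most. The helper relabel_by_overlap is hypothetical, it assumes the reference labels are numeric, and because each cluster is matched independently, two clusters may end up with the same reference label.

```r
clust1 <- c(1, 1, 1, 1, 3, 2, 2, 1, 1, 2, 3, 2, 2)
clust3 <- c(3, 3, 3, 3, 5, 2, 2, 3, 3, 2, 5, 2, 3)

# Hypothetical helper: relabel `x` with the label of `ref`
# that each cluster of `x` shares the most observations with.
relabel_by_overlap <- function(ref, x) {
  stopifnot(length(ref) == length(x))
  tab <- table(x, ref)  # rows: labels of x, columns: labels of ref
  best <- colnames(tab)[apply(tab, 1, which.max)]
  names(best) <- rownames(tab)
  # Assumes the labels in `ref` are numeric.
  as.numeric(best[as.character(x)])
}

clust3_re <- relabel_by_overlap(clust1, clust3)
which(clust1 != clust3_re)  # positions where the two clusterings disagree
```

The table is used only internally to build the mapping; the output is a relabelled vector that can be compared element by element with the reference.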
