Changing cluster labels for comparison purposes
Question
I need help relabeling the outputs of two clustering procedures so that they can be compared more directly.
Suppose clustering procedure A returns the following vector (the cluster label of each individual):
clust1 <- c(1, 1, 1, 1, 3, 2, 2, 1, 1, 2, 3, 2, 2)
while clustering algorithm B returns the following vector:
clust2 <- c(3, 3, 3, 3, 5, 2, 2, 3, 3, 2, 5, 2, 2)
As you can see, the two algorithms returned the same clustering, but this is hard to spot when there are hundreds of observations.
Can you help me develop an automatic function (or a piece of code written in a general way) that changes the cluster labels of one or both vectors so that they use the same labels?
My main purpose is not to compare the two clusterings; I need code that does what I described, so please don't answer by just saying that I can compare them with a plot or a contingency table.
Thanks in advance!
Answer 1
Score: 1
You could compare them with a plot or a contingency table.
Alternatively, relabel each vector by order of first appearance, like so:
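relabel <- \(xs) {
  xs <- as.character(xs)
  # Distinct labels, in order of first appearance
  xs_uniq <- unique(xs)
  # Map each original label to a letter (LETTERS allows at most 26 clusters)
  hash <- setNames(LETTERS[seq_along(xs_uniq)], xs_uniq)
  as.character(hash[xs])
}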
## > relabel(clust1)
## [1] "A" "A" "A" "A" "B" "C" "C" "A" "A" "C" "B" "C" "C"
## > identical(relabel(clust1), relabel(clust2))
## [1] TRUE
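If more than 26 clusters are possible, the same first-appearance idea can be sketched with integer labels instead of letters. This is only a minimal variation on the function above; the name relabel_int is made up for illustration and is not part of the original answer.

```r
clust1 <- c(1, 1, 1, 1, 3, 2, 2, 1, 1, 2, 3, 2, 2)
clust2 <- c(3, 3, 3, 3, 5, 2, 2, 3, 3, 2, 5, 2, 2)

# Hypothetical helper: number the clusters 1, 2, 3, ... in the order
# in which their labels first appear in the vector.
relabel_int <- function(xs) {
  match(xs, unique(xs))
}

relabel_int(clust1)
#> [1] 1 1 1 1 2 3 3 1 1 3 2 3 3
identical(relabel_int(clust1), relabel_int(clust2))
#> [1] TRUE
```

Because both vectors are renumbered by the same rule, two identical partitions end up with identical label vectors whatever labels the algorithms originally used.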
Answer 2
Score: 1
clust1 <- c(1, 1, 1, 1, 3, 2, 2, 1, 1, 2, 3, 2, 2)
clust2 <- c(3, 3, 3, 3, 5, 2, 2, 3, 3, 2, 5, 2, 2)
Here is a solution that works as long as both solutions have the same number of clusters.
It uses factor() to apply the labels of clust1 to clust2, matching clusters by order of first appearance.
clust2_re <-
  factor(clust2,
         levels = unique(clust2),
         labels = unique(clust1)) |>
  as.character() |>
  as.numeric()
clust2_re
#> [1] 1 1 1 1 3 2 2 1 1 2 3 2 2
all(clust1 == clust2_re)
#> [1] TRUE
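The same mapping can be wrapped in a reusable function. The sketch below is one possible way to do it, not the only one: align_labels is a hypothetical name, and it assumes, like the factor() approach, that clusters correspond by order of first appearance. It stops with an error when the number of clusters differs.

```r
clust1 <- c(1, 1, 1, 1, 3, 2, 2, 1, 1, 2, 3, 2, 2)
clust2 <- c(3, 3, 3, 3, 5, 2, 2, 3, 3, 2, 5, 2, 2)
clust3 <- c(3, 3, 3, 3, 5, 2, 2, 3, 3, 2, 5, 2, 3)

# Hypothetical helper: rewrite the labels of `target` using the labels of
# `reference`, pairing clusters by order of first appearance.
align_labels <- function(reference, target) {
  ref_labels <- unique(reference)
  tgt_labels <- unique(target)
  if (length(ref_labels) != length(tgt_labels)) {
    stop("the two clusterings have a different number of clusters")
  }
  ref_labels[match(target, tgt_labels)]
}

identical(align_labels(clust1, clust2), clust1)
#> [1] TRUE

# With a clustering that differs in one observation, the aligned labels
# show where the two solutions disagree.
which(align_labels(clust1, clust3) != clust1)
#> [1] 13
```

Note that pairing by first appearance is only meaningful when the two partitions are (nearly) the same; if an early observation is assigned differently, the whole mapping can shift.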
Furthermore, igraph has a compare() function that returns the distance between two clustering results; it also works when the cluster labels differ.
Let's add a third clustering that changes only the last value:
clust3 <- c(3, 3, 3, 3, 5, 2, 2, 3, 3, 2, 5, 2, 3)
When two clustering solutions are the same, compare() returns 0:
library(igraph)
compare(clust1, clust2)
#> [1] 0
Whenever they differ, the result is greater than 0:
compare(clust1, clust3)
#> [1] 0.4132943
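For readers without igraph, the value above can be reproduced in base R, assuming compare() is using its variation of information method here (which I believe is the default). The function name variation_of_information is made up for this sketch, and natural logarithms are an assumption chosen to match the output shown.

```r
clust1 <- c(1, 1, 1, 1, 3, 2, 2, 1, 1, 2, 3, 2, 2)
clust2 <- c(3, 3, 3, 3, 5, 2, 2, 3, 3, 2, 5, 2, 2)
clust3 <- c(3, 3, 3, 3, 5, 2, 2, 3, 3, 2, 5, 2, 3)

# Hypothetical helper: variation of information between two label vectors,
# VI = 2 * H(a, b) - H(a) - H(b), using natural logarithms.
variation_of_information <- function(a, b) {
  stopifnot(length(a) == length(b))
  joint <- table(a, b) / length(a)  # joint distribution of label pairs
  entropy <- function(p) {
    p <- p[p > 0]                   # treat 0 * log(0) as 0
    -sum(p * log(p))
  }
  2 * entropy(joint) - entropy(rowSums(joint)) - entropy(colSums(joint))
}

variation_of_information(clust1, clust2)  # 0 (up to rounding): same partition
variation_of_information(clust1, clust3)  # approximately 0.4133
```

The measure depends only on how observations are grouped, not on the label values, which is why it is unaffected by relabeling.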