Performing k-means cluster analysis: how can I reorganize the data into individual clusters?
Question
I am performing a k-means cluster analysis on a data frame with 62 variables (Tapping number 1-62) and 75000 columns. How can I organize the data frame into individual clusters?
I used fviz_cluster to visualize the clusters:
library(factoextra) # provides fviz_cluster
r_fit <- kmeans(pressure_rotate, 5, nstart = 25)
fviz_cluster(r_fit, data = pressure_rotate)
I can see which variable belongs to which cluster with r_fit$cluster, but how can I reorganize the data so that I can see what each cluster contains? Something along the lines of:
cluster 1: Tapping number 3, Tapping number 5, Tapping number 12, ...
cluster 2: Tapping number 7, Tapping number 9, ...
etc.
Answer 1
Score: 1
You have 62 rows (observations) and 75000 columns (variables), is that correct? That would not be 62 variables. It is not clear whether "Tapping number" is a column in your data or just the row number. Here is an example using the iris data included in R:
library(factoextra) # provides fviz_cluster
data(iris) # 150 rows, 4 numeric variables, one species variable
iris.km <- kmeans(iris[, -5], 3, nstart = 25) # Exclude the species variable (column 5)
fviz_cluster(iris.km, iris[, -5]) # Make a plot showing the clusters
split(rownames(iris), iris.km$cluster) # Show cluster membership by row name; cluster numbering can differ between runs
# $`1`
# [1] "51" "52" "54" "55" "56" "57" "58" "59" "60" "61" "62" "63" "64" "65" "66" "67" "68" "69" "70" "71" "72" "73" "74" "75" "76" "77"
# [27] "79" "80" "81" "82" "83" "84" "85" "86" "87" "88" "89" "90" "91" "92" "93" "94" "95" "96" "97" "98" "99" "100" "102" "107" "114" "115"
# [53] "120" "122" "124" "127" "128" "134" "139" "143" "147" "150"
#
# $`2`
# [1] "53" "78" "101" "103" "104" "105" "106" "108" "109" "110" "111" "112" "113" "116" "117" "118" "119" "121" "123" "125" "126" "129" "130" "131" "132" "133"
# [27] "135" "136" "137" "138" "140" "141" "142" "144" "145" "146" "148" "149"
#
# $`3`
# [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" "32"
# [33] "33" "34" "35" "36" "37" "38" "39" "40" "41" "42" "43" "44" "45" "46" "47" "48" "49" "50"
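To get closer to the layout asked for in the question, the same split() result can be pasted into one line per cluster, and the data frame itself can be split the same way. The following is only a minimal sketch, assuming iris as a stand-in for pressure_rotate. The seed and the `wide` matrix with its "Tapping number" column names are made up for illustration and are not taken from the question's data.

```r
# Sketch using iris as a stand-in for the asker's data; base R only.
data(iris)
set.seed(1)  # arbitrary seed, only so that repeated runs agree
iris.km <- kmeans(iris[, -5], centers = 3, nstart = 25)

# 1. Row names grouped by cluster, printed as one line per cluster
members <- split(rownames(iris), iris.km$cluster)
for (k in names(members)) {
  cat("cluster ", k, ": ", paste(members[[k]], collapse = ", "), "\n", sep = "")
}

# 2. The data itself, as a list with one data frame per cluster
by_cluster <- split(iris, iris.km$cluster)
sapply(by_cluster, nrow)  # number of rows in each cluster

# 3. If the items to be clustered are stored as columns rather than rows,
#    transpose first so that each item becomes a row.
#    `wide` is a hypothetical 20 x 6 matrix with made-up column names.
wide <- matrix(rnorm(120), nrow = 20,
               dimnames = list(NULL, paste("Tapping number", 1:6)))
wide.km <- kmeans(t(wide), centers = 2, nstart = 25)
wide_members <- split(colnames(wide), wide.km$cluster)
wide_members
```

by_cluster[["1"]] is then an ordinary data frame holding only the rows assigned to cluster 1, so it can be summarised or plotted on its own. Whether step 3 applies depends on whether the tapping numbers are rows or columns in pressure_rotate, which the question does not make clear.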