How to save xy coordinates from a fviz_nbclust plot in R
Question
I have a graph that takes over 20 minutes to generate in R. It is an 'elbow plot' made with fviz_nbclust from the factoextra package. Is there a way to save the x and y coordinates from the graph so I can manipulate them further in ggplot, instead of waiting 20 minutes every time I need to edit or format the graph?
For example, using the USArrests data set in R:
#load data
df <- USArrests
#remove rows with missing values
df <- na.omit(df)
#scale each variable to have a mean of 0 and sd of 1
df <- scale(df)
library(cluster)
library(factoextra)
#create plot of number of clusters vs total within sum of squares
fviz_nbclust(df, kmeans, method = "wss")
#Save coordinates of graph to be plotted on a ggplot graph here...
Answer 1
Score: 2
If I understand correctly, you want the data points behind the plot (x: number of clusters, y: total within-cluster sum of squares). The object returned by fviz_nbclust is a ggplot, so save it in a variable and read its data element, which is a data frame:
library(ggplot2)
library(cluster)
library(factoextra)
#load data
df <- USArrests
#remove rows with missing values
df <- na.omit(df)
#scale each variable to have a mean of 0 and sd of 1
df <- scale(df)
fcl <- fviz_nbclust(df, kmeans, method = "wss")
fcl$data
#    clusters         y
# 1         1 196.00000
# 2         2 102.86240
# 3         3  78.32327
# 4         4  56.40317
# 5         5  70.83569
# 6         6  45.30784
# 7         7  39.03188
# 8         8  39.00701
# 9         9  32.24437
# 10       10  28.69826
# simple plot using ggplot
ggplot(fcl$data, aes(clusters, y, group = 1)) +
  geom_point() +
  geom_line() +
  labs(title = "Elbow plot to find the optimal number of clusters",
       x = "number of clusters",
       y = "total within-cluster sum of squares")
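Since the slow part is the clustering and not the drawing, it may also help to write the extracted data frame to disk so that a new R session does not have to repeat it. Below is a minimal sketch of that idea using base R's saveRDS and readRDS. The file name elbow_data.rds and the variable names cache_file and elbow are assumptions made for illustration; they are not part of factoextra.

```r
library(factoextra)
library(ggplot2)

# Same preparation as in the question
df <- scale(na.omit(USArrests))

# Hypothetical cache file name; any writable path works
cache_file <- "elbow_data.rds"

if (file.exists(cache_file)) {
  # Reuse the coordinates saved by an earlier run
  elbow <- readRDS(cache_file)
} else {
  # Slow step: run it once, then keep only the plot data
  fcl <- fviz_nbclust(df, kmeans, method = "wss")
  elbow <- fcl$data
  saveRDS(elbow, cache_file)
}

# Formatting changes from here on only redraw the cached points
ggplot(elbow, aes(clusters, y, group = 1)) +
  geom_point() +
  geom_line() +
  labs(x = "number of clusters", y = "total within-cluster sum of squares")
```

Delete the cache file whenever the input data or the clustering settings change, otherwise the plot will keep showing the old values. write.csv would also work, but saveRDS keeps the clusters column as a factor without extra conversion.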