Adding lines to connect separate clusters in a chart
Question
I saw this neat principal component analysis graph online, where they had lines connecting each cluster to a center point.
I used an example data set to show how far I have got: I can add the ellipses, but after looking online, I think this PCA package currently doesn't have the ability to add the connecting lines. In some places this kind of plot is called a "star". Is there a way to work around this and add the lines to a PCA chart?
I have added some sample code below that gets up to the part that doesn't have the connecting lines. Suggestions on this would be great, please. My last thought is maybe using ggforce or something along those lines?
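library(factoextra)
data(iris)
# PCA on the four numeric columns (column 5 is Species)
res.pca <- prcomp(iris[, -5], scale. = TRUE)
# Individuals plot coloured by species, with 95% ellipses
fviz_pca_ind(res.pca, label = "none", alpha.ind = 1, pointshape = 19,
             habillage = iris$Species, addEllipses = TRUE, ellipse.level = 0.95)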
Some comments have suggested these sites, and while they are close, my case is a bit different: I am using a data frame in which one column holds the categories I want to use for the different clusters.
Any suggestions would be much appreciated.
Answer 1
Score: 1
A quick and dirty hack is to create an edges data frame out of the ggplot data inside the output from fviz_pca_ind(), and then plot it with geom_segment().
Note that this might be visually sub-optimal, because you often want the edges drawn before the nodes so that they highlight (i.e. do not hide) the positions of the latter. But barring a rewrite of df_raw_pca_viz and the fviz plotting functions, this is a quick way to get what you asked for.
Try:
library(factoextra)
library(purrr)
library(dplyr)
data(iris)
res.pca <- prcomp(iris[, -5], scale. = TRUE)
g1 <- fviz_pca_ind(res.pca, label = "none", alpha.ind = 1, pointshape = 19,
                   habillage = iris$Species, addEllipses = TRUE, ellipse.level = 0.95)
# One row per point: its own coordinates (x, y) plus its group centroid (xend, yend)
df_edges <-
  pluck(g1, "data") %>%
  as_tibble() %>%
  group_by(Groups) %>%
  summarise(xend = mean(x), yend = mean(y)) %>%
  left_join(y = pluck(g1, "data"),
            by = "Groups",
            multiple = "all")  # the `multiple` argument needs dplyr >= 1.1.0
# Add the segments on top of the existing plot
g1 +
  geom_segment(data = df_edges,
               aes(xend = xend, yend = yend, x = x, y = y, colour = Groups),
               alpha = 0.25)
1: https://i.stack.imgur.com/PpQKz.png
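If drawing the edges underneath the points matters, one option is to skip fviz_pca_ind() and build the plot directly with ggplot2, so the layer order is under your control. The following is only a sketch of that idea, not the answer's method: the data frame names (scores, centroids, stars) and the PC1_mean / PC2_mean columns are made up for illustration, and stat_ellipse() draws ggplot2's default data ellipses, which may not match the ellipses factoextra draws.

```r
library(ggplot2)

# PCA on the four numeric iris columns, as in the question
res.pca <- prcomp(iris[, -5], scale. = TRUE)

# Scores on the first two components, plus the grouping column
scores <- data.frame(
  PC1 = res.pca$x[, 1],
  PC2 = res.pca$x[, 2],
  Species = iris$Species
)

# Per-group centroids (mean score on each component); names are illustrative
centroids <- aggregate(cbind(PC1, PC2) ~ Species, data = scores, FUN = mean)
names(centroids) <- c("Species", "PC1_mean", "PC2_mean")

# Attach each point's group centroid to it
stars <- merge(scores, centroids, by = "Species")

p <- ggplot(stars, aes(PC1, PC2, colour = Species)) +
  # segments are added first, so the points are drawn on top of them
  geom_segment(aes(xend = PC1_mean, yend = PC2_mean), alpha = 0.25) +
  geom_point() +
  stat_ellipse(level = 0.95)
p
```

Because the grouping comes from an ordinary column (Species here), the same pattern should carry over to any data frame where one column holds the categories.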
Answer 2
Score: 0
I recently developed a user-friendly R package named "GABB" for performing simple, good-looking PCA plots, including segments from each data point to the barycenter of its identified group. Check the following example with the mtcars data set and let me know:
library(GABB)
## Example of the GABB package pipeline with the base data set "mtcars"
my.data <- mtcars
## Data preparation for RDA and PCA: transformation and scaling of numeric/quantitative variables
prep_data(data = my.data, quantitative_columns = c(1:7), transform_data_method = "log", scale_data = TRUE)
## Create PCA
library(FactoMineR)
my.pca <- FactoMineR::PCA(X = data_quant)
## Create, display and save graphic output of individual and variable PCA
# Basic output with minimum required parameters
PCA_RDA_graphics(complete.data.set = initial_data_with_quant_transformed, PCA.object = my.pca,
                 factor.names = c("vs", "am", "gear", "carb"))
# Advanced output
PCA_RDA_graphics(complete.data.set = initial_data_with_quant_transformed, PCA.object = my.pca,
                 factor.names = c("vs", "am", "gear", "carb"), Biplot.PCA = TRUE, col.arrow.var.PCA = "grey",
                 Barycenter = TRUE, Segments = TRUE, Ellipse.IC = TRUE,
                 Barycenter.Ellipse.Fac1 = "vs", Barycenter.Ellipse.Fac2 = "am",
                 factor.colors = "vs", factor.shapes = "am",
                 Barycenter.factor.col = "vs", Barycenter.factor.shape = "am")