Biplots in matrix format using PCA
Question
This is a snippet of my dataframe:
     species  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g predicted_species
0     Adelie              18             18                181         3750         Chinstrap
1     Adelie              17             17                186         3800            Adelie
2     Adelie              18             18                195         3250            Gentoo
3     Adelie               0              0                  0            0            Adelie
4  Chinstrap              19             19                193         3450         Chinstrap
5  Chinstrap              20             20                190         3650            Gentoo
6  Chinstrap              17             17                181         3625            Adelie
7     Gentoo              19             19                195         4675         Chinstrap
8     Gentoo              18             18                193         3475            Gentoo
9     Gentoo              20             20                190         4250            Gentoo
I want to make a biplot for my data, which would be something like this:
But I want to make a biplot for every cell of the species vs. predicted_species matrix, so 9 subplots, each like the one above. I am not sure how that can be achieved. One way could be to split the data into separate dataframes and make a biplot for each, but that isn't very efficient and makes comparison difficult.
Can anyone provide some suggestions on how this could be done?
Answer 1
Score: 1
Combining the answer by Qiyun Zhu on how to plot a biplot with my answer on how to split the plot into the true vs. predicted subsets, you could do it like this:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load iris data.
iris = sns.load_dataset('iris')
X = iris.iloc[:, :4].values
features = iris.columns[:4]

# Mock up some predictions.
iris['species_pred'] = (40 * ['setosa'] + 5 * ['versicolor'] + 5 * ['virginica']
                        + 40 * ['versicolor'] + 5 * ['setosa'] + 5 * ['virginica']
                        + 40 * ['virginica'] + 5 * ['versicolor'] + 5 * ['setosa'])

# Reduce features to two dimensions. The PCA is fitted once on all rows,
# so every subplot uses the same projection and the same arrows.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2).fit(X_scaled)
X_reduced = pca.transform(X_scaled)
iris[['pc1', 'pc2']] = X_reduced


def biplot(x, y, data=None, **kwargs):
    """Draw one facet: the scatter of (x, y) plus the feature loading arrows."""
    # Plot data points.
    sns.scatterplot(data=data, x=x, y=y, **kwargs)

    # Calculate arrow parameters.
    loadings = pca.components_[:2].T
    arrows = loadings * np.ptp(X_reduced, axis=0)
    width = -0.0075 * np.min([np.subtract(*plt.xlim()), np.subtract(*plt.ylim())])

    # Plot arrows, one per feature, each with its own label alignment.
    horizontal_alignment = ['right', 'left', 'right', 'right']
    vertical_alignment = ['bottom', 'top', 'top', 'bottom']
    for (i, arrow), ha, va in zip(enumerate(arrows),
                                  horizontal_alignment, vertical_alignment):
        plt.arrow(0, 0, *arrow, color='k', alpha=0.5, width=width, ec='none',
                  length_includes_head=True)
        plt.text(*(arrow * 1.05), features[i], ha=ha, va=va,
                 fontsize='small', color='gray')


# Plot small multiples, corresponding to confusion matrix.
sns.set()
g = sns.FacetGrid(iris, row='species', col='species_pred',
                  hue='species', margin_titles=True)
g.map(biplot, 'pc1', 'pc2')
plt.show()
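To see why the panels stay comparable, it may help to look at the numbers behind the plot for the ten rows from the question. The sketch below is only an illustration, not the code above: it swaps sklearn's PCA for a plain NumPy SVD, so the signs of the components may differ from what sklearn returns. The hard-coded lists and the `cells` dictionary are assumptions made for this example. The all-zero row is kept exactly as it appears in the snippet.

```python
import numpy as np

# The ten rows from the question's snippet, hard-coded for illustration.
species = ["Adelie"] * 4 + ["Chinstrap"] * 3 + ["Gentoo"] * 3
predicted = ["Chinstrap", "Adelie", "Gentoo", "Adelie",
             "Chinstrap", "Gentoo", "Adelie",
             "Chinstrap", "Gentoo", "Gentoo"]
features = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
X = np.array([
    [18, 18, 181, 3750],
    [17, 17, 186, 3800],
    [18, 18, 195, 3250],
    [0, 0, 0, 0],
    [19, 19, 193, 3450],
    [20, 20, 190, 3650],
    [17, 17, 181, 3625],
    [19, 19, 195, 4675],
    [18, 18, 193, 3475],
    [20, 20, 190, 4250],
], dtype=float)

# Standardize, then fit the PCA once on all rows (here via SVD, an assumption
# standing in for StandardScaler + sklearn's PCA).
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
_, _, Vt = np.linalg.svd(X_scaled, full_matrices=False)
loadings = Vt[:2].T            # shape (4, 2): one row per feature
scores = X_scaled @ loadings   # shape (10, 2): pc1 and pc2 for every row

# The arrows come from the single fit, so they are identical in every panel.
arrows = loadings * np.ptp(scores, axis=0)

# Assign each row index to its (true, predicted) cell of the 3 x 3 grid.
cells = {}
for row, key in enumerate(zip(species, predicted)):
    cells.setdefault(key, []).append(row)

for (true_label, pred_label), rows in sorted(cells.items()):
    print(true_label, pred_label, scores[rows].round(2).tolist())
```

Because the projection is computed once and only the rows are split per cell, a point sits at the same coordinates no matter which panel it lands in. Fitting a separate PCA per sub-dataframe would give each panel its own axes and make the panels hard to compare.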