2023-04-16 23:55:35

Biplots in matrix format using PCA

Question

This is a snippet of my dataframe:

   species    bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  predicted_species
0  Adelie                 18             18                181         3750  Chinstrap
1  Adelie                 17             17                186         3800  Adelie
2  Adelie                 18             18                195         3250  Gentoo
3  Adelie                  0              0                  0            0  Adelie
4  Chinstrap              19             19                193         3450  Chinstrap
5  Chinstrap              20             20                190         3650  Gentoo
6  Chinstrap              17             17                181         3625  Adelie
7  Gentoo                 19             19                195         4675  Chinstrap
8  Gentoo                 18             18                193         3475  Gentoo
9  Gentoo                 20             20                190         4250  Gentoo

I want to make a biplot for my data, which would be something like this:

But I want a biplot for every cell of the species vs. predicted_species matrix, so 9 subplots, each like the one above, and I am not sure how that can be achieved. One way would be to split the data into separate dataframes and make a biplot for each, but that isn't very efficient and makes comparison difficult.

Can anyone provide some suggestions on how this could be done?

Answer 1

Score: 1

Combining the answer by Qiyun Zhu on how to plot a biplot with my answer on how to split the plot into the true vs. predicted subsets, you could do it like this:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load iris data.
iris = sns.load_dataset('iris')
X = iris.iloc[:, :4].values
y = iris.iloc[:, 4].values
features = iris.columns[:4]
targets = ['setosa', 'versicolor', 'virginica']

# Mock up some predictions.
iris['species_pred'] = (40 * ['setosa'] + 5 * ['versicolor'] + 5 * ['virginica']
                        + 40 * ['versicolor'] + 5 * ['setosa'] + 5 * ['virginica']
                        + 40 * ['virginica'] + 5 * ['versicolor'] + 5 * ['setosa'])

# Reduce features to two dimensions.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2).fit(X_scaled)
X_reduced = pca.transform(X_scaled)
iris[['pc1', 'pc2']] = X_reduced


def biplot(x, y, data=None, **kwargs):
    # FacetGrid.map calls this once per facet with that facet's pc1/pc2 values
    # as x and y, plus color and label keyword arguments.
    # Plot data points.
    sns.scatterplot(data=data, x=x, y=y, **kwargs)

    # Calculate arrow parameters.
    loadings = pca.components_[:2].T
    pvars = pca.explained_variance_ratio_[:2] * 100
    # Scale the loadings to the range of the projected data.
    arrows = loadings * np.ptp(X_reduced, axis=0)
    # lower - upper is negative, so the negative factor gives a positive width.
    width = -0.0075 * np.min([np.subtract(*plt.xlim()), np.subtract(*plt.ylim())])

    # Plot arrows.
    horizontal_alignment = ['right', 'left', 'right', 'right']
    vertical_alignment = ['bottom', 'top', 'top', 'bottom']
    for (i, arrow), ha, va in zip(enumerate(arrows),
                                  horizontal_alignment, vertical_alignment):
        plt.arrow(0, 0, *arrow, color='k', alpha=0.5, width=width, ec='none',
                  length_includes_head=True)
        plt.text(*(arrow * 1.05), features[i], ha=ha, va=va,
                 fontsize='small', color='gray')


# Plot small multiples, corresponding to confusion matrix.
sns.set()
g = sns.FacetGrid(iris, row='species', col='species_pred',
                  hue='species', margin_titles=True)
g.map(biplot, 'pc1', 'pc2')
plt.show()
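
For the penguin columns shown in the question, a rough adaptation might look like the sketch below. It is not the code above: it swaps seaborn's FacetGrid for a plain matplotlib subplot grid and scikit-learn's PCA for an SVD-based projection, so it only needs numpy, pandas and matplotlib. The names FEATURES, pca_2d and biplot_grid are made up for this illustration, the data is just the ten-row snippet from the question, and treating the all-zero row as a placeholder to drop is an assumption.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Column names taken from the question's dataframe.
FEATURES = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]


def pca_2d(values):
    """Standardize the columns and project onto the first two principal axes."""
    # Assumes no column is constant (otherwise the standard deviation is zero).
    scaled = (values - values.mean(axis=0)) / values.std(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(scaled, full_matrices=False)
    loadings = vt[:2].T         # shape (n_features, 2)
    scores = scaled @ loadings  # shape (n_samples, 2)
    return scores, loadings


def biplot_grid(df, true_col="species", pred_col="predicted_species"):
    """Draw one biplot per (true, predicted) pair, laid out like a confusion matrix."""
    scores, loadings = pca_2d(df[FEATURES].to_numpy(dtype=float))
    # Stretch the loading vectors to the range of the projected data.
    arrows = loadings * np.ptp(scores, axis=0)
    labels = sorted(set(df[true_col]) | set(df[pred_col]))
    n = len(labels)
    fig, axes = plt.subplots(n, n, sharex=True, sharey=True,
                             squeeze=False, figsize=(3 * n, 3 * n))
    for i, true in enumerate(labels):
        for j, pred in enumerate(labels):
            ax = axes[i, j]
            # Points of this cell only; an empty cell just shows the arrows.
            mask = ((df[true_col] == true) & (df[pred_col] == pred)).to_numpy()
            ax.scatter(scores[mask, 0], scores[mask, 1], s=15)
            for name, (dx, dy) in zip(FEATURES, arrows):
                ax.annotate("", xy=(dx, dy), xytext=(0, 0),
                            arrowprops=dict(arrowstyle="->", color="gray"))
                ax.text(dx * 1.05, dy * 1.05, name, fontsize="x-small", color="gray")
            ax.set_title(f"true={true} | pred={pred}", fontsize="small")
            if i == n - 1:
                ax.set_xlabel("pc1")
            if j == 0:
                ax.set_ylabel("pc2")
    return fig, axes


# The ten rows from the question.
penguins = pd.DataFrame({
    "species": ["Adelie"] * 4 + ["Chinstrap"] * 3 + ["Gentoo"] * 3,
    "bill_length_mm": [18, 17, 18, 0, 19, 20, 17, 19, 18, 20],
    "bill_depth_mm": [18, 17, 18, 0, 19, 20, 17, 19, 18, 20],
    "flipper_length_mm": [181, 186, 195, 0, 193, 190, 181, 195, 193, 190],
    "body_mass_g": [3750, 3800, 3250, 0, 3450, 3650, 3625, 4675, 3475, 4250],
    "predicted_species": ["Chinstrap", "Adelie", "Gentoo", "Adelie", "Chinstrap",
                          "Gentoo", "Adelie", "Chinstrap", "Gentoo", "Gentoo"],
})
# Assumption: a row whose measurements are all zero is a placeholder, so drop it.
penguins = penguins[(penguins[FEATURES] != 0).all(axis=1)].reset_index(drop=True)

fig, axes = biplot_grid(penguins)
```

Call plt.show() afterwards to display the grid. Because the PCA is fitted once on all rows and every subplot shares the same axes and arrows, the nine panels stay directly comparable, which was the concern with splitting the data into separate dataframes.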
