Reduced dimensions visualization for true vs predicted values

Question

I have a dataframe which looks like this:

label    predicted     F1  F2   F3 .... F40
major     minor         2   1   4
major     major         1   0   10
minor     patch         4   3   23
major     patch         2   1   11
minor     minor         0   4   8
patch     major         7   3   30
patch     minor         8   0   1
patch     patch         1   7   11

I have label, which is the true label for each id (not shown, as it is not relevant), the predicted label, and a set of around 40 features in my df.

The idea is to transform these 40 features into 2 dimensions and visualize them as true vs. predicted. With three labels (major, minor and patch) and their predictions, there are 9 cases.

With PCA, 2 components do not capture much of the variance, and I am not sure how to map the PCA values back to the labels and predictions in the original df as a whole. One way to achieve this is to split the cases into 9 dataframes and plot each one, but this isn't what I am looking for.

Is there any other way I can reduce and visualize the given data? Any suggestions would be highly appreciated.

Answer 1

Score: 2

You may want to consider a small multiple plot with one scatterplot for each cell of the confusion matrix.

If PCA does not work well, t-distributed stochastic neighbor embedding (t-SNE) is often a good alternative in my experience.

For example, with the iris dataset, which also has three prediction classes, it could look like this:

import pandas as pd
import seaborn as sns
from sklearn.manifold import TSNE

iris = sns.load_dataset('iris')

# Mock up some predictions.
iris['species_pred'] = (40 * ['setosa'] + 5 * ['versicolor'] + 5 * ['virginica']
                        + 40 * ['versicolor'] + 5 * ['setosa'] + 5 * ['virginica']
                        + 40 * ['virginica'] + 5 * ['versicolor'] + 5 * ['setosa'])

# Show confusion matrix.
pd.crosstab(iris.species, iris.species_pred)
# species_pred  setosa  versicolor  virginica
# species
# setosa            40           5          5
# versicolor         5          40          5
# virginica          5           5         40

# Reduce features to two dimensions.
X = iris.iloc[:, :4].values
X_embedded = TSNE(n_components=2, init='random', learning_rate='auto'
                 ).fit_transform(X)
iris[['tsne_x', 'tsne_y']] = X_embedded

# Plot small multiples, corresponding to confusion matrix.
sns.set()
g = sns.FacetGrid(iris, row='species', col='species_pred', margin_titles=True)
g.map(sns.scatterplot, 'tsne_x', 'tsne_y');
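
On the question of mapping reduced coordinates back to the labels: any row-wise reduction returns one output row per input row, in the same order, so the two new columns can be assigned straight onto the original frame. Below is a minimal sketch of that idea using the eight rows shown in the question, with only F1 to F3 filled in. It computes PCA by hand through an SVD in NumPy rather than through a library PCA class, and the standardization step and the column names pc1 and pc2 are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the asker's frame: the eight rows shown in the question,
# restricted to the three feature columns that are visible there.
df = pd.DataFrame({
    "label":     ["major", "major", "minor", "major", "minor", "patch", "patch", "patch"],
    "predicted": ["minor", "major", "patch", "patch", "minor", "major", "minor", "patch"],
    "F1": [2, 1, 4, 2, 0, 7, 8, 1],
    "F2": [1, 0, 3, 1, 4, 3, 0, 7],
    "F3": [4, 10, 23, 11, 8, 30, 1, 11],
})

feature_cols = [c for c in df.columns if c.startswith("F")]

# Standardize each feature (assumption: features are on different scales).
X = df[feature_cols].to_numpy(dtype=float)
X = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD of the centered data: U * S gives the component scores.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
scores = U[:, :2] * S[:2]

# Share of total variance carried by each component.
explained = S ** 2 / (S ** 2).sum()

# Row i of scores belongs to row i of df, so no splitting or merging is needed.
df[["pc1", "pc2"]] = scores

# Each (true, predicted) pair is one cell of the confusion matrix.
cell_counts = df.groupby(["label", "predicted"]).size()

print("variance captured by 2 components:", explained[:2].sum())
print(cell_counts)
```

With the label and predicted columns still sitting next to pc1 and pc2, the same FacetGrid call can be pointed at those columns instead of tsne_x and tsne_y, so there is no need to build 9 separate dataframes.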

huangapple
  • Published on April 11, 2023, 02:31:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75979711.html