Reduced dimensions visualization for true vs predicted values
Question
I have a dataframe which looks like this:
label  predicted  F1  F2  F3  ....  F40
major  minor       2   1   4
major  major       1   0  10
minor  patch       4   3  23
major  patch       2   1  11
minor  minor       0   4   8
patch  major       7   3  30
patch  minor       8   0   1
patch  patch       1   7  11
I have a label column, which is the true label for the id (not shown as it is not relevant), a predicted column, and then a set of around 40 features in my df.
The idea is to transform these 40 features into 2 dimensions and visualize them as true vs. predicted. With the three labels major, minor, and patch versus their predictions, there are 9 cases in total.
With PCA, 2 components do not capture much of the variance, and I am not sure how to map the PCA values back to the labels and predictions in the original df as a whole. One way to achieve this would be to split the data into 9 separate dataframes, one per case, and plot each, but this isn't what I am looking for.
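For reference, a minimal sketch of the PCA step being described (the dataframe name df, the F-prefixed feature-column naming, and the pca_x/pca_y column names are assumptions for illustration; index alignment is what keeps the components matched to label and predicted):
from sklearn.decomposition import PCA

# Fit PCA on the 40 feature columns only.
features = [c for c in df.columns if c.startswith('F')]
pca = PCA(n_components=2)
df[['pca_x', 'pca_y']] = pca.fit_transform(df[features])

# Check how much variance two components capture.
print(pca.explained_variance_ratio_.sum())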
Is there any other way I can reduce and visualize the given data? Any suggestions would be highly appreciated.
Answer 1
Score: 2
You may want to consider a small multiple plot with one scatterplot for each cell of the confusion matrix.
If PCA does not work well, t-distributed stochastic neighbor embedding (t-SNE) is often a good alternative in my experience.
For example, with the iris dataset, which also has three prediction classes, it could look like this:
import pandas as pd
import seaborn as sns
from sklearn.manifold import TSNE
iris = sns.load_dataset('iris')
# Mock up some predictions.
iris['species_pred'] = (40 * ['setosa'] + 5 * ['versicolor'] + 5 * ['virginica']
+ 40 * ['versicolor'] + 5 * ['setosa'] + 5 * ['virginica']
+ 40 * ['virginica'] + 5 * ['versicolor'] + 5 * ['setosa'])
# Show confusion matrix.
pd.crosstab(iris.species, iris.species_pred)
species_pred  setosa  versicolor  virginica
species
setosa            40           5          5
versicolor         5          40          5
virginica          5           5         40
# Reduce features to two dimensions.
X = iris.iloc[:, :4].values
X_embedded = TSNE(n_components=2, init='random',
                  learning_rate='auto').fit_transform(X)
iris[['tsne_x', 'tsne_y']] = X_embedded
# Plot small multiples, corresponding to confusion matrix.
sns.set()
g = sns.FacetGrid(iris, row='species', col='species_pred', margin_titles=True)
g.map(sns.scatterplot, 'tsne_x', 'tsne_y');
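Translated to the question's setup, the same pattern would be roughly as follows (a sketch, assuming the dataframe is named df with columns label, predicted, and F-prefixed features; note that t-SNE's default perplexity of 30 requires more rows than the 8 shown in the example table):
import seaborn as sns
from sklearn.manifold import TSNE

# Embed the 40 feature columns into two dimensions.
feature_cols = [c for c in df.columns if c.startswith('F')]
df[['tsne_x', 'tsne_y']] = TSNE(n_components=2, init='random',
                                learning_rate='auto').fit_transform(df[feature_cols])

# One scatterplot per (true label, predicted label) cell,
# mirroring the confusion matrix layout.
g = sns.FacetGrid(df, row='label', col='predicted', margin_titles=True)
g.map(sns.scatterplot, 'tsne_x', 'tsne_y')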