Color Regions in a Scatter Plot

Question

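I recently found out that you can create color regions for scatter plots in Orange. I know Orange sits on top of Python, so I figured I'd be able to recreate this, but I'm having a hard time. I haven't figured out how to convert a pandas DataFrame into Orange's format. More importantly, I'm working in a Spark environment, so going from PySpark to Orange would be even better.

I've set up a basic scatter plot with seaborn and matplotlib to see if I could figure it out.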

    import seaborn as sns
    import matplotlib.pyplot as plt

    # Load the Iris dataset from Seaborn
    iris = sns.load_dataset("iris")

    # Create a scatter plot
    sns.scatterplot(x="sepal_length", y="petal_width", hue="species", data=iris)

    # Add labels and title
    plt.xlabel("Sepal Length")
    plt.ylabel("Petal Width")
    plt.title("Scatter Plot of Sepal Length vs. Petal Width")

    # Show the plot
    plt.legend()
    plt.show()

![Scatter plot with color regions](https://i.stack.imgur.com/om4pt.png)
Answer 1

Score: 1

According to the [Orange documentation](https://orange3.readthedocs.io/projects/orange-visual-programming/en/latest/widgets/visualize/scatterplot.html#intelligent-data-visualization):

> If a categorical variable is selected in the Color section, the score is computed as follows. For each data instance, the method finds 10 nearest neighbors in the projected 2D space, that is, on the combination of attribute pairs. It then checks how many of them have the same color. The total score of the projection is then the average number of same-colored neighbors.

You can get similar results using scikit-learn's k-nearest neighbors classifier. There is an [example in their docs](https://scikit-learn.org/stable/auto_examples/neighbors/plot_classification.html) that also uses the iris dataset.

I've modified this example to be more similar to the screenshot you shared:

```python
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import ListedColormap
from sklearn import datasets, neighbors
from sklearn.inspection import DecisionBoundaryDisplay

n_neighbors = 10

# Import the iris dataset
iris = datasets.load_iris()

# Select features (columns 2 and 3: petal length and petal width)
features = [2, 3]
X = iris.data[:, features]
y = iris.target

# Create color maps: one for the shaded regions, one for the points
cmap_light = ListedColormap(["blue", "red", "green"])
cmap_bold = ["blue", "red", "green"]

# Create a k-nearest neighbors classifier and fit it to the data
clf = neighbors.KNeighborsClassifier(n_neighbors, weights="distance")
clf.fit(X, y)

# Plot the decision regions
_, ax = plt.subplots()
DecisionBoundaryDisplay.from_estimator(
    clf,
    X,
    cmap=cmap_light,
    ax=ax,
    response_method="predict",
    plot_method="pcolormesh",
    xlabel=iris.feature_names[features[0]],
    ylabel=iris.feature_names[features[1]],
    shading="auto",
    alpha=0.3,
)

# Plot the training points on the same axes
sns.scatterplot(
    x=X[:, 0],
    y=X[:, 1],
    hue=iris.target_names[y],
    palette=cmap_bold,
    alpha=1.0,
    edgecolor="black",
    ax=ax,
)
plt.show()
```

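This is the result:

[Figure: iris petal length vs. petal width, with the points drawn over the shaded k-nearest neighbors decision regions]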
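The scoring rule quoted from the Orange documentation above can be sketched in a few lines. This is only an illustration of the described rule, not Orange's actual implementation: the function name `projection_score` is made up here, and the sketch assumes Euclidean distance and that a point does not count as its own neighbor, neither of which the quoted passage states.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import NearestNeighbors


def projection_score(points, labels, n_neighbors=10):
    """Average number of same-label points among each point's nearest neighbors.

    Hypothetical helper illustrating the quoted rule; not part of Orange.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(points)
    # Calling kneighbors() without a query returns the neighbors of every
    # training point and leaves the point itself out of its own neighbor list.
    idx = nn.kneighbors(return_distance=False)
    same_label = labels[idx] == labels[:, None]
    return same_label.sum(axis=1).mean()


iris = load_iris()
X = iris.data[:, [2, 3]]  # petal length and petal width
score = projection_score(X, iris.target)
print(f"Average number of same-class neighbors (out of 10): {score:.2f}")
```

A score close to 10 means the classes are well separated in that pair of features, which is also when the colored regions come out clean.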


Answer 2

Score: 1

The code below produces a similar-looking plot to the one you posted. It uses matplotlib directly for plotting.

Output:

[Figure: scatter plot of petal length vs. petal width over class-colored regions]

    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.datasets import load_iris
    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib

    #
    # Load data
    #
    iris = load_iris(as_frame=True)
    iris_x = iris.data
    iris_y = iris.target
    # Tidy the column names, e.g. 'petal length (cm)' -> 'Petal length'
    iris_x.columns = [col.capitalize()[:-5] for col in iris_x.columns]

    #
    # Choose a color for each class
    #
    # Choose automatically across all classes
    np.random.seed(2)
    class_colors = np.random.choice(
        list(matplotlib.colors.CSS4_COLORS),
        size=len(iris_y.unique()),
        replace=False
    )
    # Alternatively, specify per class (this overrides the random choice above):
    class_colors = ['tab:red', 'tab:green', 'tab:blue']
    print('Class colors are:', class_colors)
    # In a Jupyter notebook, the next line previews the chosen colors:
    # display(matplotlib.colors.ListedColormap(class_colors))

    # Create a white-to-color colormap out of each class color
    class_cmaps = [
        matplotlib.colors.LinearSegmentedColormap.from_list('Custom', ['w', color])
        for color in class_colors
    ]
    # View the colormaps (notebook only)
    # for cmap in class_cmaps: display(cmap)

    #
    # Select features and fit KNN classifier
    #
    feat0 = 'Petal length'
    feat1 = 'Petal width'
    iris_x = iris_x[[feat0, feat1]]

    n_neighbors = 10
    knn = KNeighborsClassifier(n_neighbors=n_neighbors, weights='distance').fit(iris_x.values, iris_y)

    #
    # Define a feature space and get a prediction over the entire area
    #
    x_grid, y_grid = np.meshgrid(
        np.linspace(iris_x[feat0].min(), iris_x[feat0].max(), 100),
        np.linspace(iris_x[feat1].min(), iris_x[feat1].max(), 100)
    )
    grid_flat = np.hstack([x_grid.reshape(-1, 1), y_grid.reshape(-1, 1)])

    # At each point in the feature space, get the
    # predicted class and the nearest neighbors
    classes = knn.predict(grid_flat)
    neighbors = knn.kneighbors(grid_flat, return_distance=False)

    # For each point, what proportion of neighbors match the predicted class
    prop_per_gridpt = [sum(iris_y[row_neighbors] == clas) / n_neighbors
                       for row_neighbors, clas
                       in zip(neighbors, classes)]

    # Convert proportions to colors. Each class has its own colormap.
    rgb_per_gridpt = [
        class_cmaps[clas](prop)
        for clas, prop in zip(classes, prop_per_gridpt)
    ]
    rgb_per_gridpt = np.array(rgb_per_gridpt).reshape(x_grid.shape + (4,))

    # Plot the points, then draw the colored grid underneath as an image
    f, ax = plt.subplots(figsize=(8, 8))
    ax.scatter(iris_x[feat0], iris_x[feat1], c=np.choose(iris_y.values, class_colors), s=60,
               alpha=0.7, linewidth=2)
    ax.set_xlabel(feat0)
    ax.set_ylabel(feat1)
    ax.set_title(f'Scatter plot of {feat0} vs. {feat1}')

    ax.imshow(rgb_per_gridpt, extent=ax.axis(), alpha=0.5,
              interpolation='bicubic', origin='lower')
    plt.show()
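The step that turns neighbor proportions into colors may be easier to see in isolation. The sketch below is a vectorized stand-in for it, not the code from the answer: `blend_with_white` is a hypothetical helper that interpolates linearly between white and the class color, which is roughly what a two-color `LinearSegmentedColormap` does, minus the quantization of its lookup table.

```python
import numpy as np
from matplotlib.colors import to_rgba


def blend_with_white(classes, proportions, class_colors):
    """Return an (n, 4) RGBA array: white at proportion 0, the class color at 1.

    Hypothetical helper; a plain linear blend rather than a matplotlib colormap.
    """
    colors = np.array([to_rgba(c) for c in class_colors])  # shape (n_classes, 4)
    white = np.array(to_rgba("w"))
    p = np.asarray(proportions, dtype=float)[:, None]      # shape (n, 1)
    return (1.0 - p) * white + p * colors[np.asarray(classes)]


# Three grid points: predicted class index and share of agreeing neighbors
rgba = blend_with_white(
    classes=[0, 1, 1],
    proportions=[0.0, 1.0, 0.5],
    class_colors=["tab:red", "tab:blue"],
)
print(rgba.shape)
```

With the arrays from the answer, `blend_with_white(classes, prop_per_gridpt, class_colors).reshape(x_grid.shape + (4,))` should give an image close to `rgb_per_gridpt`: regions fade toward white where the neighbors disagree.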

huangapple
  • Published on August 10, 2023, 22:45:37
  • When reposting, please retain the link to this article: https://go.coder-hub.com/76876844.html