How to fit regression lines on each non-diagonal segment of a pairplot, while retaining breakdown of data points by hue
Question
I made a pairplot of the Iris dataset with the code below, but a regression line is drawn for each of the three iris varieties rather than for the dataset as a whole (which is what I am looking for). Each segment of the pairplot should still color the data points by iris category, but the regression line should be fitted to the whole sample instead of there being three separate regression lines per segment. Is this possible?
(df is my DataFrame created from a downloaded .csv file for the dataset.)
iris_pairplot = sns.pairplot(df, hue = "variety", palette="Dark2", height=3, aspect=1, corner=True, kind="reg")
iris_pairplot.fig.suptitle("Pairplot of traits for full Iris sample", fontsize = "xx-large")
plt.tight_layout()
plt.savefig('iris_pairplot.png')
I have tried to use .regplot() but that seems to be for individual scatterplots as opposed to a pairplot?
Answer 1
Score: 2
`pairplot` cannot by itself draw a single regression line over all the data (not split by hue) while the points stay colored by hue. Create the pairplot first without the `kind='reg'` option, which draws the scatter plots without any lines.
Then use `map_offdiag()` to call a plotting function on each subplot except the diagonal ones, and draw the regression line for the full dataset there. Note: the iris data I load has the column named `species`, not `variety`, so you may need to rename the column. Hope this is what you are looking for...
import matplotlib.pyplot as plt
import seaborn as sns

df = sns.load_dataset('iris')
## Note: kind="reg" removed
iris_pairplot = sns.pairplot(df, hue="species", palette="Dark2", height=3, aspect=1, corner=True)
iris_pairplot.fig.suptitle("Pairplot of traits for full Iris sample", fontsize="xx-large")
## Define function to plot a single regression line over the full dataset
def regline(x, y, **kwargs):
    # Only the column names of x and y are used, so the fit covers every row of `data`
    sns.regplot(data=kwargs['data'], x=x.name, y=y.name, scatter=False, color=kwargs['color'])
## Call the function for each non-diagonal subplot within pairplot
iris_pairplot.map_offdiag(regline, color='red', data=df)
plt.tight_layout()
plt.show()