How to fit regression lines on each non-diagonal segment of a pairplot, while retaining breakdown of data points by hue

Question

I have made a pairplot using the Iris dataset using the code below, but the regression lines show for each of the three iris flowers, rather than for the dataset as a whole (which is what I am looking for). I want each segment of pairplot to color each data point according to which category of Iris it came from, but I want the regression line to be for the whole sample rather than having three separate regression lines within each segment. Is this possible?

(df is my DataFrame created from a downloaded .csv file for the dataset.)

iris_pairplot = sns.pairplot(df, hue = "variety", palette="Dark2", height=3, aspect=1, corner=True, kind="reg")
iris_pairplot.fig.suptitle("Pairplot of traits for full Iris sample", fontsize = "xx-large")
plt.tight_layout()
plt.savefig('iris_pairplot.png')

I have tried to use .regplot() but that seems to be for individual scatterplots as opposed to a pairplot?

Answer 1

Score: 2

pairplot cannot automatically draw a single regression line for all the data (not split by hue). You will need to create the pairplot first without the kind='reg' option, which draws the hue-colored scatter points without any lines.

Then use map_offdiag() to call a function on every subplot except the diagonal ones, and plot the regression line for the whole DataFrame in each of them. Note: the iris data I load has the class column named species, not variety, so you may need to rename the column. Hope this is what you are looking for...
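On the column-name mismatch: if your DataFrame comes from a CSV that calls the class column variety, as in the question, a rename along these lines should make it match the code below. The tiny DataFrame here is only a hypothetical stand-in for the downloaded file, and its sepal.length column name is an assumption. Passing hue="variety" instead of renaming would work just as well.

```python
import pandas as pd

# Hypothetical stand-in for a DataFrame read from a CSV that names the class column "variety"
df = pd.DataFrame({
    "sepal.length": [5.1, 7.0, 6.3],
    "variety": ["Setosa", "Versicolor", "Virginica"],
})

# Rename the class column so that hue="species" refers to an existing column
df = df.rename(columns={"variety": "species"})
```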

import matplotlib.pyplot as plt
import seaborn as sns

df = sns.load_dataset('iris')
## Note: kind="reg" removed, so pairplot only draws the hue-colored scatter points
iris_pairplot = sns.pairplot(df, hue="species", palette="Dark2", height=3, aspect=1, corner=True)
iris_pairplot.fig.suptitle("Pairplot of traits for full Iris sample", fontsize="xx-large")

## Define function to plot a single regression line fitted to the full DataFrame
def regline(x, y, **kwargs):
    ## x and y arrive as pandas Series; only their names are used to pick columns from data
    sns.regplot(data=kwargs['data'], x=x.name, y=y.name, scatter=False, color=kwargs['color'])

## Call the function for each non-diagonal subplot within pairplot
iris_pairplot.map_offdiag(regline, color='red', data=df)

plt.tight_layout()
plt.show()

huangapple
  • Posted on May 10, 2023, 18:53:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76217544.html