How to calculate corr from a dataframe with non-numeric columns
Question
I have the dataset shown below, which is part of the Pokemon dataset:
https://elitedatascience.com/wp-content/uploads/2022/07/Pokemon.csv
I want to plot a heatmap with the following code:
# Calculate correlations
corr = stats_df.corr()
# Heatmap
plt.figure(figsize=(9,8))
sns.heatmap(corr)
But I get the error below. How can I solve it?
Answer 1
Score: 1
Computing the (Pearson) correlation requires numeric data, so keep only the numeric columns before calling corr().
Try:
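import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv('Pokemon.csv', encoding='latin1', index_col='#')
# Keep numeric columns only, then drop the ones that are not individual stats
corr = df.select_dtypes('number').drop(columns=['Total', 'Generation']).corr()
sns.heatmap(data=corr)
plt.tight_layout()
plt.show()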
Output:
>>> corr
               HP    Attack   Defense   Sp. Atk   Sp. Def     Speed
HP       1.000000  0.422386  0.239622  0.362380  0.378718  0.175952
Attack   0.422386  1.000000  0.438687  0.396362  0.263990  0.381240
Defense  0.239622  0.438687  1.000000  0.223549  0.510747  0.015227
Sp. Atk  0.362380  0.396362  0.223549  1.000000  0.506121  0.473018
Sp. Def  0.378718  0.263990  0.510747  0.506121  1.000000  0.259133
Speed    0.175952  0.381240  0.015227  0.473018  0.259133  1.000000
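To see the idea in isolation, here is a minimal sketch on a made-up toy frame (the rows and values are hypothetical and are not taken from Pokemon.csv). It compares filtering with select_dtypes('number') against passing numeric_only=True to corr(), assuming pandas 1.5 or later, where that parameter is available on DataFrame.corr.

```python
import pandas as pd

# Hypothetical toy data standing in for the Pokemon table (values are made up).
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Type 1": ["Grass", "Fire", "Water", "Bug"],
    "HP": [45, 39, 44, 60],
    "Attack": [49, 52, 48, 45],
})

# Option 1: keep only the numeric columns, then correlate.
corr_a = df.select_dtypes("number").corr()

# Option 2: let corr() skip the non-numeric columns itself
# (assumes pandas >= 1.5, where numeric_only is accepted here).
corr_b = df.corr(numeric_only=True)

print(corr_a)
print(corr_b)
```

On this toy frame both options give the same 2x2 matrix over HP and Attack. One difference to keep in mind: numeric_only=True also keeps boolean columns, while select_dtypes('number') drops them, so the two can differ on frames that contain bool columns.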