How to convert Pandas Dataframe to the shape of a correlation matrix
# Question
I have a pandas DataFrame that looks roughly like this:
```plaintext
             xvar            yvar  meanRsquared
0    filled_water          precip      0.119730
1    filled_water            snow      0.113214
2    filled_water  filled_wetland      0.119529
3  filled_wetland          precip      0.104826
4  filled_wetland            snow      0.121540
5  filled_wetland    filled_water      0.121540
[676 rows x 3 columns]
```
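I would like to reshape it into a more traditional correlation matrix, where both the columns and the index are the variables and the values are `meanRsquared`.
Is there an easy way to do this? I've been playing around for an hour and can't figure it out.
DISCLAIMER: Yes, I know pandas has a built-in function for creating a correlation matrix. However, my current DataFrame is the average of hundreds of correlation matrices over many watersheds, so I cannot use it.
This is my best attempt, but the logic obviously breaks down towards the end.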
```python
listOfdicts = []
for xvar in df['xvar'].unique():
    for yvar in df['yvar'].unique():
        adict = {}
        adict['index'] = xvar
        adict[yvar] = yvar
        adict['r'] = df['insert r value here']
        listOfdicts.append(adict)
answer = pd.DataFrame.from_dict(listOfdicts)
```
I don't expect this to work, but it was my best shot.
# Answer 1
**Score**: 1
Take a look at the `pivot` method (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html).
```python
import pandas as pd

# Small sample in the same long format as the question's data
df = pd.DataFrame(
    data={
        'xvar': ['filled_water', 'filled_water', 'filled_water',
                 'filled_wetland', 'filled_wetland', 'filled_wetland'],
        'yvar': ['precip', 'snow', 'filled_wetland',
                 'precip', 'snow', 'filled_water'],
        'meanRsquared': [1, 2, 3, 4, 5, 6],
    },
    index=range(6),
)

# Reshape long -> wide: rows are xvar, columns are yvar, cells are meanRsquared
df.pivot(index='xvar', columns='yvar', values='meanRsquared')
```
Output:
```plaintext
yvar            filled_water  filled_wetland  precip  snow
xvar
filled_water             NaN             3.0     1.0   2.0
filled_wetland           6.0             NaN     4.0   5.0
```
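If the result should be a square matrix with the same variables on both axes, one possible follow-up is to reindex the pivoted frame against a shared label list. The sketch below is not part of the original answer. It assumes the toy values from the example above (placeholders, not real R-squared values), and the names `wide`, `labels` and `square` are made up for illustration. It also swaps `pivot` for `pivot_table` with `aggfunc='mean'`, on the assumption that repeated `(xvar, yvar)` pairs should be averaged; plain `pivot` raises a `ValueError` on duplicates.

```python
import pandas as pd

# Same toy data as above (placeholder values, not real R-squared values)
df = pd.DataFrame({
    'xvar': ['filled_water', 'filled_water', 'filled_water',
             'filled_wetland', 'filled_wetland', 'filled_wetland'],
    'yvar': ['precip', 'snow', 'filled_wetland',
             'precip', 'snow', 'filled_water'],
    'meanRsquared': [1, 2, 3, 4, 5, 6],
})

# pivot_table averages duplicate (xvar, yvar) pairs instead of raising
wide = df.pivot_table(index='xvar', columns='yvar',
                      values='meanRsquared', aggfunc='mean')

# One shared, sorted label list for both axes makes the result square;
# pairs that never occur in the data stay NaN
labels = sorted(set(df['xvar']) | set(df['yvar']))
square = wide.reindex(index=labels, columns=labels)

print(square)
```

With the full 676-row frame from the question, every variable presumably already appears in both columns, so the reindex step would mostly just enforce a consistent ordering of rows and columns.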